Warren f4697396e4 chore: update dependencies and AGENTS.md

- Add mac_address crate for MAC address detection
- Add tempfile dev dependency for testing
- Update AGENTS.md with latest development guidelines

2026-04-30 15:07:31 +08:00

19 KiB

AGENTS.md - Momentry Core

Rust-based digital asset management system with video analysis and RAG capabilities.

⚠️ CRITICAL: Development Isolation Principles

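Absolute Prohibitions

Never modify any file under the /Users/accusys/wordpress/ directory
Never modify n8n workflows or settings
Never modify WordPress or n8n database tables
Never touch port 3002 (production) except during a release

Development Scope

Scope	Status	Notes
`momentry_core_0.1/`	✅ Open for development	Main Momentry Core development directory
`momentry_core_0.1/portal/`	✅ Open for development	Tauri Portal frontend
`momentry_core_0.1/src/`	✅ Open for development	Rust backend code
`/Users/accusys/wordpress/`	❌ Do not modify	Owned by the WordPress/Marcom team
n8n workflows	❌ Do not modify	Automation flows, unrelated to dev
WordPress/n8n database tables	❌ Do not modify	Managed by the Marcom team, unrelated to dev

Development Environment

Service	Port	Purpose	Command
Playground	3003	The only development environment	`cargo run --bin momentry_playground -- server`
Production	3002	❌ Do not modify	`cargo run -- server` (release only)
Portal (Tauri)	1420	Frontend development	`npm run tauri dev`

Consequences of Violations

Modifying WordPress/n8n can disrupt the Marcom team's work and the production environment
Modifying WordPress/n8n database tables can break automation flows and data integrity
Modifying port 3002 can interrupt services that are in use
All dev testing must run in the playground (3003)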
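
The isolation rules above could be enforced mechanically before a tool writes a file or binds a port. The following is a minimal sketch, not part of the Momentry codebase: the function names `is_protected_path` and `check_port` are hypothetical, and only the path and port numbers come from this section.

```rust
use std::path::Path;

/// Port reserved for production; development must not touch it.
const PRODUCTION_PORT: u16 = 3002;
/// Port used by the playground, the only development environment.
const PLAYGROUND_PORT: u16 = 3003;
/// Directory owned by the Marcom team (path taken from the rules above).
const PROTECTED_DIR: &str = "/Users/accusys/wordpress";

/// Returns true if `path` lies inside the protected WordPress directory.
/// The comparison is component-wise, so "/Users/accusys/wordpress_backup"
/// does not match. Symlinks and ".." segments are not resolved here.
pub fn is_protected_path(path: &Path) -> bool {
    path.starts_with(PROTECTED_DIR)
}

/// Hypothetical guard: rejects the production port unless this is a release task.
pub fn check_port(port: u16, is_release: bool) -> Result<(), String> {
    if port == PRODUCTION_PORT && !is_release {
        return Err(format!(
            "port {PRODUCTION_PORT} is production; use the playground on {PLAYGROUND_PORT}"
        ));
    }
    Ok(())
}
```

A caller would run `check_port(port, false)` before starting a dev server and `is_protected_path` before any write; a stricter version might canonicalize the path first.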

AI Coding Principles (Karpathy-Inspired)

Behavioral guidelines to reduce common LLM coding mistakes. Source: andrej-karpathy-skills (94K stars)

Tradeoff: These guidelines bias toward caution over speed. For trivial tasks, use judgment.

1. Think Before Coding

Don't assume. Don't hide confusion. Surface tradeoffs.

State your assumptions explicitly. If uncertain, ask.
If multiple interpretations exist, present them - don't pick silently.
If a simpler approach exists, say so. Push back when warranted.
If something is unclear, stop. Name what's confusing. Ask.

2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

No features beyond what was asked.
No abstractions for single-use code.
No "flexibility" or "configurability" that wasn't requested.
No error handling for impossible scenarios.
If you write 200 lines and it could be 50, rewrite it.

Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

3. Surgical Changes

Touch only what you must. Clean up only your own mess.

When editing existing code:

Don't "improve" adjacent code, comments, or formatting.
Don't refactor things that aren't broken.
Match existing style, even if you'd do it differently.
If you notice unrelated dead code, mention it - don't delete it.

When your changes create orphans:

Remove imports/variables/functions that YOUR changes made unused.
Don't remove pre-existing dead code unless asked.

The test: Every changed line should trace directly to the user's request.

4. Goal-Driven Execution

Define success criteria. Loop until verified.

Transform tasks into verifiable goals:

"Add validation" -> "Write tests for invalid inputs, then make them pass"
"Fix the bug" -> "Write a test that reproduces it, then make it pass"
"Refactor X" -> "Ensure tests pass before and after"

For multi-step tasks, state a brief plan:

1. [Step] -> verify: [check]
2. [Step] -> verify: [check]
3. [Step] -> verify: [check]

Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
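
As an illustration of the "Add validation" transformation, the tests below are written against the invalid inputs first, and the function exists only to make them pass. This is a sketch under an assumption: it treats a `file_uuid` as exactly 16 lowercase hex characters, inferred solely from the example value `384b0ff44aaaa1f1` in the Terminology table, and `is_valid_file_uuid` is a hypothetical name.

```rust
/// Hypothetical validator: accepts exactly 16 lowercase hex characters.
/// The format is an assumption based on the documented example value.
pub fn is_valid_file_uuid(s: &str) -> bool {
    s.len() == 16 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

// Written first: each case names one invalid input the function must reject.
#[test]
fn rejects_invalid_inputs() {
    assert!(!is_valid_file_uuid("")); // empty
    assert!(!is_valid_file_uuid("384b0ff44aaaa1f")); // 15 characters
    assert!(!is_valid_file_uuid("384B0FF44AAAA1F1")); // uppercase
    assert!(!is_valid_file_uuid("384b0ff44aaaa1g1")); // non-hex character
}

// The success criterion: the documented example must still be accepted.
#[test]
fn accepts_documented_example() {
    assert!(is_valid_file_uuid("384b0ff44aaaa1f1"));
}
```

Running `cargo test` then gives a concrete pass/fail signal to loop on, which is the point of a strong success criterion.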

These guidelines are working if: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.

Terminology (V4.0)

Term	Scope	Description	Example
file_uuid	Video file	Video file identifier (renamed from `video_uuid`)	`384b0ff44aaaa1f1`
identity_uuid	Global identity	Global person identity (cross-file)	`a9a90105-6d6b-46ff-92da-0c3c1a57dff4`
face_id	Single detection	Single face detection (frame-level)	`face_100`
trace_id	Face tracking	Face tracking ID (Face Tracker output)	`2`
chunk_id	Sentence chunk	Sentence chunk (from pre_chunks via rules)	`chunk_1`
speaker_id	Speaker segment	Speaker ID (from ASRX)	`SPEAKER_0`
person_id	❌ Deprecated	Video-local person ID (removed in V4.0)	-

Architecture (V4.0)

Face → Identity (Two-layer, direct binding)
  ↓
  person_identities table: REMOVED
  file_identities table: ADDED (N:N relationship)
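
To make the N:N relationship concrete, here is a small in-memory stand-in for the `file_identities` table. It is only a sketch: the real table lives in the database, and the `FileIdentities` type and its methods are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeSet;

/// In-memory stand-in for `file_identities`: a set of
/// (file_uuid, identity_uuid) pairs, so one file can hold many identities
/// and one identity can appear in many files.
#[derive(Default)]
pub struct FileIdentities {
    links: BTreeSet<(String, String)>,
}

impl FileIdentities {
    /// Records that `identity_uuid` appears in `file_uuid`.
    /// Returns false if the pair was already linked.
    pub fn link(&mut self, file_uuid: &str, identity_uuid: &str) -> bool {
        self.links
            .insert((file_uuid.to_owned(), identity_uuid.to_owned()))
    }

    /// All identities seen in one file, in sorted order.
    pub fn identities_in(&self, file_uuid: &str) -> Vec<&str> {
        self.links
            .iter()
            .filter(|(f, _)| f.as_str() == file_uuid)
            .map(|(_, i)| i.as_str())
            .collect()
    }

    /// All files in which one identity appears, in sorted order.
    pub fn files_of(&self, identity_uuid: &str) -> Vec<&str> {
        self.links
            .iter()
            .filter(|(_, i)| i.as_str() == identity_uuid)
            .map(|(f, _)| f.as_str())
            .collect()
    }
}
```

Because the pair itself is the key, linking the same face-derived identity to the same file twice is a no-op, which mirrors a composite primary key on the two columns.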

Key Changes (V3.x → V4.0)

Change	V3.x	V4.0
video_uuid	Used everywhere	file_uuid
person_identities	Required (303 records)	Removed
person_id APIs	28 endpoints	Removed (except register/bind)
Face binding	Person → Identity	Face → Identity (direct)
Chunk binding	Manual	Auto (time alignment)
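
The table says chunk binding is automatic via time alignment but does not spell out the rule. One plausible reading, shown here purely as an assumption, is to bind a chunk to whichever segment overlaps it the longest. `Span` and `bind_by_overlap` are hypothetical names, not Momentry APIs.

```rust
/// A time span in seconds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: f64,
    pub end: f64,
}

impl Span {
    /// Length of the overlap between two spans, or 0.0 if they are disjoint.
    pub fn overlap(&self, other: &Span) -> f64 {
        (self.end.min(other.end) - self.start.max(other.start)).max(0.0)
    }
}

/// Returns the label of the segment that overlaps `chunk` the longest,
/// or None if no segment overlaps it at all.
pub fn bind_by_overlap<'a>(chunk: Span, segments: &'a [(Span, &'a str)]) -> Option<&'a str> {
    segments
        .iter()
        .map(|(span, label)| (chunk.overlap(span), *label))
        .filter(|(overlap, _)| *overlap > 0.0)
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, label)| label)
}
```

For example, a chunk spanning 3.0 to 8.0 seconds against `SPEAKER_0` (0 to 4) and `SPEAKER_1` (4 to 10) overlaps them for 1 and 4 seconds respectively, so it binds to `SPEAKER_1`.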

Build & Run Commands

# Build project (use debug builds for development/testing)
cargo build
cargo build --bin momentry
cargo build --bin momentry_playground

# Build all binaries
cargo build --bins

# Run CLI
cargo run -- --help
cargo run -- register /path/to/video.mp4
cargo run -- server --host 0.0.0.0 --port 3002

# Run playground (development binary)
cargo run --bin momentry_playground -- server
cargo run --bin momentry_playground -- --help

⚠️ CRITICAL: `cargo build --release` PROHIBITION

NEVER run cargo build --release unless the user explicitly says "release the binary" or "正式 release" ("official release")
cargo build --release is SLOW and only needed when producing a production binary for deployment
For all development, testing, debugging, and linting: use cargo build or cargo check
If uncertain, ALWAYS ask the user first

Binaries

Binary	Purpose	Port	Redis Prefix	Environment
`momentry`	Production	3002	`momentry:`	`.env`
`momentry_playground`	Development	3003	`momentry_dev:`	`.env.development`
`momentry_player`	Video player	-	-	-

Testing

# Run all tests
cargo test

# Run single test by name
cargo test test_name

# Run with output
cargo test -- --nocapture

# Doc tests
cargo test --doc

Linting & Formatting

# Format code (edition=2021, max_width=100, tab_spaces=4)
cargo fmt
cargo fmt -- --check

# Lint
cargo clippy
cargo clippy --all-features

# Check for errors
cargo check
cargo check --all-features

Code Style

General

Use Rust 2021 edition
Use tracing for logging (not println!)
Keep lines under 100 characters

Imports (order: std → external → local)

use std::path::Path;
use anyhow::{Context, Result};
use async_trait::async_trait;
use serde::{Deserialize, Serialize};

use crate::core::chunk::Chunk;

Error Handling

Use anyhow::Result<T> for application code
Use thiserror for library code
Use .context() for error context
Use anyhow::bail!() for early returns

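use std::process::Command;

use anyhow::{Context, Result};

// Runs ffprobe on `path` and returns its raw stdout.
fn example(path: &str) -> Result<String> {
    let output = Command::new("ffprobe")
        // Illustrative arguments; substitute the flags your call needs.
        .args(["-v", "error", "-show_format", path])
        .output()
        .context("Failed to run ffprobe")?;

    if !output.status.success() {
        anyhow::bail!("ffprobe exited with {}", output.status);
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}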

Naming

Types/Enums: PascalCase (VideoRecord, ChunkType)
Functions/Variables: snake_case (get_video_by_uuid)
Traits: PascalCase nouns; an -er suffix is optional (Database, ChunkStore)
Files: snake_case (postgres_db.rs)

Types

Use serde::{Deserialize, Serialize} for serializable types
Use #[serde(rename_all = "snake_case")] for enum variants
Use explicit numeric types (i64, u32, f64)

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VideoRecord {
    pub id: i64,
    pub uuid: String,
    pub duration: f64,
    pub width: u32,
}

#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
    TimeBased,
    Sentence,
    Cut,
}

Async Programming

Use tokio runtime with full features
Use #[async_trait] for async trait methods

#[async_trait]
pub trait Database: Send + Sync {
    async fn init() -> Result<Self>
    where Self: Sized;
}

Code Structure

src/
├── main.rs           # CLI entry point
├── lib.rs            # Library exports
├── core/
│   ├── api_key/     # API key management (anomaly, blacklist, encryption, etc.)
│   ├── chunk/        # Chunking logic
│   ├── config.rs     # Centralized configuration (env vars)
│   ├── db/          # Database (PostgreSQL, MongoDB, Redis, Qdrant)
│   ├── embedding/   # Vector embeddings
│   ├── overlay/     # Video overlay
│   ├── probe/       # ffprobe integration
│   ├── processor/   # ASR, OCR, YOLO, Face, Pose, CUT, ASRX
│   │   └── executor.rs  # Unified Python script executor
│   ├── storage/     # File management
│   └── thumbnail/   # Thumbnail extraction
├── api/              # HTTP API (axum)
├── player/           # Video player
├── ui/               # TUI components
└── watcher/          # File system watcher

Key Dependencies

Error handling: anyhow, thiserror
Async: tokio (full features), async-trait
CLI: clap (derive)
Serialization: serde, serde_json, chrono
Database: sqlx, mongodb, redis (1.0), qdrant-client
HTTP: axum, tower
Logging: tracing, tracing-subscriber
Config: once_cell (lazy static config)

Environment Variables

Server

MOMENTRY_SERVER_PORT - API server port (default: 3002 for production, 3003 for playground)
MOMENTRY_REDIS_PREFIX - Redis key prefix (default: momentry: for production, momentry_dev: for playground)
MOMENTRY_API_KEY - API key for Player online mode testing
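
The per-binary defaults above can be expressed as a small lookup. This sketch uses only the standard library and is not the project's `config.rs`: `Target`, `resolve_port`, and `redis_key` are hypothetical names, and falling back to the default when the variable is unset or unparsable is an assumption.

```rust
/// Which binary is running; the values follow the Binaries table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    Production,
    Playground,
}

impl Target {
    /// Documented default port for this binary.
    pub fn default_port(self) -> u16 {
        match self {
            Target::Production => 3002,
            Target::Playground => 3003,
        }
    }

    /// Documented default Redis key prefix for this binary.
    pub fn default_redis_prefix(self) -> &'static str {
        match self {
            Target::Production => "momentry:",
            Target::Playground => "momentry_dev:",
        }
    }
}

/// Resolves the port from the raw MOMENTRY_SERVER_PORT value, if any.
/// Unset or unparsable values fall back to the default (an assumption).
pub fn resolve_port(raw: Option<&str>, target: Target) -> u16 {
    raw.and_then(|s| s.trim().parse().ok())
        .unwrap_or_else(|| target.default_port())
}

/// Builds a namespaced Redis key by prepending the prefix.
pub fn redis_key(prefix: &str, key: &str) -> String {
    format!("{prefix}{key}")
}
```

In use, the raw value would come from `std::env::var("MOMENTRY_SERVER_PORT").ok()`, passed in via `.as_deref()`. Keeping the environment read outside the function makes the fallback logic testable without touching process state.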

Testing API Key

export MOMENTRY_API_KEY="muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69"

# Test Player online mode
cargo run --features player --bin momentry_player -- -o

Database

DATABASE_URL - PostgreSQL (default: postgres://accusys@localhost:5432/momentry)

Redis

REDIS_URL - Redis URL (default: redis://:accusys@localhost:6379)
REDIS_PASSWORD - Redis password (default: accusys)

Paths

MOMENTRY_OUTPUT_DIR - Output directory (default: /Users/accusys/momentry/output)
MOMENTRY_BACKUP_DIR - Backup directory
MOMENTRY_PYTHON_PATH - Python path (default: /opt/homebrew/bin/python3.11)
MOMENTRY_SCRIPTS_DIR - Scripts directory

Processor Timeouts

MOMENTRY_ASR_TIMEOUT - ASR timeout in seconds (default: 3600)
MOMENTRY_CUT_TIMEOUT - CUT timeout in seconds (default: 3600)
MOMENTRY_DEFAULT_TIMEOUT - Default timeout (default: 7200)
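
A minimal sketch of how these three variables might be resolved into a timeout. The processor keys `"asr"` and `"cut"`, the function name `processor_timeout`, and the choice to fall back to the documented default when a value is missing or unparsable are all assumptions; only the variable names and default values come from the list above.

```rust
use std::time::Duration;

/// Resolves a processor timeout. `lookup` maps an environment variable name
/// to its raw value, for example `|k| std::env::var(k).ok()`.
pub fn processor_timeout(
    processor: &str,
    lookup: impl Fn(&str) -> Option<String>,
) -> Duration {
    // Per-processor variable and its documented default.
    // Only ASR and CUT have dedicated variables in the list above.
    let specific = match processor {
        "asr" => Some(("MOMENTRY_ASR_TIMEOUT", 3600)),
        "cut" => Some(("MOMENTRY_CUT_TIMEOUT", 3600)),
        _ => None,
    };
    // Reads a variable and parses it as whole seconds; None if unset or invalid.
    let parse = |name: &str| lookup(name).and_then(|v| v.trim().parse::<u64>().ok());
    let secs = match specific {
        Some((name, default)) => parse(name).unwrap_or(default),
        None => parse("MOMENTRY_DEFAULT_TIMEOUT").unwrap_or(7200),
    };
    Duration::from_secs(secs)
}
```

Passing the lookup as a closure keeps the function pure, so the defaults can be checked in a unit test without setting real environment variables.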

Synonym Expansion

MOMENTRY_SYNONYM_FILES - Comma-separated paths to synonym JSON files (e.g., data/english_synonyms.json,data/llm_synonyms.json)
MOMENTRY_SYNONYM_FILE - Single synonym JSON file path (deprecated, use above)
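
The two variables could be reconciled as sketched below. Giving `MOMENTRY_SYNONYM_FILES` precedence over the deprecated variable, trimming whitespace, and ignoring empty entries are assumptions of this example, and `synonym_files` is a hypothetical helper name.

```rust
use std::path::PathBuf;

/// Resolves synonym file paths from the raw values of MOMENTRY_SYNONYM_FILES
/// (`multi`) and the deprecated MOMENTRY_SYNONYM_FILE (`single`).
pub fn synonym_files(multi: Option<&str>, single: Option<&str>) -> Vec<PathBuf> {
    // Split the comma-separated list, dropping blanks such as "a.json,,b.json".
    let from_multi: Vec<PathBuf> = multi
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(PathBuf::from)
        .collect();
    if !from_multi.is_empty() {
        return from_multi;
    }
    // Fall back to the deprecated single-file variable.
    single
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(|p| vec![PathBuf::from(p)])
        .unwrap_or_default()
}
```

An empty result means no synonym expansion is configured; whether that should be an error or a silent no-op is left to the caller.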

Logging

RUST_LOG or MOMENTRY_LOG_LEVEL - Log level (default: info)

Notes

Unit tests exist (86 library tests)
Video processing uses external tools (ffprobe, Python scripts)
Multi-database architecture (PostgreSQL, MongoDB, Redis, Qdrant)
Monitor directory is a separate system (not Rust)
PythonExecutor provides unified script execution with timeout support
Uses the redis crate 1.0.x for improved performance
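
The notes mention that PythonExecutor runs scripts with timeout support. The source does not show how, so the following is only a generic standard-library sketch of the technique (spawn, poll, kill on deadline), not the actual executor. `run_with_timeout` is a hypothetical name, and this version discards the child's output, which a real executor would likely capture.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `cmd` and waits up to `timeout` for it to finish.
/// Returns Ok(Some(status)) on normal exit and Ok(None) if it had to be killed.
pub fn run_with_timeout(
    cmd: &mut Command,
    timeout: Duration,
) -> io::Result<Option<ExitStatus>> {
    // Output is discarded so a full pipe can never block the child.
    let mut child = cmd.stdout(Stdio::null()).stderr(Stdio::null()).spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // try_wait returns immediately: Some(status) if exited, None if still running.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            child.kill()?;
            child.wait()?; // reap the process so it does not linger as a zombie
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

A call might look like `run_with_timeout(Command::new("python3").arg("script.py"), Duration::from_secs(3600))`, where the script name is a placeholder. In async code on tokio, wrapping the child's wait future in `tokio::time::timeout` would avoid the polling loop.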

LLM Synonym Generation

Generate synonym database using llama.cpp (Gemma4):

# Generate full database (162 entries, ~5 minutes)
python3 scripts/generate_synonyms_llamacpp.py

# Quick test
python3 scripts/generate_synonyms_llamacpp.py --test

# Resume from existing file
python3 scripts/generate_synonyms_llamacpp.py --resume

# Output: data/llm_synonyms.json (27 Chinese + 135 English words)

Task Management

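Use todowrite to track tasks

# Create a task list
/todo Create the config module [in_progress]
/todo Add unit tests [pending]

# Update status
/todo Mark as done [completed]

Recommended task batching

Work on 1-2 features at a time
Verify each feature once it is complete (clippy + test)
Move on to the next only after verification passes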

Code Review Checklist

After finishing a task, check that:

cargo clippy --lib passes
cargo test --lib passes
cargo fmt -- --check passes
Documentation is updated (if needed)
New features have unit tests

Commit Guidelines

# feat: new feature
git commit -m "feat: add monitor_jobs table"

# fix: bug fix
git commit -m "fix: resolve SQL injection in store_vector"

# refactor: refactoring
git commit -m "refactor: use parameterized queries"

# docs: documentation update
git commit -m "docs: update AGENTS.md with new modules"

Pre-commit Hook

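The project has .git/hooks/pre-commit configured; it runs the following checks automatically before each commit:

# Checks performed
1. cargo fmt --check    # Rust formatting check
2. cargo clippy --lib   # Rust lint check
3. cargo test --lib     # Rust unit tests
4. ruff check           # Python lint check
5. ruff format --check  # Python formatting check
6. markdownlint         # Markdown format check
7. shellcheck           # Shell script check

# Skip all checks (not recommended)
git commit --no-verify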

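# Skip specific checks
# Note: --skip-checks is not a built-in git commit option; confirm how the hook supports it before relying on this
git commit --skip-checks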

Note: the hook only checks staged Rust/Python/Markdown files.

Python Environment Setup

# Install ruff
pip install ruff==0.11.2

# Format Python files
ruff format scripts/

# Lint Python files
ruff check scripts/

Markdown Environment Setup

# Install markdownlint-cli (uses the system Node.js)
npm install -g markdownlint-cli

# Check Markdown files
markdownlint docs/

# Configuration file
.markdownlint.json

Shell Environment Setup

# Install shellcheck
brew install shellcheck

# Check shell scripts
shellcheck scripts/*.sh monitor/**/*.sh

Note: the hook only blocks on error-level shellcheck findings; style warnings are shown but do not block the commit.

Release Workflow

Pre-release Preparation

Before every release of a production binary, you must:

Create a release tag

git tag -a v0.X.X -m "Release v0.X.X - YYYY-MM-DD"
git push origin v0.X.X

Back up a standalone copy of the source code

# Create a dedicated release directory
RELEASE_DIR="/Users/accusys/momentry_core_releases/v0.X.X"
mkdir -p "$RELEASE_DIR"

# Copy the full source tree (excluding unnecessary files)
rsync -av --exclude='.git' --exclude='target' --exclude='node_modules' \
      /Users/accusys/momentry_core_0.1/ "$RELEASE_DIR/"

# Record release information
echo "Release: v0.X.X" > "$RELEASE_DIR/RELEASE_INFO.txt"
echo "Date: $(date)" >> "$RELEASE_DIR/RELEASE_INFO.txt"
echo "Git Commit: $(git rev-parse HEAD)" >> "$RELEASE_DIR/RELEASE_INFO.txt"
echo "Binary: $(ls -la target/release/momentry)" >> "$RELEASE_DIR/RELEASE_INFO.txt"

Back up the binaries

cp target/release/momentry "$RELEASE_DIR/momentry_v0.X.X"
cp target/release/momentry_playground "$RELEASE_DIR/momentry_playground_v0.X.X" 2>/dev/null

Record the database schema

pg_dump -U accusys -d momentry --schema-only > "$RELEASE_DIR/schema_v0.X.X.sql"

Why this matters

Prevents the release binary from drifting out of sync with the current source code
Makes it easy to trace the code state of a specific release
Allows quick rollback or diffing when needed
Ensures the database schema matches the code version

Reference Documents

Document	Purpose
`docs/OPENCODE_GUIDE.md`	OpenCode usage guidelines
`docs/ARCHITECTURE_EVALUATION.md`	Architecture optimizations pending evaluation (including GraphRAG)
`docs/PENDING_ISSUES.md`	Tracking of open issues
`docs/MOMENTRY_CORE_MONITORING.md`	Monitoring system specification
`docs/MOMENTRY_CORE_REDIS_KEYS.md`	Redis key design specification
`docs/PYTHON.md`	Python script guidelines
`docs/FILE_CHANGE_MANAGEMENT.md`	File change management guidelines
`docs/YOLO_RESUME_INTEGRATION.md`	YOLO Resume integration notes
`docs/DOCUMENT_EMBEDDING_STRATEGY.md`	Parent-Child embedding strategy
`docs/PROCESSING_PIPELINE.md`	Processing pipeline documentation
`docs/N8N_DEMO_WORKFLOW.md`	n8n workflow documentation
`docs/FRESH_MAC_INSTALLATION.md`	Fresh Mac installation guide
`docs/SERVICES.md`	Service overview and management
`docs/SFTPGO_DEMO_USER.md`	SFTPGo user guide

Document Change Workflow

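Before modifying a document, consult docs/FILE_CHANGE_MANAGEMENT.md and make sure you:

Before the change: read the whole document and run the pre-check list
During the change: present a change plan and get confirmation
After the change: show the diff and update the version history
Verification: run lint/test and review before committing

Rules for AI Tool Modifications

When an AI tool modifies a document, it must:

Read the whole document first (reading only selected sections is not allowed)
Present a change plan for confirmation before modifying
Show the diff after modifying
Update the version history table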

PHP Development

WordPress serves as the Momentry Portal and handles page integration for n8n automation and the sftpgo file service.

Editor Setup

Editor	LSP	Installation
VS Code	Intelephense	Extension Marketplace (recommended)
Cursor	Intelephense	Extension Marketplace (recommended)
CLI	phpactor	`~/bin/phpactor`

Intelephense (VS Code/Cursor)

Install the extension: search for "Intelephense"
Settings:

{
  "intelephense.stubs": ["wordpress"]
}

phpactor (CLI)

# Installation
brew install composer
curl -sSL https://github.com/phpactor/phpactor/releases/latest/download/phpactor.phar -o ~/bin/phpactor
chmod +x ~/bin/phpactor

# Install WordPress stubs
cd /Users/accusys/wordpress/web
composer require --dev php-stubs/wordpress-stubs

# Build the WordPress index
cd /Users/accusys/wordpress/web
~/bin/phpactor index:build --reset

# Common commands
~/bin/phpactor class:search "WP_User"      # Search for a class
~/bin/phpactor index:query WP_User          # Show class information
~/bin/phpactor navigate /path/to/file.php  # Navigate to a definition

WordPress Code Locations

Type	Path
Themes	`/Users/accusys/wordpress/web/wp-content/themes/`
Plugins	`/Users/accusys/wordpress/web/wp-content/plugins/`

Collaboration with the Marcom Team

Role	Responsibility
Marcom team	Figma design / Elementor build
OpenCode	Code implementation / refactoring

Development Timeline

Phase 1: Marcom build (now)      → Elementor page build
Phase 2: Delivery review (TBD)   → Feature confirmation / refactoring assessment
Phase 3: OpenCode refactoring    → Pure code implementation, delivering a version with no Elementor dependency
