momentry_core/AGENTS.md
Warren f4697396e4 chore: update dependencies and AGENTS.md
- Add mac_address crate for MAC address detection
- Add tempfile dev dependency for testing
- Update AGENTS.md with latest development guidelines
2026-04-30 15:07:31 +08:00


AGENTS.md - Momentry Core

Rust-based digital asset management system with video analysis and RAG capabilities.


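⚠️ CRITICAL: Development Isolation Principles

Absolute Prohibitions

  • Never modify any file under /Users/accusys/wordpress/
  • Never modify n8n workflows or settings
  • Never modify WordPress or n8n database tables
  • Never touch port 3002 (production) except during a release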

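Development Scope

Scope | Status | Notes
momentry_core_0.1/ | Development allowed | Main Momentry Core development directory
momentry_core_0.1/portal/ | Development allowed | Tauri Portal frontend
momentry_core_0.1/src/ | Development allowed | Rust backend code
/Users/accusys/wordpress/ | Do not modify | Owned by the WordPress/Marcom team
n8n workflows | Do not modify | Automation flows, unrelated to dev
WordPress/n8n database tables | Do not modify | Managed by the Marcom team, unrelated to dev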

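Development Environment

Service | Port | Purpose | Command
Playground | 3003 | The only development environment | cargo run --bin momentry_playground -- server
Production | 3002 | Do not modify | cargo run -- server (release only)
Portal (Tauri) | 1420 | Frontend development | npm run tauri dev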

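Consequences of Violations

  • Modifying WordPress/n8n can disrupt the marcom team's work and the production environment
  • Modifying WordPress/n8n database tables can break automation flows and data integrity
  • Modifying port 3002 can interrupt services that are in use
  • All dev testing must run in the playground (3003)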
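
The port rule can also be enforced in code rather than by convention alone. The following is a minimal sketch under stated assumptions: `check_port` and its `is_release` flag are hypothetical names and not part of Momentry Core; only the two port numbers come from the environment table.

```rust
/// Ports taken from the development environment table.
pub const PRODUCTION_PORT: u16 = 3002;
pub const PLAYGROUND_PORT: u16 = 3003;

/// Hypothetical guard (not part of Momentry Core): refuses the production
/// port unless the caller states that a release is in progress.
pub fn check_port(port: u16, is_release: bool) -> Result<u16, String> {
    if port == PRODUCTION_PORT && !is_release {
        return Err(format!(
            "port {} is production; use {} for development",
            PRODUCTION_PORT, PLAYGROUND_PORT
        ));
    }
    Ok(port)
}
```

A server entry point could call `check_port(requested_port, false)` before binding, so that an accidental `--port 3002` during development fails fast instead of interrupting production.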

AI Coding Principles (Karpathy-Inspired)

Behavioral guidelines to reduce common LLM coding mistakes. Source: andrej-karpathy-skills (94K stars)

Tradeoff: These guidelines bias toward caution over speed. For trivial tasks, use judgment.

1. Think Before Coding

Don't assume. Don't hide confusion. Surface tradeoffs.

  • State your assumptions explicitly. If uncertain, ask.
  • If multiple interpretations exist, present them - don't pick silently.
  • If a simpler approach exists, say so. Push back when warranted.
  • If something is unclear, stop. Name what's confusing. Ask.

2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

  • No features beyond what was asked.
  • No abstractions for single-use code.
  • No "flexibility" or "configurability" that wasn't requested.
  • No error handling for impossible scenarios.
  • If you write 200 lines and it could be 50, rewrite it.

Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

3. Surgical Changes

Touch only what you must. Clean up only your own mess.

When editing existing code:

  • Don't "improve" adjacent code, comments, or formatting.
  • Don't refactor things that aren't broken.
  • Match existing style, even if you'd do it differently.
  • If you notice unrelated dead code, mention it - don't delete it.

When your changes create orphans:

  • Remove imports/variables/functions that YOUR changes made unused.
  • Don't remove pre-existing dead code unless asked.

The test: Every changed line should trace directly to the user's request.

4. Goal-Driven Execution

Define success criteria. Loop until verified.

Transform tasks into verifiable goals:

  • "Add validation" -> "Write tests for invalid inputs, then make them pass"
  • "Fix the bug" -> "Write a test that reproduces it, then make it pass"
  • "Refactor X" -> "Ensure tests pass before and after"
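
As an illustration of the first transformation, the sketch below pairs a small validator with the tests that would be written before it. It assumes a hypothetical helper, `parse_speaker_index`, and assumes that speaker IDs always follow the `SPEAKER_<n>` shape shown in the terminology table; neither is confirmed by the codebase.

```rust
/// Hypothetical helper: extracts the numeric index from a speaker ID such as
/// "SPEAKER_0". The format is an assumption based on the terminology example.
pub fn parse_speaker_index(id: &str) -> Result<u32, String> {
    let digits = id
        .strip_prefix("SPEAKER_")
        .ok_or_else(|| format!("missing SPEAKER_ prefix: {id:?}"))?;
    // Reject empty or signed input explicitly; `str::parse::<u32>` accepts "+1".
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("index is not a plain number: {id:?}"));
    }
    digits
        .parse::<u32>()
        .map_err(|e| format!("index out of range in {id:?}: {e}"))
}

#[cfg(test)]
mod tests {
    use super::*;

    // Written first: every invalid input must be rejected.
    #[test]
    fn rejects_invalid_inputs() {
        for bad in ["", "SPEAKER_", "speaker_0", "SPEAKER_+1", "SPEAKER_x"] {
            assert!(parse_speaker_index(bad).is_err(), "accepted {bad:?}");
        }
    }

    // Written second: the happy path keeps passing while the checks are added.
    #[test]
    fn accepts_valid_id() {
        assert_eq!(parse_speaker_index("SPEAKER_0"), Ok(0));
    }
}
```

The failing tests define "done" up front, so the loop can run with `cargo test` until both pass without further clarification.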

For multi-step tasks, state a brief plan:

1. [Step] -> verify: [check]
2. [Step] -> verify: [check]
3. [Step] -> verify: [check]

Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.


These guidelines are working if: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.


Terminology (V4.0)

Term | Scope | Description | Example
file_uuid | Video file | Video file identifier (renamed from video_uuid) | 384b0ff44aaaa1f1
identity_uuid | Global identity | Global person identity (cross-file) | a9a90105-6d6b-46ff-92da-0c3c1a57dff4
face_id | Single detection | Single face detection (frame-level) | face_100
trace_id | Face tracking | Face tracking ID (Face Tracker output) | 2
chunk_id | Sentence chunk | Sentence chunk (from pre_chunks via rules) | chunk_1
speaker_id | Speaker segment | Speaker ID (from ASRX) | SPEAKER_0
person_id | Deprecated | Video-local person ID (removed in V4.0) | -

Architecture (V4.0)

Face → Identity (Two-layer, direct binding)
  ↓
  person_identities table: REMOVED
  file_identities table: ADDED (N:N relationship)
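
The N:N relationship recorded by file_identities can be pictured with an in-memory stand-in. This is a minimal sketch, not the actual schema: the type names `FileUuid`, `IdentityUuid`, and `FileIdentities` are hypothetical, and the real table lives in the database rather than in a set.

```rust
use std::collections::BTreeSet;

/// Newtype wrappers keep the two identifier kinds from being mixed up.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct FileUuid(pub String);

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct IdentityUuid(pub String);

/// Hypothetical in-memory model of file_identities: a set of
/// (file, identity) pairs, so one file can hold many identities
/// and one identity can appear in many files.
#[derive(Debug, Default)]
pub struct FileIdentities {
    links: BTreeSet<(FileUuid, IdentityUuid)>,
}

impl FileIdentities {
    /// Records a link; returns false if the pair was already present.
    pub fn link(&mut self, file: FileUuid, identity: IdentityUuid) -> bool {
        self.links.insert((file, identity))
    }

    /// All identities that appear in the given file.
    pub fn identities_in(&self, file: &FileUuid) -> Vec<&IdentityUuid> {
        self.links
            .iter()
            .filter(|(f, _)| f == file)
            .map(|(_, i)| i)
            .collect()
    }

    /// All files in which the given identity appears.
    pub fn files_with(&self, identity: &IdentityUuid) -> Vec<&FileUuid> {
        self.links
            .iter()
            .filter(|(_, i)| i == identity)
            .map(|(f, _)| f)
            .collect()
    }
}
```

Using a set of pairs makes a duplicate link a no-op, which mirrors what a composite primary key would give in a relational table.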

Key Changes (V3.x → V4.0)

Change | V3.x | V4.0
video_uuid | Used everywhere | file_uuid
person_identities | Required (303 records) | Removed
person_id APIs | 28 endpoints | Removed (except register/bind)
Face binding | Person → Identity | Face → Identity (direct)
Chunk binding | Manual | Auto (time alignment)
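
The table does not say how the automatic time alignment works. One common approach, shown here purely as an assumption, is to bind each chunk to the candidate segment with the largest time overlap. `Span` and `best_overlap` are hypothetical names.

```rust
/// A time interval in seconds (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: f64,
    pub end: f64,
}

impl Span {
    /// Length of the intersection of two intervals, or 0.0 if they are disjoint.
    pub fn overlap(&self, other: &Span) -> f64 {
        (self.end.min(other.end) - self.start.max(other.start)).max(0.0)
    }
}

/// Returns the index of the candidate that overlaps `chunk` the most,
/// or None when no candidate overlaps at all. Ties keep the earliest index.
pub fn best_overlap(chunk: Span, candidates: &[Span]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (index, candidate) in candidates.iter().enumerate() {
        let shared = chunk.overlap(candidate);
        if shared > 0.0 && best.map_or(true, |(_, current)| shared > current) {
            best = Some((index, shared));
        }
    }
    best.map(|(index, _)| index)
}
```

Returning `None` for chunks with no overlap leaves them unbound instead of attaching them to the nearest segment.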

Build & Run Commands

# Build project (use debug builds for development/testing)
cargo build
cargo build --bin momentry
cargo build --bin momentry_playground

# Build all binaries
cargo build --bins

# Run CLI
cargo run -- --help
cargo run -- register /path/to/video.mp4
cargo run -- server --host 0.0.0.0 --port 3002

# Run playground (development binary)
cargo run --bin momentry_playground -- server
cargo run --bin momentry_playground -- --help

⚠️ CRITICAL: cargo build --release PROHIBITION

  • NEVER run cargo build --release unless the user explicitly says "release the binary" or "正式 release"
  • cargo build --release is SLOW and only needed when producing a production binary for deployment
  • For all development, testing, debugging, and linting: use cargo build or cargo check
  • If uncertain, ALWAYS ask the user first

Binaries

Binary | Purpose | Port | Redis Prefix | Environment
momentry | Production | 3002 | momentry: | .env
momentry_playground | Development | 3003 | momentry_dev: | .env.development
momentry_player | Video player | - | - | -

Testing

# Run all tests
cargo test

# Run single test by name
cargo test test_name

# Run with output
cargo test -- --nocapture

# Doc tests
cargo test --doc

Linting & Formatting

# Format code (edition=2021, max_width=100, tab_spaces=4)
cargo fmt
cargo fmt -- --check

# Lint
cargo clippy
cargo clippy --all-features

# Check for errors
cargo check
cargo check --all-features

Code Style

General

  • Use Rust 2021 edition
  • Use tracing for logging (not println!)
  • Keep lines under 100 characters

Imports (order: std → external → local)

use std::path::Path;
use anyhow::{Context, Result};
use async_trait::async_trait;
use serde::{Deserialize, Serialize};

use crate::core::chunk::Chunk;

Error Handling

  • Use anyhow::Result<T> for application code
  • Use thiserror for library code
  • Use .context() for error context
  • Use anyhow::bail!() for early returns
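
use std::process::Command;

use anyhow::{Context, Result};

// Runs ffprobe with the given arguments and returns its stdout.
// Returning String is an illustrative choice; use the type the caller needs.
fn example(args: &[&str]) -> Result<String> {
    let output = Command::new("ffprobe")
        .args(args)
        .output()
        .context("Failed to run ffprobe")?;

    // A non-zero exit status is reported as an error.
    if !output.status.success() {
        anyhow::bail!("Command failed");
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}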
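
For the library-side rule, the sketch below shows the shape of a typed error. It is written by hand with std only so it compiles without dependencies; the thiserror derive generates equivalent `Display` and `Error` impls from attributes. `ProbeError` and `run_probe` are hypothetical names, not types from the codebase.

```rust
use std::fmt;
use std::process::Command;

/// Hypothetical library error for the ffprobe wrapper.
#[derive(Debug)]
pub enum ProbeError {
    /// The process could not be started.
    Spawn(std::io::Error),
    /// The process ran but exited unsuccessfully.
    Failed { code: Option<i32> },
}

impl fmt::Display for ProbeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProbeError::Spawn(e) => write!(f, "failed to start ffprobe: {e}"),
            ProbeError::Failed { code: Some(c) } => write!(f, "ffprobe exited with status {c}"),
            ProbeError::Failed { code: None } => write!(f, "ffprobe was terminated by a signal"),
        }
    }
}

impl std::error::Error for ProbeError {
    // Expose the underlying I/O error so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ProbeError::Spawn(e) => Some(e),
            ProbeError::Failed { .. } => None,
        }
    }
}

impl From<std::io::Error> for ProbeError {
    fn from(e: std::io::Error) -> Self {
        ProbeError::Spawn(e)
    }
}

/// Runs ffprobe and returns raw stdout; `?` converts io::Error via `From`.
pub fn run_probe(args: &[&str]) -> Result<Vec<u8>, ProbeError> {
    let output = Command::new("ffprobe").args(args).output()?;
    if !output.status.success() {
        return Err(ProbeError::Failed { code: output.status.code() });
    }
    Ok(output.stdout)
}
```

Application code can still call such a function from an `anyhow::Result` context, because any type implementing `std::error::Error + Send + Sync + 'static` converts into `anyhow::Error` through `?`.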

Naming

  • Types/Enums: PascalCase (VideoRecord, ChunkType)
  • Functions/Variables: snake_case (get_video_by_uuid)
  • Traits: PascalCase nouns (Database, ChunkStore); an -er suffix is not required
  • Files: snake_case (postgres_db.rs)

Types

  • Use serde::{Deserialize, Serialize} for serializable types
  • Use #[serde(rename_all = "snake_case")] for enum variants
  • Use explicit numeric types (i64, u32, f64)
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VideoRecord {
    pub id: i64,
    pub uuid: String,
    pub duration: f64,
    pub width: u32,
}

#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
    TimeBased,
    Sentence,
    Cut,
}

Async Programming

  • Use tokio runtime with full features
  • Use #[async_trait] for async trait methods
#[async_trait]
pub trait Database: Send + Sync {
    async fn init() -> Result<Self>
    where Self: Sized;
}

Code Structure

src/
├── main.rs           # CLI entry point
├── lib.rs            # Library exports
├── core/
│   ├── api_key/     # API key management (anomaly, blacklist, encryption, etc.)
│   ├── chunk/        # Chunking logic
│   ├── config.rs     # Centralized configuration (env vars)
│   ├── db/          # Database (PostgreSQL, MongoDB, Redis, Qdrant)
│   ├── embedding/   # Vector embeddings
│   ├── overlay/     # Video overlay
│   ├── probe/       # ffprobe integration
│   ├── processor/   # ASR, OCR, YOLO, Face, Pose, CUT, ASRX
│   │   └── executor.rs  # Unified Python script executor
│   ├── storage/     # File management
│   └── thumbnail/   # Thumbnail extraction
├── api/              # HTTP API (axum)
├── player/           # Video player
├── ui/               # TUI components
└── watcher/          # File system watcher

Key Dependencies

  • Error handling: anyhow, thiserror
  • Async: tokio (full features), async-trait
  • CLI: clap (derive)
  • Serialization: serde, serde_json, chrono
  • Database: sqlx, mongodb, redis (1.0), qdrant-client
  • HTTP: axum, tower
  • Logging: tracing, tracing-subscriber
  • Config: once_cell (lazy static config)

Environment Variables

Server

  • MOMENTRY_SERVER_PORT - API server port (default: 3002 for production, 3003 for playground)
  • MOMENTRY_REDIS_PREFIX - Redis key prefix (default: momentry: for production, momentry_dev: for playground)
  • MOMENTRY_API_KEY - API key for Player online mode testing
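
The per-binary defaults above can be resolved with a small lookup function. This is a sketch under assumptions: `Binary`, `ServerConfig`, and `server_config` are hypothetical names, and falling back to the default when the port fails to parse is an illustrative choice rather than documented behavior. The environment is passed in as a closure so the logic can be tested without touching real process state.

```rust
/// Which binary is starting (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Binary {
    Production,
    Playground,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ServerConfig {
    pub port: u16,
    pub redis_prefix: String,
}

/// Resolves server settings from an environment lookup, using the
/// documented defaults for each binary when a variable is unset.
pub fn server_config(binary: Binary, lookup: impl Fn(&str) -> Option<String>) -> ServerConfig {
    let (default_port, default_prefix) = match binary {
        Binary::Production => (3002, "momentry:"),
        Binary::Playground => (3003, "momentry_dev:"),
    };
    // Assumption: an unparsable port falls back to the default instead of aborting.
    let port = lookup("MOMENTRY_SERVER_PORT")
        .and_then(|value| value.trim().parse::<u16>().ok())
        .unwrap_or(default_port);
    let redis_prefix =
        lookup("MOMENTRY_REDIS_PREFIX").unwrap_or_else(|| default_prefix.to_string());
    ServerConfig { port, redis_prefix }
}
```

In a real binary the lookup would typically be `|key| std::env::var(key).ok()`.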

Testing API Key

export MOMENTRY_API_KEY="muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69"

# Test Player online mode
cargo run --features player --bin momentry_player -- -o

Database

  • DATABASE_URL - PostgreSQL (default: postgres://accusys@localhost:5432/momentry)

Redis

  • REDIS_URL - Redis URL (default: redis://:accusys@localhost:6379)
  • REDIS_PASSWORD - Redis password (default: accusys)

Paths

  • MOMENTRY_OUTPUT_DIR - Output directory (default: /Users/accusys/momentry/output)
  • MOMENTRY_BACKUP_DIR - Backup directory
  • MOMENTRY_PYTHON_PATH - Python path (default: /opt/homebrew/bin/python3.11)
  • MOMENTRY_SCRIPTS_DIR - Scripts directory

Processor Timeouts

  • MOMENTRY_ASR_TIMEOUT - ASR timeout in seconds (default: 3600)
  • MOMENTRY_CUT_TIMEOUT - CUT timeout in seconds (default: 3600)
  • MOMENTRY_DEFAULT_TIMEOUT - Default timeout (default: 7200)
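
A sketch of how these timeouts might be selected per processor. The function name `processor_timeout` and the lowercase keys "asr" and "cut" are assumptions; the variable names and default values are the ones listed above. Whether a processor-specific variable should fall back to MOMENTRY_DEFAULT_TIMEOUT is not stated, so this sketch does not do so.

```rust
use std::time::Duration;

/// Picks the timeout for a processor from an environment lookup.
/// Unset or unparsable values fall back to the documented default.
pub fn processor_timeout(processor: &str, lookup: impl Fn(&str) -> Option<String>) -> Duration {
    let (variable, default_secs) = match processor {
        "asr" => ("MOMENTRY_ASR_TIMEOUT", 3600),
        "cut" => ("MOMENTRY_CUT_TIMEOUT", 3600),
        // Every other processor uses the general default.
        _ => ("MOMENTRY_DEFAULT_TIMEOUT", 7200),
    };
    let secs = lookup(variable)
        .and_then(|value| value.trim().parse::<u64>().ok())
        .unwrap_or(default_secs);
    Duration::from_secs(secs)
}
```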

Synonym Expansion

  • MOMENTRY_SYNONYM_FILES - Comma-separated paths to synonym JSON files (e.g., data/english_synonyms.json,data/llm_synonyms.json)
  • MOMENTRY_SYNONYM_FILE - Single synonym JSON file path (deprecated, use above)
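
The two variables can be read with the plural form taking precedence. This is a minimal sketch: `synonym_files` is a hypothetical function, and trimming whitespace around entries and skipping empty ones are assumptions about how the list should be parsed.

```rust
use std::path::PathBuf;

/// Returns the synonym file paths, preferring MOMENTRY_SYNONYM_FILES and
/// falling back to the deprecated single-file variable.
pub fn synonym_files(lookup: impl Fn(&str) -> Option<String>) -> Vec<PathBuf> {
    if let Some(list) = lookup("MOMENTRY_SYNONYM_FILES") {
        let paths: Vec<PathBuf> = list
            .split(',')
            .map(str::trim)
            .filter(|entry| !entry.is_empty())
            .map(PathBuf::from)
            .collect();
        if !paths.is_empty() {
            return paths;
        }
    }
    // Deprecated fallback: a single path, ignored when blank.
    lookup("MOMENTRY_SYNONYM_FILE")
        .map(|path| path.trim().to_string())
        .filter(|path| !path.is_empty())
        .map(|path| vec![PathBuf::from(path)])
        .unwrap_or_default()
}
```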

Logging

  • RUST_LOG or MOMENTRY_LOG_LEVEL - Log level (default: info)

Notes

  • Unit tests exist (86 library tests)
  • Video processing uses external tools (ffprobe, Python scripts)
  • Multi-database architecture (PostgreSQL, MongoDB, Redis, Qdrant)
  • Monitor directory is a separate system (not Rust)
  • PythonExecutor provides unified script execution with timeout support
  • Uses the redis crate 1.0.x for improved performance

LLM Synonym Generation

Generate synonym database using llama.cpp (Gemma4):

# Generate full database (162 entries, ~5 minutes)
python3 scripts/generate_synonyms_llamacpp.py

# Quick test
python3 scripts/generate_synonyms_llamacpp.py --test

# Resume from existing file
python3 scripts/generate_synonyms_llamacpp.py --resume

# Output: data/llm_synonyms.json (27 Chinese + 135 English words)

Task Management

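Track tasks with todowrite

# Create a task list
/todo Create the configuration module [in_progress]
/todo Add unit tests [pending]

# Update status
/todo Mark as done [completed]

Suggested Task Batching

  • Handle 1-2 features at a time
  • Verify each feature once it is complete (clippy + test)
  • Move on to the next one only after verification passes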

Code Review Checklist

After completing a task, check:

  • cargo clippy --lib passes
  • cargo test --lib passes
  • cargo fmt -- --check passes
  • Documentation is updated (if needed)
  • New features have unit tests

Commit Guidelines

# feat: new feature
git commit -m "feat: add monitor_jobs table"

# fix: bug fix
git commit -m "fix: resolve SQL injection in store_vector"

# refactor: refactoring
git commit -m "refactor: use parameterized queries"

# docs: documentation update
git commit -m "docs: update AGENTS.md with new modules"

Pre-commit Hook

The project has .git/hooks/pre-commit configured; it runs the following checks automatically before each commit:

# Checks
1. cargo fmt --check    # Rust formatting check
2. cargo clippy --lib   # Rust lint check
3. cargo test --lib     # Rust unit tests
4. ruff check           # Python lint check
5. ruff format --check  # Python formatting check
6. markdownlint         # Markdown format check
7. shellcheck           # Shell script check

# Skip all checks (not recommended)
git commit --no-verify

# Skip specific checks (note: --skip-checks is not a built-in git commit option)
git commit --skip-checks

Note: the hook only checks staged Rust/Python/Markdown files.

Python Environment Setup

# Install ruff
pip install ruff==0.11.2

# Format Python files
ruff format scripts/

# Lint Python files
ruff check scripts/

Markdown Environment Setup

# Install markdownlint-cli (uses the system Node.js)
npm install -g markdownlint-cli

# Check Markdown files
markdownlint docs/

# Config file
.markdownlint.json

Shell Environment Setup

# Install shellcheck
brew install shellcheck

# Check shell scripts
shellcheck scripts/*.sh monitor/**/*.sh

Note: the hook blocks only on error-level shellcheck issues; style warnings are shown but do not block the commit.

Release Workflow

Pre-release Preparation

Before each production binary release, you must:

  1. Create a release tag

    git tag -a v0.X.X -m "Release v0.X.X - YYYY-MM-DD"
    git push origin v0.X.X
    
  2. Back up a standalone copy of the source code

    # Create a dedicated release directory
    RELEASE_DIR="/Users/accusys/momentry_core_releases/v0.X.X"
    mkdir -p "$RELEASE_DIR"
    
    # Copy the full source tree (excluding unnecessary files)
    rsync -av --exclude='.git' --exclude='target' --exclude='node_modules' \
          /Users/accusys/momentry_core_0.1/ "$RELEASE_DIR/"
    
    # Record release information
    echo "Release: v0.X.X" > "$RELEASE_DIR/RELEASE_INFO.txt"
    echo "Date: $(date)" >> "$RELEASE_DIR/RELEASE_INFO.txt"
    echo "Git Commit: $(git rev-parse HEAD)" >> "$RELEASE_DIR/RELEASE_INFO.txt"
    echo "Binary: $(ls -la target/release/momentry)" >> "$RELEASE_DIR/RELEASE_INFO.txt"
    
  3. Back up the binaries

    cp target/release/momentry "$RELEASE_DIR/momentry_v0.X.X"
    cp target/release/momentry_playground "$RELEASE_DIR/momentry_playground_v0.X.X" 2>/dev/null
    
  4. Record the database schema

    pg_dump -U accusys -d momentry --schema-only > "$RELEASE_DIR/schema_v0.X.X.sql"
    

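Why This Matters

  • Prevents the release binary from drifting out of sync with the current source code
  • Makes it easy to trace the code state of a specific release
  • Allows quick rollback or diffing when needed
  • Keeps the database schema matched to the code version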

Reference Documents

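Document | Purpose
docs/OPENCODE_GUIDE.md | OpenCode usage guidelines
docs/ARCHITECTURE_EVALUATION.md | Architecture optimization items pending evaluation (including GraphRAG)
docs/PENDING_ISSUES.md | Open issue tracking
docs/MOMENTRY_CORE_MONITORING.md | Monitoring system specification
docs/MOMENTRY_CORE_REDIS_KEYS.md | Redis key design specification
docs/PYTHON.md | Python script guidelines
docs/FILE_CHANGE_MANAGEMENT.md | File change management guidelines
docs/YOLO_RESUME_INTEGRATION.md | YOLO Resume feature integration notes
docs/DOCUMENT_EMBEDDING_STRATEGY.md | Parent-child embedding strategy
docs/PROCESSING_PIPELINE.md | Processing pipeline documentation
docs/N8N_DEMO_WORKFLOW.md | n8n workflow documentation
docs/FRESH_MAC_INSTALLATION.md | Fresh Mac installation guide
docs/SERVICES.md | Service overview and management
docs/SFTPGO_DEMO_USER.md | SFTPGo user guide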

Document Change Workflow

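Before modifying a document, consult docs/FILE_CHANGE_MANAGEMENT.md and make sure to:

  1. Before editing: read the whole document and run the pre-check list
  2. While editing: present a change plan and get confirmation
  3. After editing: show the diff and update the version history
  4. Verification: run lint/test and review before committing

AI Tool Modification Rules

When an AI tool modifies a document:

  • Read the entire document first (do not read only selected sections)
  • Propose a change plan for confirmation before editing
  • Show the diff after editing
  • Update the version history table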

PHP Development

WordPress serves as the Momentry Portal and handles page integration for n8n automation and the sftpgo file service.

Editor Setup

Editor | LSP | Installation
VS Code | Intelephense | Extension Marketplace (recommended)
Cursor | Intelephense | Extension Marketplace (recommended)
CLI | phpactor | ~/bin/phpactor

Intelephense (VS Code/Cursor)

  1. Install the extension: search for "Intelephense"
  2. Settings:
{
  "intelephense.stubs": ["wordpress"]
}

phpactor (CLI)

# Installation
brew install composer
curl -sSL https://github.com/phpactor/phpactor/releases/latest/download/phpactor.phar -o ~/bin/phpactor
chmod +x ~/bin/phpactor

# Install WordPress stubs
cd /Users/accusys/wordpress/web
composer require --dev php-stubs/wordpress-stubs

# Build the WordPress index
cd /Users/accusys/wordpress/web
~/bin/phpactor index:build --reset

# Common commands
~/bin/phpactor class:search "WP_User"      # Search for a class
~/bin/phpactor index:query WP_User          # Show class information
~/bin/phpactor navigate /path/to/file.php  # Navigate to a definition

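WordPress Code Locations

Type | Path
Themes | /Users/accusys/wordpress/web/wp-content/themes/
Plugins | /Users/accusys/wordpress/web/wp-content/plugins/

Collaboration with the marcom Team

Role | Responsibility
marcom team | Figma design / Elementor build
OpenCode | Code implementation / refactoring

Development Timeline

Phase 1: marcom build (now)       → Elementor page construction
Phase 2: Delivery review (TBD)    → Feature confirmation / refactoring assessment
Phase 3: OpenCode refactor        → Pure-code implementation, delivering a version with no Elementor dependency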