Files
momentry_core/AGENTS.md
Warren f4697396e4 chore: update dependencies and AGENTS.md
- Add mac_address crate for MAC address detection
- Add tempfile dev dependency for testing
- Update AGENTS.md with latest development guidelines
2026-04-30 15:07:31 +08:00

630 lines
19 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# AGENTS.md - Momentry Core
Rust-based digital asset management system with video analysis and RAG capabilities.
---
## ⚠️ CRITICAL: 開發隔離原則
### 絕對禁止事項
- **絕對不可修改 `/Users/accusys/wordpress/` 目錄下的任何檔案**
- **絕對不可修改 n8n 工作流或設定**
- **絕對不可修改 WordPress 或 n8n 的資料庫 table**
- **除非是 release 作業,絕對不可動 port 3002 (production)**
### 開發範圍界定
| 範圍 | 狀態 | 說明 |
|------|------|------|
| `momentry_core_0.1/` | ✅ **可開發** | Momentry Core 主要開發目錄 |
| `momentry_core_0.1/portal/` | ✅ **可開發** | Tauri Portal 前端 |
| `momentry_core_0.1/src/` | ✅ **可開發** | Rust 後端程式碼 |
| `/Users/accusys/wordpress/` | ❌ **禁止修改** | WordPress/Marcom 團隊負責 |
| n8n 工作流 | ❌ **禁止修改** | 自動化流程,與 dev 無關 |
| WordPress/n8n 資料庫 table | ❌ **禁止修改** | Marcom 團隊管理,與 dev 無關 |
### 開發環境
| 服務 | Port | 用途 | 命令 |
|------|------|------|------|
| Playground | 3003 | **唯一開發環境** | `cargo run --bin momentry_playground -- server` |
| Production | 3002 | ❌ 禁止修改 | `cargo run -- server` (僅 release 時) |
| Portal (Tauri) | 1420 | 前端開發 | `npm run tauri dev` |
### 違反後果
- 修改 WordPress/n8n 可能影響 marcom 團隊工作與生產環境
- 修改 WordPress/n8n 資料庫 table 可能破壞自動化流程與資料完整性
- 修改 port 3002 可能中斷正在使用的服務
- 所有 dev 測試必須在 playground (3003) 進行
---
## AI Coding Principles (Karpathy-Inspired)
Behavioral guidelines to reduce common LLM coding mistakes.
Source: [andrej-karpathy-skills](https://github.com/forrestchang/andrej-karpathy-skills) (94K stars)
**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.
### 1. Think Before Coding
**Don't assume. Don't hide confusion. Surface tradeoffs.**
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.
### 2. Simplicity First
**Minimum code that solves the problem. Nothing speculative.**
- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.
Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
### 3. Surgical Changes
**Touch only what you must. Clean up only your own mess.**
When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it - don't delete it.
When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.
The test: Every changed line should trace directly to the user's request.
### 4. Goal-Driven Execution
**Define success criteria. Loop until verified.**
Transform tasks into verifiable goals:
- "Add validation" -> "Write tests for invalid inputs, then make them pass"
- "Fix the bug" -> "Write a test that reproduces it, then make it pass"
- "Refactor X" -> "Ensure tests pass before and after"
For multi-step tasks, state a brief plan:
```
1. [Step] -> verify: [check]
2. [Step] -> verify: [check]
3. [Step] -> verify: [check]
```
Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
---
These guidelines are working if: fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.
---
## Terminology (V4.0)
| Term | Scope | Description | Example |
|------|-------|-------------|---------|
| **file_uuid** | Video file | Video file identifier (renamed from `video_uuid`) | `384b0ff44aaaa1f1` |
| **identity_uuid** | Global identity | Global person identity (cross-file) | `a9a90105-6d6b-46ff-92da-0c3c1a57dff4` |
| **face_id** | Single detection | Single face detection (frame-level) | `face_100` |
| **trace_id** | Face tracking | Face tracking ID (Face Tracker output) | `2` |
| **chunk_id** | Sentence chunk | Sentence chunk (from pre_chunks via rules) | `chunk_1` |
| **speaker_id** | Speaker segment | Speaker ID (from ASRX) | `SPEAKER_0` |
| **person_id** | ❌ **Deprecated** | Video-local person ID (removed in V4.0) | - |
### Architecture (V4.0)
```
Face → Identity (Two-layer, direct binding)
person_identities table: REMOVED
file_identities table: ADDED (N:N relationship)
```
### Key Changes (V3.x → V4.0)
| Change | V3.x | V4.0 |
|--------|------|------|
| **video_uuid** | Used everywhere | **file_uuid** |
| **person_identities** | Required (303 records) | **Removed** |
| **person_id APIs** | 28 endpoints | **Removed** (except register/bind) |
| **Face binding** | Person → Identity | **Face → Identity** (direct) |
| **Chunk binding** | Manual | **Auto** (time alignment) |
---
## Build & Run Commands
```bash
# Build project (use debug builds for development/testing)
cargo build
cargo build --bin momentry
cargo build --bin momentry_playground
# Build all binaries
cargo build --bins
# Run CLI
cargo run -- --help
cargo run -- register /path/to/video.mp4
cargo run -- server --host 0.0.0.0 --port 3002
# Run playground (development binary)
cargo run --bin momentry_playground -- server
cargo run --bin momentry_playground -- --help
```
### ⚠️ CRITICAL: `cargo build --release` PROHIBITION
- **NEVER run `cargo build --release` unless the user explicitly says "release the binary" or "正式 release"**
- `cargo build --release` is SLOW and only needed when producing a production binary for deployment
- For all development, testing, debugging, and linting: use `cargo build` or `cargo check`
- If uncertain, ALWAYS ask the user first
## Binaries
| Binary | Purpose | Port | Redis Prefix | Environment |
|--------|---------|------|--------------|-------------|
| `momentry` | Production | 3002 | `momentry:` | `.env` |
| `momentry_playground` | Development | 3003 | `momentry_dev:` | `.env.development` |
| `momentry_player` | Video player | - | - | - |
## Testing
```bash
# Run all tests
cargo test
# Run single test by name
cargo test test_name
# Run with output
cargo test -- --nocapture
# Doc tests
cargo test --doc
```
## Linting & Formatting
```bash
# Format code (edition=2021, max_width=100, tab_spaces=4)
cargo fmt
cargo fmt -- --check
# Lint
cargo clippy
cargo clippy --all-features
# Check for errors
cargo check
cargo check --all-features
```
## Code Style
### General
- Use Rust 2021 edition
- Use tracing for logging (not println!)
- Keep lines under 100 characters
### Imports (order: std → external → local)
```rust
use std::path::Path;
use anyhow::{Context, Result};
use async_trait::async_trait;
use serde::{Deserialize, Serialize};
use crate::core::chunk::Chunk;
```
### Error Handling
- Use `anyhow::Result<T>` for application code
- Use `thiserror` for library code
- Use `.context()` for error context
- Use `anyhow::bail!()` for early returns
```rust
fn example() -> Result<SomeType> {
let output = Command::new("ffprobe")
.args([...])
.output()
.context("Failed to run ffprobe")?;
if !output.status.success() {
anyhow::bail!("Command failed");
}
Ok(result)
}
```
### Naming
- Types/Enums: PascalCase (`VideoRecord`, `ChunkType`)
- Functions/Variables: snake_case (`get_video_by_uuid`)
- Traits: PascalCase with -er suffix (`Database`, `ChunkStore`)
- Files: snake_case (`postgres_db.rs`)
### Types
- Use `serde::{Deserialize, Serialize}` for serializable types
- Use `#[serde(rename_all = "snake_case")]` for enum variants
- Use explicit numeric types (i64, u32, f64)
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct VideoRecord {
pub id: i64,
pub uuid: String,
pub duration: f64,
pub width: u32,
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq)]
#[serde(rename_all = "snake_case")]
pub enum ChunkType {
TimeBased,
Sentence,
Cut,
}
```
### Async Programming
- Use `tokio` runtime with full features
- Use `#[async_trait]` for async trait methods
```rust
#[async_trait]
pub trait Database: Send + Sync {
async fn init() -> Result<Self>
where Self: Sized;
}
```
## Code Structure
```
src/
├── main.rs # CLI entry point
├── lib.rs # Library exports
├── core/
│ ├── api_key/ # API key management (anomaly, blacklist, encryption, etc.)
│ ├── chunk/ # Chunking logic
│ ├── config.rs # Centralized configuration (env vars)
│ ├── db/ # Database (PostgreSQL, MongoDB, Redis, Qdrant)
│ ├── embedding/ # Vector embeddings
│ ├── overlay/ # Video overlay
│ ├── probe/ # ffprobe integration
│ ├── processor/ # ASR, OCR, YOLO, Face, Pose, CUT, ASRX
│ │ └── executor.rs # Unified Python script executor
│ ├── storage/ # File management
│ └── thumbnail/ # Thumbnail extraction
├── api/ # HTTP API (axum)
├── player/ # Video player
├── ui/ # TUI components
└── watcher/ # File system watcher
```
## Key Dependencies
- **Error handling**: `anyhow`, `thiserror`
- **Async**: `tokio` (full features), `async-trait`
- **CLI**: `clap` (derive)
- **Serialization**: `serde`, `serde_json`, `chrono`
- **Database**: `sqlx`, `mongodb`, `redis` (1.0), `qdrant-client`
- **HTTP**: `axum`, `tower`
- **Logging**: `tracing`, `tracing-subscriber`
- **Config**: `once_cell` (lazy static config)
## Environment Variables
### Server
- `MOMENTRY_SERVER_PORT` - API server port (default: `3002` for production, `3003` for playground)
- `MOMENTRY_REDIS_PREFIX` - Redis key prefix (default: `momentry:` for production, `momentry_dev:` for playground)
- `MOMENTRY_API_KEY` - API key for Player online mode testing
### Testing API Key
```bash
export MOMENTRY_API_KEY="muser_68600856036340bcafc01930eb4bd839_1774418104_97221b69"
# Test Player online mode
cargo run --features player --bin momentry_player -- -o
```
### Database
- `DATABASE_URL` - PostgreSQL (default: `postgres://accusys@localhost:5432/momentry`)
### Redis
- `REDIS_URL` - Redis URL (default: `redis://:accusys@localhost:6379`)
- `REDIS_PASSWORD` - Redis password (default: `accusys`)
### Paths
- `MOMENTRY_OUTPUT_DIR` - Output directory (default: `/Users/accusys/momentry/output`)
- `MOMENTRY_BACKUP_DIR` - Backup directory
- `MOMENTRY_PYTHON_PATH` - Python path (default: `/opt/homebrew/bin/python3.11`)
- `MOMENTRY_SCRIPTS_DIR` - Scripts directory
### Processor Timeouts
- `MOMENTRY_ASR_TIMEOUT` - ASR timeout in seconds (default: 3600)
- `MOMENTRY_CUT_TIMEOUT` - CUT timeout in seconds (default: 3600)
- `MOMENTRY_DEFAULT_TIMEOUT` - Default timeout (default: 7200)
### Synonym Expansion
- `MOMENTRY_SYNONYM_FILES` - Comma-separated paths to synonym JSON files (e.g., `data/english_synonyms.json,data/llm_synonyms.json`)
- `MOMENTRY_SYNONYM_FILE` - Single synonym JSON file path (deprecated, use above)
### Logging
- `RUST_LOG` or `MOMENTRY_LOG_LEVEL` - Log level (default: `info`)
## Notes
- Unit tests exist (86 library tests)
- Video processing uses external tools (ffprobe, Python scripts)
- Multi-database architecture (PostgreSQL, MongoDB, Redis, Qdrant)
- Monitor directory is a separate system (not Rust)
- PythonExecutor provides unified script execution with timeout support
- Redis 1.0.x for improved performance
### LLM Synonym Generation
Generate synonym database using llama.cpp (Gemma4):
```bash
# Generate full database (162 entries, ~5 minutes)
python3 scripts/generate_synonyms_llamacpp.py
# Quick test
python3 scripts/generate_synonyms_llamacpp.py --test
# Resume from existing file
python3 scripts/generate_synonyms_llamacpp.py --resume
# Output: data/llm_synonyms.json (27 Chinese + 135 English words)
```
## Task Management
### 使用 todowrite 追蹤任務
```bash
# 創建任務清單
/todo 建立配置模組 [in_progress]
/todo 添加單元測試 [pending]
# 更新狀態
/todo 完成標記 [completed]
```
### 任務批次建議
- 一次處理 1-2 個功能
- 每個功能完成後驗證 (clippy + test)
- 驗證通過後再繼續下一個
## Code Review Checklist
完成任務後檢查:
- [ ] `cargo clippy --lib` 通過
- [ ] `cargo test --lib` 通過
- [ ] `cargo fmt -- --check` 通過
- [ ] 文檔已更新 (如需要)
- [ ] 新功能有單元測試
## Commit Guidelines
```bash
# feat: 新功能
git commit -m "feat: add monitor_jobs table"
# fix: 錯誤修復
git commit -m "fix: resolve SQL injection in store_vector"
# refactor: 重構
git commit -m "refactor: use parameterized queries"
# docs: 文檔更新
git commit -m "docs: update AGENTS.md with new modules"
```
## Pre-commit Hook
專案已配置 `.git/hooks/pre-commit`,提交前自動檢查:
```bash
# 檢查內容
1. cargo fmt --check # Rust 格式化檢查
2. cargo clippy --lib # Rust Lint 檢查
3. cargo test --lib # Rust 單元測試
4. ruff check # Python Lint 檢查
5. ruff format --check # Python 格式化檢查
6. markdownlint # Markdown 格式檢查
7. shellcheck # Shell 腳本檢查
# 跳過檢查(不建議)
git commit --no-verify
# 跳過特定檢查
git commit --skip-checks
```
**注意**: Hook 僅檢查已暫存的 Rust/Python/Markdown 文件。
### Python 環境設置
```bash
# 安裝 ruff
pip install ruff==0.11.2
# 格式化 Python 文件
ruff format scripts/
# Lint Python 文件
ruff check scripts/
```
### Markdown 環境設置
```bash
# 安裝 markdownlint-cli (使用系統 Node.js)
npm install -g markdownlint-cli
# 檢查 Markdown 文件
markdownlint docs/
# 配置檔案
.markdownlint.json
```
### Shell 環境設置
```bash
# 安裝 shellcheck
brew install shellcheck
# 檢查 Shell 腳本
shellcheck scripts/*.sh monitor/**/*.sh
```
**注意**: Hook 只檢查 error 等級的 shellcheck 問題style 警告會顯示但不阻擋提交。
## Release Workflow
### Release 前準備
每次 release production binary 前,必須:
1. **建立 Release Tag**
```bash
git tag -a v0.X.X -m "Release vX.X.X - YYYY-MM-DD"
git push origin v0.X.X
```
2. **備份獨立 Source Code**
```bash
# 建立 release 獨立目錄
RELEASE_DIR="/Users/accusys/momentry_core_releases/v0.X.X"
mkdir -p "$RELEASE_DIR"
# 複製完整原始碼(排除不必要的檔案)
rsync -av --exclude='.git' --exclude='target' --exclude='node_modules' \
/Users/accusys/momentry_core_0.1/ "$RELEASE_DIR/"
# 記錄 release 資訊
echo "Release: v0.X.X" > "$RELEASE_DIR/RELEASE_INFO.txt"
echo "Date: $(date)" >> "$RELEASE_DIR/RELEASE_INFO.txt"
echo "Git Commit: $(git rev-parse HEAD)" >> "$RELEASE_DIR/RELEASE_INFO.txt"
echo "Binary: $(ls -la target/release/momentry)" >> "$RELEASE_DIR/RELEASE_INFO.txt"
```
3. **備份 Binary**
```bash
cp target/release/momentry "$RELEASE_DIR/momentry_v0.X.X"
cp target/release/momentry_playground "$RELEASE_DIR/momentry_playground_v0.X.X" 2>/dev/null
```
4. **記錄資料庫 Schema**
```bash
pg_dump -U accusys -d momentry --schema-only > "$RELEASE_DIR/schema_v0.X.X.sql"
```
### 重要性
- 避免 release binary 與 current source code 不一致
- 方便追蹤特定 release 的程式碼狀態
- 必要時可快速復原或比對差異
- 確保資料庫 schema 與程式碼版本對應
## Reference Documents
| 文件 | 用途 |
|------|------|
| `docs/OPENCODE_GUIDE.md` | OpenCode 使用規範 |
| `docs/ARCHITECTURE_EVALUATION.md` | 架構優化待評估項目 (含 GraphRAG) |
| `docs/PENDING_ISSUES.md` | 待解決問題追蹤 |
| `docs/MOMENTRY_CORE_MONITORING.md` | 監控系統規範 |
| `docs/MOMENTRY_CORE_REDIS_KEYS.md` | Redis Key 設計規範 |
| `docs/PYTHON.md` | Python 腳本規範 |
| `docs/FILE_CHANGE_MANAGEMENT.md` | 文件修改管理規範 |
| `docs/YOLO_RESUME_INTEGRATION.md` | YOLO Resume 功能整合記錄 |
| `docs/DOCUMENT_EMBEDDING_STRATEGY.md` | Parent-Child 嵌入策略 |
| `docs/PROCESSING_PIPELINE.md` | 處理流程文檔 |
| `docs/N8N_DEMO_WORKFLOW.md` | n8n 工作流文檔 |
| `docs/FRESH_MAC_INSTALLATION.md` | 全新 Mac 安裝指南 |
| `docs/SERVICES.md` | 服務總覽與管理 |
| `docs/SFTPGO_DEMO_USER.md` | SFTPGo 用戶指南 |
## Document Change Workflow
修改文件前請參考 `docs/FILE_CHANGE_MANAGEMENT.md`,確保:
1. **修改前**:完整閱讀文件、執行預檢清單
2. **修改中**:提供變更計畫、取得確認
3. **修改後**:展示 diff、更新版本歷史
4. **驗證**:執行 lint/test、提交前審查
### AI 工具修改規範
AI 工具修改文件時:
- 必須先完整閱讀文件(不可只讀取部分章節)
- 修改前先提出變更計畫供確認
- 修改後展示 diff 內容
- 更新版本歷史表
## PHP Development
WordPress 作為 Momentry Portal負責 n8n 自動化與 sftpgo 檔案服務的頁面整合。
### 編輯器設定
| 編輯器 | LSP 方案 | 安裝方式 |
|--------|----------|----------|
| VS Code | Intelephense | Extension Marketplace (推薦) |
| Cursor | Intelephense | Extension Marketplace (推薦) |
| CLI | phpactor | `~/bin/phpactor` |
### Intelephense (VS Code/Cursor)
1. 安裝 Extension: 搜尋 "Intelephense"
2. 設定:
```json
{
"intelephense.stubs": ["wordpress"]
}
```
### phpactor (CLI)
```bash
# 安裝方式
brew install composer
curl -sSL https://github.com/phpactor/phpactor/releases/latest/download/phpactor.phar -o ~/bin/phpactor
chmod +x ~/bin/phpactor
# 安裝 WordPress Stubs
cd /Users/accusys/wordpress/web
composer require --dev php-stubs/wordpress-stubs
# 建立 WordPress 索引
cd /Users/accusys/wordpress/web
~/bin/phpactor index:build --reset
# 常用指令
~/bin/phpactor class:search "WP_User" # 搜尋類別
~/bin/phpactor index:query WP_User # 查看類別資訊
~/bin/phpactor navigate /path/to/file.php # 導航到定義
```
### WordPress 程式碼位置
| 類型 | 路徑 |
|------|------|
| 主題 | `/Users/accusys/wordpress/web/wp-content/themes/` |
| 插件 | `/Users/accusys/wordpress/web/wp-content/plugins/` |
### 與 marcom 團隊協作
| 角色 | 負責 |
|------|------|
| marcom 團隊 | Figma 設計 / Elementor 建構 |
| OpenCode | 程式碼實作 / 重構 |
### 開發時程
```
Phase 1: marcom 建構 (現在) → Elementor 頁面建構
Phase 2: 交付審視 (TBD) → 功能確認 / 重構評估
Phase 3: OpenCode 重構 → 純程式碼實作,交付無 Elementor 依賴版本
```