docs: update docs_v1.0/ documentation

- Fix markdown lint issues (MD030, MD047, MD051, MD028, MD005)
- Update AI agents, architecture, implementation docs
- Add new identity, face recognition, and API documentation
- Remove deprecated face/person API guides
2026-04-30 15:10:41 +08:00
parent 8f05a7c188
commit 4d75b2e251
185 changed files with 21071 additions and 1605 deletions
@@ -152,7 +152,7 @@ const job = await response.json();

 // 狀態檢查
 if (job.status === 'completed') {
-  return [{ json: { done: true, video_uuid: job.video_uuid } }];
+  return [{ json: { done: true, file_uuid: job.file_uuid } }];
 } else {
  return [{ json: { done: false, status: job.status } }];
 }
@@ -403,13 +403,13 @@ add_shortcode('momentry_search', function($atts) {
        $html .= '<ul>';

        foreach ($results['results'] as $result) {
-            $video_uuid = $result['uuid'];
+            $file_uuid = $result['uuid'];
            $start = $result['start_time'] ?? 0;
            $end = $result['end_time'] ?? 0;
            $text = $result['text'] ?? '無文字描述';
            
            $html .= '<li>';
-            $html .= '<a href="/player?uuid=' . esc_attr($video_uuid) . 
+            $html .= '<a href="/player?uuid=' . esc_attr($file_uuid) . 
                     '&start=' . esc_attr($start) . 
                     '&end=' . esc_attr($end) . '">';
            $html .= '播放 ' . $start . 's - ' . $end . 's';
@@ -220,4 +220,4 @@ ai_query_hints:

 **最後更新**: 2026-04-22  
 **卡片數量**: 5  
-**狀態分布**: ✅ 已實施 4，⚠️ 待實施 1
+**狀態分布**: ✅ 已實施 4，⚠️ 待實施 1
@@ -160,4 +160,4 @@ ai_query_hints:

 ---

-**最後更新**: 2026-04-22
+**最後更新**: 2026-04-22
@@ -386,4 +386,4 @@ jobs:

 **最後更新**：2026-04-22  
 **文檔狀態**：活躍維護中  
-**建議反饋**：請通過 GitHub Issues 或郵件提供反饋
+**建議反饋**：請通過 GitHub Issues 或郵件提供反饋
@@ -326,4 +326,4 @@ Momentry Core 是一個基於 Rust 的數字資產管理系統，專注於視頻
 | 2026-04-22 | V1.1 | 更新文檔索引，添加新創建的架構文檔 | OpenCode |
 | 2026-04-22 | V1.0 | 創建架構總覽文件 | OpenCode |

-**最後更新**: 2026-04-22 (V1.2)
+**最後更新**: 2026-04-22 (V1.2)
@@ -276,4 +276,4 @@ ai_query_hints:
 **最後更新**: 2026-04-22  
 **版本**: V1.0  
 **生效日期**: 2026-04-22  
-**審查週期**: 每季度審查更新
+**審查週期**: 每季度審查更新
@@ -39,7 +39,7 @@ ai_query_hints:

 本路線圖定義了 Momentry Core 架構發展的階段性目標和時間規劃，涵蓋從基礎架構到高級功能的全面發展。

-### 階段劃分：
+### 階段劃分

 ```
 Phase 0: 現狀 (Current State) [✅ 已實現]
@@ -226,12 +226,12 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]

 ## 6. 關鍵里程碑

-### 2026年：
+### 2026年
 - ✅ **2026-03-25**: Rule 1 (句子級分片)完整實現
 - ⏳ **2026-05-31**: 完成 Rule 3 (場景級分片)  
 - ⏳ **2026-09-30**: 完成 Rule 2 (視覺分片)

-### 2027年：
+### 2027年
 - 📅 **2027-02-28**: 微服務架構遷移完成
 - 📅 **2027-06-30**: 實時處理引擎上線
 - 📅 **2027-12-31**: 企業級功能完整實現
@@ -240,7 +240,7 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]

 ## 7. 風險與挑戰

-### 技術挑戰：
+### 技術挑戰

 1. **AI 模型集成**：
   - 多模型協同工作
@@ -257,7 +257,7 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]
   - 並發控制
   - 資源調度優化

-### 非技術挑戰：
+### 非技術挑戰

 1. **資源限制**：
   - 計算資源需求
@@ -273,7 +273,7 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]

 ## 8. 成功標準

-### 技術成功標準：
+### 技術成功標準

 1. **性能指標**：
   - API 響應時間 < 500ms
@@ -285,7 +285,7 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]
   - AI 模型準確率 > 85%
   - 檢索結果相關性 > 80%

-### 業務成功標準：
+### 業務成功標準

 1. **用戶滿意度**：
   - 搜索結果滿意度 > 85%
@@ -301,7 +301,7 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]

 ## 9. 監控與評估

-### 性能監控：
+### 性能監控

 1. **實時指標**：
   - API 延遲
@@ -313,7 +313,7 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]
   - 用戶活躍度
   - 功能使用頻率

-### 評估機制：
+### 評估機制

 1. **每月評估**：
   - 進度審查
@@ -325,20 +325,11 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]
   - 質量保證
   - 風險管理

-
-
 ---

-
-
-
 ## 10. 更新頻率

-
-
-### 路線圖更新：
-
-
+### 路線圖更新

 | 更新類型 | 頻率 | 責任人 |
 |----------|------|--------|
@@ -346,34 +337,22 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]
 | 重大調整 | 季度 | 架構委員會 |
 | 年度規劃 | 每年 | 管理層 |

-
-
-### 溝通機制：
+### 溝通機制

 1. **內部溝通**：
   - 每周技術會議
   - 月度架構審查
   - 季度成果展示

-
-
 2. **外部溝通**：
   - 每月進度報告
   - 季度技術更新
   - 年度發展規劃

-
-
 ---

-
-
 ## 11. 相關文件

-
-
-
-
 | 文件 | 描述 | 相關性 |
 |------|------|--------|
 | [ARCHITECTURE_OVERVIEW.md](./ARCHITECTURE_OVERVIEW.md) | 架構總覽 | 整體規劃 |
@@ -381,20 +360,12 @@ Phase 3: 遠景目標 (Long-term Vision) [🔮 規劃中]
 | [CHUNKING_ARCHITECTURE.md](./chunking/CHUNKING_ARCHITECTURE.md) | 分片架構 | 技術實現 |
 | [PROJECT_DOCS_V1_INTEGRATION_PLAN.md](../PROJECT_DOCS_V1_INTEGRATION_PLAN.md) | 項目整合計劃 | 總體規劃 |

-
-
 ---

-
-
 ## 12. 最後更新記錄

-
-
 | 版本 | 日期 | 主要變更 | 操作人 |
 |------|------|----------|--------|
 | V1.0 | 2026-04-22 | 創建架構路線圖文件 | OpenCode |

-
-
-**最後更新日期**: 2026-04-22
+**最後更新日期**: 2026-04-22
@@ -0,0 +1,535 @@
+---
+document_type: "benchmark_plan"
+title: "CLIP ViT-L/14 Embedding 性能基准测试计划"
+service: "MOMENTRY_CORE"
+date: "2026-04-28"
+status: "active"
+current_state: "planning"
+owner: "Warren"
+created_by: "OpenCode"
+created_at: "2026-04-28"
+version: "V1.0"
+tags:
+  - "clip"
+  - "vit-l/14"
+  - "embedding"
+  - "benchmark"
+  - "logo_detection"
+  - "mps"
+  - "accusys_logo"
+related_documents:
+  - "IDENTITY_REFERENCE_VECTOR_DESIGN.md"
+  - "MOMENTRY_CORE_ARCHITECTURE_V2.md"
+  - "IMPLEMENTATION/FILE_IDENTITY_API_DESIGN.md"
+ai_query_hints:
+  - "查詢 CLIP ViT-L/14 性能测试计划"
+  - "查詢 Accusys Logo 测试方案"
+  - "查詢 MPS vs CPU 性能对比"
+  - "查詢 Logo 檢測 + embedding + 匹配流程"
+---
+
+# CLIP ViT-L/14 Embedding 性能基准测试计划
+
+| 項目 | 內容 |
+|------|------|
+| 建立者 | OpenCode |
+| 建立時間 | 2026-04-28 |
+| 文件版本 | V1.0 |
+
+---
+
+## 版本歷史
+
+| 版本 | 日期 | 目的 | 操作人 | 工具/模型 |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-04-28 | 創建 CLIP ViT-L/14 性能基准测试计划 | OpenCode | OpenCode |
+
+---
+
+## Overview
+
+This document defines the **CLIP ViT-L/14 embedding performance benchmark plan** for the Momentry Core Identity system. The test subject is the **Accusys Storage Logo**.
+
+---
+
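+## Test Goals
+
+### Core Goals
+
+| Goal | Description |
+|------|------|
+| **Logo detection** | Use OWL-ViT to detect appearances of the Accusys Logo in video |
+| **Embedding extraction** | Use CLIP ViT-L/14 to extract a 768-dim embedding of the Logo |
+| **Identity registration** | Register the Logo as an Identity (identity_type='logo') |
+| **Similarity search** | Search video frames for content similar to the Logo |
+| **Performance baseline** | Measure the CLIP performance difference between MPS and CPU |
+| **1-to-many matching** | Evaluate how well the 1-to-many matching algorithms perform |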
+
+### 测试对象
+
+| 对象 | URL | 尺寸 | 说明 |
+|------|-----|------|------|
+| **Accusys Logo** | https://www.accusys.com.tw/wp-content/uploads/2023/03/Accusys-Orange-2017.png | 3269x747px | Orange 品牌色 (#EE7632) |
+
+---
+
+## 测试环境
+
+### 系统配置
+
+| 配置 | 说明 |
+|------|------|
+| **OS** | macOS (darwin) |
+| **Python** | 3.11 (MOMENTRY_PYTHON_PATH=/opt/homebrew/bin/python3.11) |
+| **PyTorch** | MPS backend support ✅ |
+| **CLIP Model** | ViT-L/14 (laion/CLIP-ViT-L-14-laion2B-s32B-b82K) |
+| **GPU** | Apple Silicon (MPS) |
+
+### 模型信息
+
+| 模型 | 参数 | 说明 |
+|------|------|------|
+| **CLIP ViT-L/14** | 768-dim embedding | 适合 logo/symbol/object 识别 |
+| **OWL-ViT** | 开放词汇检测器 | 检测任意 Logo/Symbol/Object |
+| **InsightFace ArcFace** | 512-dim embedding | 人脸识别（对比基准） |
+
+---
+
+## 测试计划
+
+### Phase 1: Logo 檢測 (OWL-ViT)
+
+**Goal**: Use OWL-ViT to detect appearances of the Accusys Logo in video frames
+
+**Test steps**:
+1. Prepare test videos that contain the Accusys Logo
+2. Detect the Logo with OWL-ViT:
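+   ```python
+   import torch
+   from PIL import Image
+   from transformers import OwlViTForObjectDetection, OwlViTProcessor
+   
+   # Checkpoint name is an assumption; this plan does not specify one.
+   ckpt = "google/owlvit-base-patch32"
+   processor = OwlViTProcessor.from_pretrained(ckpt)
+   model = OwlViTForObjectDetection.from_pretrained(ckpt)
+   # Text prompts for detection
+   prompts = ["Accusys Storage Logo", "orange logo", "brand logo"]
+   frame = Image.open("frame.png").convert("RGB")  # one extracted video frame
+   inputs = processor(text=[prompts], images=frame, return_tensors="pt")
+   with torch.no_grad():
+       outputs = model(**inputs)
+   # Detections for this frame: boxes, scores, and prompt indices
+   sizes = torch.tensor([frame.size[::-1]])
+   detections = processor.post_process_object_detection(outputs, threshold=0.1, target_sizes=sizes)[0]
+   ```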
+3. Record the detection results:
+   - bbox coordinates
+   - confidence score
+   - detection time per frame
+
+**Expected output**:
+- Logo detection success rate > 90%
+- Detection time < 1s/frame
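
The two targets above (success rate above 90%, under 1 s per frame) can be checked with a small summary over per-frame results. This is a minimal sketch, assuming each frame is recorded as a hypothetical `(logo_present, detected, seconds)` tuple; that record format and the function name are illustrative and not defined by this plan.

```python
from statistics import mean

def summarize_detection(records):
    """Summarize detection results.

    records: iterable of (logo_present, detected, seconds) tuples (hypothetical format).
    """
    records = list(records)
    # Only frames that really contain the logo count toward the success rate
    positives = [r for r in records if r[0]]
    hits = sum(1 for r in positives if r[1])
    success_rate = hits / len(positives) if positives else 0.0
    avg_seconds = mean(r[2] for r in records) if records else 0.0
    return {
        "success_rate": success_rate,
        "avg_seconds_per_frame": avg_seconds,
        # Targets taken from the expected output of Phase 1
        "meets_targets": success_rate > 0.90 and avg_seconds < 1.0,
    }

# Made-up sample: three frames with the logo (one missed), one without
sample = [(True, True, 0.40), (True, True, 0.50), (True, False, 0.60), (False, False, 0.50)]
summary = summarize_detection(sample)
print(summary)
```

Counting only logo-bearing frames in the denominator is one possible definition of "success rate"; if false positives matter, precision would need to be tracked separately.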
+
+---
+
+### Phase 2: Embedding 提取 (CLIP ViT-L/14)
+
+**目标**: 使用 CLIP ViT-L/14 提取 Logo 的 768-dim embedding
+
+**测试步骤**:
+1. 下载 Accusys Logo 图片
+2. 使用 CLIP 提取 embedding：
+   ```python
+   import torch
+   from PIL import Image
+   from transformers import CLIPModel, CLIPProcessor
+   
+   # Load the model (MPS backend)
+   device = torch.device("mps")
+   model = CLIPModel.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K").to(device).eval()
+   processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K")
+   
+   # Extract the embedding (no gradients are needed for inference)
+   image = Image.open("accusys_logo.png").convert("RGB")
+   inputs = processor(images=image, return_tensors="pt").to(device)
+   with torch.no_grad():
+       embedding = model.get_image_features(**inputs)
+   
+   # Output: 768-dim vector
+   print(f"Embedding shape: {embedding.shape}")  # [1, 768]
+   ```
+3. 记录提取速度：
+   - MPS 模式
+   - CPU 模式
+
+**预期输出**:
+- Embedding 提取成功
+- MPS vs CPU 性能对比
+
+---
+
+### Phase 3: Identity 注册
+
+**目标**: 将 Accusys Logo 注册为 Identity
+
+**测试步骤**:
+1. 创建 Identity:
+   ```python
+   identity = {
+       "identity_id": generate_uuid(),
+       "name": "Accusys Storage Logo",
+       "identity_type": "logo",
+       "source": "manual",
+       "reference_data": {
+           "identity_embeddings": [
+               {
+                   "embedding": embedding.tolist(),
+                   "source": "logo_image",
+                   "image_url": "https://www.accusys.com.tw/wp-content/uploads/2023/03/Accusys-Orange-2017.png",
+                   "context": "brand_logo",
+                   "created_at": datetime.now().isoformat()
+               }
+           ],
+           "image_urls": ["https://www.accusys.com.tw/wp-content/uploads/2023/03/Accusys-Orange-2017.png"]
+       },
+       "identity_embedding": embedding.tolist()
+   }
+   ```
+2. 存储到 identities 表
+3. 验证存储成功
+
+**预期输出**:
+- Identity 注册成功
+- reference_data JSONB 结构正确
+- identity_embedding VECTOR(768) 存储正确
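
The "verify storage succeeded" step can start with a structural check of the record before it reaches the database. The sketch below assumes the dictionary layout shown in step 1; `validate_logo_identity` is a hypothetical helper, and the required keys follow the `identity_embeddings` field list in IDENTITY_REFERENCE_VECTOR_DESIGN.md.

```python
EXPECTED_DIM = 768  # CLIP ViT-L/14 embedding size stated in this plan

def validate_logo_identity(identity):
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    if identity.get("identity_type") != "logo":
        problems.append("identity_type must be 'logo'")
    if len(identity.get("identity_embedding", [])) != EXPECTED_DIM:
        problems.append("identity_embedding must have 768 values")
    refs = identity.get("reference_data", {}).get("identity_embeddings", [])
    if not refs:
        problems.append("reference_data.identity_embeddings is empty")
    for i, ref in enumerate(refs):
        # Keys marked as required for identity_embeddings entries
        for key in ("embedding", "source", "image_url", "created_at"):
            if key not in ref:
                problems.append(f"identity_embeddings[{i}] is missing '{key}'")
        if len(ref.get("embedding", [])) != EXPECTED_DIM:
            problems.append(f"identity_embeddings[{i}].embedding must have 768 values")
    return problems

# Made-up record with a zero vector standing in for a real embedding
vector = [0.0] * EXPECTED_DIM
good = {
    "identity_type": "logo",
    "identity_embedding": vector,
    "reference_data": {
        "identity_embeddings": [
            {"embedding": vector, "source": "logo_image",
             "image_url": "https://example.com/logo.png",
             "created_at": "2026-04-28T12:00:00Z"}
        ]
    },
}
print(validate_logo_identity(good))
```

A check like this only covers shape and required keys; confirming that the JSONB and VECTOR(768) columns round-trip correctly still needs a read-back from PostgreSQL.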
+
+---
+
+### Phase 4: 相似度搜索
+
+**目标**: 在视频帧中搜索与 Logo 相似的内容
+
+**测试步骤**:
+1. 提取视频帧的 CLIP embedding
+2. 计算与 Identity 的相似度：
+   ```python
+   def search_similar_frames(video_frames, identity_embedding):
+       results = []
+       for frame in video_frames:
+           # 提取帧 embedding
+           frame_embedding = clip_model.extract_embedding(frame)
+           
+           # 计算相似度
+           similarity = cosine_similarity(frame_embedding, identity_embedding)
+           
+           if similarity >= 0.85:
+               results.append({
+                   "frame": frame,
+                   "similarity": similarity
+               })
+       return results
+   ```
+3. 测试 1对多匹配算法：
+   - Strategy 1: Best Match
+   - Strategy 2: Voting
+   - Strategy 3: Weighted Average
+   - Strategy 4: Combined
+
+**预期输出**:
+- 相似度搜索成功率
+- 匹配算法对比
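
The search in step 2 relies on a `cosine_similarity` helper and a 0.85 threshold. As a dependency-free illustration, the sketch below works on plain Python lists and returns hits ranked by similarity; the 3-dim vectors are toy stand-ins for 768-dim CLIP embeddings, and `search_similar` is a hypothetical name.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_similar(frame_embeddings, identity_embedding, threshold=0.85):
    """Return frames whose similarity meets the threshold, best match first."""
    hits = [
        {"frame_index": i, "similarity": cosine_similarity(e, identity_embedding)}
        for i, e in enumerate(frame_embeddings)
    ]
    hits = [h for h in hits if h["similarity"] >= threshold]
    return sorted(hits, key=lambda h: h["similarity"], reverse=True)

# Toy data: frames 0 and 1 point roughly the same way as the identity, frame 2 does not
frames = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
hits = search_similar(frames, [1.0, 0.0, 0.0])
print([h["frame_index"] for h in hits])
```

Returning frame indices rather than the frames themselves keeps the result small; timestamps can be recovered from the index and the sampling rate.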
+
+---
+
+### Phase 5: 性能基准测试
+
+**目标**: 测量 CLIP 在 MPS vs CPU 的性能差异
+
+**测试步骤**:
+1. **MPS 模式性能测试**:
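+   ```python
+   import time
+   
+   # Reuses `processor` and `image` from Phase 2
+   device = torch.device("mps")
+   model = CLIPModel.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K").to(device).eval()
+   inputs = processor(images=image, return_tensors="pt").to(device)
+   
+   # Time 1000 extractions; MPS runs asynchronously, so synchronize before reading the clock
+   start_time = time.perf_counter()
+   with torch.no_grad():
+       for _ in range(1000):
+           embedding = model.get_image_features(**inputs)
+   torch.mps.synchronize()
+   mps_time = time.perf_counter() - start_time
+   ```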
+2. **CPU 模式性能测试**:
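+   ```python
+   import time
+   
+   # Reuses `processor` and `image` from Phase 2; inputs must live on the same device as the model
+   device = torch.device("cpu")
+   model = CLIPModel.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K").to(device).eval()
+   inputs = processor(images=image, return_tensors="pt").to(device)
+   
+   # Time 1000 extractions
+   start_time = time.perf_counter()
+   with torch.no_grad():
+       for _ in range(1000):
+           embedding = model.get_image_features(**inputs)
+   cpu_time = time.perf_counter() - start_time
+   ```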
+3. **对比分析**:
+   - 提取速度 (mps_time vs cpu_time)
+   - 内存使用
+   - GPU 使用率
+
+**预期输出**:
+- MPS 性能提升倍数
+- CPU fallback 性能基准
+- 推荐使用场景
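
Timing loops like the ones in steps 1 and 2 are sensitive to first-call overhead and to asynchronous device execution. The following is a minimal sketch of a reusable harness, assuming a warm-up phase and a median statistic (neither is specified by this plan); `benchmark` and `speedup` are hypothetical names, and the workload is a stand-in so the sketch runs without a model.

```python
import time
from statistics import median

def benchmark(fn, runs=1000, warmup=10, sync=None):
    """Time fn() over `runs` calls after `warmup` untimed calls.

    `sync` is an optional callable that blocks until queued device work has
    finished (for MPS that would be torch.mps.synchronize); leave it as None
    for plain CPU code.
    """
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        samples.append(time.perf_counter() - start)
    return {"median_s": median(samples), "total_s": sum(samples), "runs": runs}

def speedup(cpu_result, mps_result):
    """How many times faster the MPS run is than the CPU run, by median latency."""
    return cpu_result["median_s"] / mps_result["median_s"]

# Stand-in workload instead of model.get_image_features(**inputs)
result = benchmark(lambda: sum(range(1000)), runs=50, warmup=5)
print(result["runs"], result["median_s"] >= 0)
```

The median is less affected by occasional slow iterations than the mean, which is why it is used here; reporting both is also reasonable.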
+
+---
+
+### Phase 6: 与 ArcFace 对比
+
+**目标**: 对比 CLIP ViT-L/14 与 ArcFace 的性能差异
+
+**测试对象**:
+- **CLIP ViT-L/14**: Logo/Symbol/Object 识别 (768-dim)
+- **ArcFace**: 人脸识别 (512-dim)
+
+**测试步骤**:
+1. 使用相同测试集（包含人脸和 Logo）
+2. 测量两种模型的：
+   - Embedding 提取速度
+   - 匹配准确率
+   - 匹配速度
+3. 对比分析
+
+**预期输出**:
+| 模型 | 用途 | 维度 | 提取速度 | 匹配准确率 |
+|------|------|------|----------|-----------|
+| CLIP ViT-L/14 | Logo/Symbol/Object | 768 | TBD | TBD |
+| ArcFace | 人脸识别 | 512 | TBD | TBD |
+
+---
+
+## 测试脚本
+
+### scripts/clip_benchmark_test.py
+
+```python
+"""
+CLIP ViT-L/14 性能基准测试脚本
+
+测试内容:
+1. Logo 檢測 (OWL-ViT)
+2. Embedding 提取 (CLIP ViT-L/14)
+3. Identity 注册
+4. 相似度搜索
+5. MPS vs CPU 性能对比
+6. 与 ArcFace 对比
+"""
+
+import torch
+import time
+import numpy as np
+from PIL import Image
+from transformers import CLIPModel, CLIPProcessor
+
+def test_clip_embedding_extraction():
+    """Phase 2: Embedding 提取测试"""
+    
+    # 加载模型
+    device_mps = torch.device("mps")
+    device_cpu = torch.device("cpu")
+    
+    model_mps = CLIPModel.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K").to(device_mps)
+    model_cpu = CLIPModel.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K").to(device_cpu)
+    
+    processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K")
+    
+    # 加载 Accusys Logo
+    image = Image.open("accusys_logo.png")
+    
+    # MPS test (synchronize so that queued GPU work is included in the timing)
+    inputs_mps = processor(images=image, return_tensors="pt").to(device_mps)
+    start_time = time.time()
+    with torch.no_grad():
+        for i in range(100):
+            embedding_mps = model_mps.get_image_features(**inputs_mps)
+    torch.mps.synchronize()
+    mps_time = time.time() - start_time
+    
+    # CPU test
+    inputs_cpu = processor(images=image, return_tensors="pt").to(device_cpu)
+    start_time = time.time()
+    with torch.no_grad():
+        for i in range(100):
+            embedding_cpu = model_cpu.get_image_features(**inputs_cpu)
+    cpu_time = time.time() - start_time
+    
+    # 输出结果
+    print(f"MPS 提取速度: {mps_time/100:.4f} s/image")
+    print(f"CPU 提取速度: {cpu_time/100:.4f} s/image")
+    print(f"MPS 性能提升: {cpu_time/mps_time:.2f}x")
+    print(f"Embedding shape: {embedding_mps.shape}")
+    
+    return {
+        "mps_time": mps_time/100,
+        "cpu_time": cpu_time/100,
+        "mps_speedup": cpu_time/mps_time,
+        "embedding_shape": embedding_mps.shape
+    }
+
+def test_similarity_search(identity_embedding, test_frames):
+    """Phase 4: 相似度搜索测试"""
+    
+    device = torch.device("mps")
+    model = CLIPModel.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K").to(device)
+    processor = CLIPProcessor.from_pretrained("laion/CLIP-ViT-L-14-laion2B-s32B-b82K")
+    
+    results = []
+    for frame in test_frames:
+        inputs = processor(images=frame, return_tensors="pt").to(device)
+        frame_embedding = model.get_image_features(**inputs)
+        
+        similarity = cosine_similarity(frame_embedding, identity_embedding)
+        
+        if similarity >= 0.85:
+            results.append({
+                "frame": frame,
+                "similarity": similarity
+            })
+    
+    return results
+
+def cosine_similarity(a, b):
+    """计算余弦相似度"""
+    a = a.detach().cpu().numpy().flatten()
+    b = np.array(b).flatten()
+    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
+
+if __name__ == "__main__":
+    print("=== CLIP ViT-L/14 性能基准测试 ===")
+    
+    # Phase 2: Embedding 提取
+    print("\n=== Phase 2: Embedding 提取测试 ===")
+    result = test_clip_embedding_extraction()
+    
+    # Phase 3: Identity 注册 (需要数据库连接)
+    print("\n=== Phase 3: Identity 注册 ===")
+    print("待實作: 需要資料庫連接")
+    
+    # Phase 4: 相似度搜索 (需要测试帧)
+    print("\n=== Phase 4: 相似度搜索 ===")
+    print("待實作: 需要测试帧")
+    
+    print("\n=== 测试完成 ===")
+```
+
+---
+
+## 测试数据
+
+### Accusys Logo 信息
+
+| 属性 | 值 |
+|------|-----|
+| **Logo URL** | https://www.accusys.com.tw/wp-content/uploads/2023/03/Accusys-Orange-2017.png |
+| **尺寸** | 3269x747px |
+| **品牌色** | Orange (#EE7632) |
+| **公司** | Accusys Storage |
+| **产品线** | ExaSAN Series, Gamma Series, T-Share Series |
+| **Momentry Studio** | 网站首页有介绍（AI Video Search） |
+
+### 测试视频需求
+
+| 需求 | 说明 |
+|------|------|
+| **包含 Logo** | 视频中需包含 Accusys Logo |
+| **不同场景** | 白底、黑底、复杂背景 |
+| **不同大小** | 大、中、小 Logo |
+| **不同角度** | 正面、侧面、倾斜 |
+| **时长** | 建议 30-60 秒 |
+
+---
+
+## Expected Results
+
+### Performance Baseline Targets
+
+| Metric | Target | Notes |
+|------|--------|------|
+| **MPS extraction time** | < 0.05 s/image | MPS acceleration |
+| **CPU extraction time** | < 0.2 s/image | CPU fallback |
+| **MPS speedup** | > 2x | MPS vs CPU |
+| **Logo detection success rate** | > 90% | OWL-ViT detection |
+| **Matching accuracy** | > 85% | Similarity search |
+| **Matching time** | < 1s/query | Similarity computation |
+
+### 1对多匹配预期
+
+| 算法 | 预期准确率 | 说明 |
+|------|-----------|------|
+| **Strategy 1 (Best Match)** | 85% | 快速匹配 |
+| **Strategy 2 (Voting)** | 88% | 投票机制 |
+| **Strategy 3 (Weighted)** | 90% | 加权平均 |
+| **Strategy 4 (Combined)** | 92% | 综合评分 |
+
+---
+
+## 实作计划
+
+### Phase 1: 准备测试环境
+
+- [ ] 下载 Accusys Logo 图片
+- [ ] 准备测试视频
+- [ ] 安装 CLIP ViT-L/14 模型
+- [ ] 安装 OWL-ViT 模型
+
+### Phase 2: Logo 檢測测试
+
+- [ ] OWL-ViT 检测脚本编写
+- [ ] 检测结果记录
+- [ ] 检测速度测量
+
+### Phase 3: Embedding 提取测试
+
+- [ ] CLIP ViT-L/14 embedding 提取脚本编写
+- [ ] MPS vs CPU 性能对比
+- [ ] Embedding 存储测试
+
+### Phase 4: Identity 注册测试
+
+- [ ] Identity 注册脚本编写
+- [ ] reference_data JSONB 存储测试
+- [ ] identity_embedding VECTOR(768) 存储测试
+
+### Phase 5: 相似度搜索测试
+
+- [ ] 相似度搜索脚本编写
+- [ ] 1对多匹配算法测试
+- [ ] 搜索结果记录
+
+### Phase 6: 性能基准测试
+
+- [ ] MPS vs CPU 性能对比脚本
+- [ ] 1000 次提取测试
+- [ ] 性能基准报告生成
+
+---
+
+## 待辦事項
+
+| 項目 | 優先級 | 說明 |
+|------|--------|------|
+| 准备测试环境 | 高 | Phase 1 |
+| Logo 檢測测试 | 高 | Phase 2 |
+| Embedding 提取测试 | 高 | Phase 3 |
+| Identity 注册测试 | 中 | Phase 4 |
+| 相似度搜索测试 | 中 | Phase 5 |
+| 性能基准测试 | 中 | Phase 6 |
+
+---
+
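+## Constraints
+
+- CLIP ViT-L/14 runs on CPU as a fallback; MPS or CUDA is recommended for practical throughput
+- OWL-ViT requires the Transformers library
+- Test videos must contain the Accusys Logo
+- PostgreSQL with the pgvector extension is required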
+
+---
+
+## 相关文件
+
+- `docs_v1.0/ARCHITECTURE/IDENTITY_REFERENCE_VECTOR_DESIGN.md` - 1对多参考向量设计
+- `docs_v1.0/ARCHITECTURE/MOMENTRY_CORE_ARCHITECTURE_V2.md` - 核心架构设计
+- `docs_v1.0/IMPLEMENTATION/FILE_IDENTITY_API_DESIGN.md` - API 设计
+- `scripts/fast_stamp_search.py` - OWL-ViT Logo 检测脚本（已集成）
+
+---
+
+## 版本信息
+
+- 版本: V1.0
+- 建立日期: 2026-04-28
+- 文件更新: 2026-04-28
@@ -345,4 +345,4 @@ ASR → OCR → YOLO → CUT → 分片生成
 3. **持續優化**：建立長期機制確保設計與實現的一致性
 4. **用戶為中心**：以實際用戶需求為導向調整設計

-**核心原則重申**：在出現矛盾時，實際的 Rust 代碼實現是最高權威，設計文檔應反映實際實現狀態並指導未來改進方向。
+**核心原則重申**：在出現矛盾時，實際的 Rust 代碼實現是最高權威，設計文檔應反映實際實現狀態並指導未來改進方向。
@@ -915,4 +915,4 @@ python3 scripts/test_action_recognition.py video.mp4
 | **運動** | ST-GCN + YOLO | 88-92% | 20s/10min |
 | **打架** | ST-GCN | 80-85% | 15s/10min |
 | **吵架** | 多模態 | 85-90% | 60s/10min |
-| **細粒度動作** | SlowFast | 90-95% | 100s/10min |
+| **細粒度動作** | SlowFast | 90-95% | 100s/10min |
@@ -435,4 +435,4 @@ cargo run --bin momentry_playground -- server
 **最後更新**: 2026-04-22  
 **文檔版本**: V1.0  
 **更新頻率**: 每月審查更新  
-**維護者**: OpenCode
+**維護者**: OpenCode
@@ -0,0 +1,573 @@
+---
+document_type: "architecture"
+title: "Identity 1對多參考向量設計"
+service: "MOMENTRY_CORE"
+date: "2026-04-28"
+status: "active"
+current_state: "finalized"
+owner: "Warren"
+created_by: "OpenCode"
+created_at: "2026-04-28"
+version: "V1.0"
+tags:
+  - "identity"
+  - "reference_vector"
+  - "embedding"
+  - "face_embedding"
+  - "identity_embedding"
+  - "1-to-many"
+  - "matching_algorithm"
+related_documents:
+  - "MOMENTRY_CORE_ARCHITECTURE_V2.md"
+  - "IMPLEMENTATION/FILE_IDENTITY_API_DESIGN.md"
+  - "CLIP_EMBEDDING_BENCHMARK_PLAN.md"
+ai_query_hints:
+  - "查詢 1對多參考向量架構設計"
+  - "查詢 reference_data JSONB 結構"
+  - "查詢多角度人臉 embedding 存儲"
+  - "查詢 Logo/Symbol identity_embedding"
+  - "查詢匹配算法 (最佳匹配/投票/加權平均)"
+---
+
+# Identity 1對多參考向量設計
+
+| 項目 | 內容 |
+|------|------|
+| 建立者 | OpenCode |
+| 建立時間 | 2026-04-28 |
+| 文件版本 | V1.0 |
+
+---
+
+## 版本歷史
+
+| 版本 | 日期 | 目的 | 操作人 | 工具/模型 |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-04-28 | 創建 Identity 1對多參考向量架構設計 | OpenCode | OpenCode |
+
+---
+
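+## Overview
+
+This document defines the **1-to-many reference vector architecture** for the Momentry Core Identity system. The core idea:
+**a single Identity can store multiple reference vectors (different angles, scenes, and versions), which makes recognition more robust.**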
+
+---
+
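+## Core Design Rationale
+
+### Background
+
+**Limitations of the traditional 1-to-1 design**:
+- A single reference vector cannot cover different angles (frontal, profile, back)
+- A single reference vector cannot cover different scenes (Logo on a white background, on a black background, on a complex background)
+- A single reference vector cannot cover different versions (the same actor in different costume and makeup looks)
+- The match failure rate is high and robustness is insufficient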
+
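+### Advantages of the 1-to-many Design
+
+| Advantage | Description |
+|------|------|
+| **Multi-angle coverage** | Frontal, profile, and three-quarter face views cover different camera angles |
+| **Multi-scene coverage** | Embeddings of a Logo/Symbol against different backgrounds |
+| **Multi-version coverage** | Different looks of the same actor (aged makeup, period martial-arts costume, modern styling) |
+| **Quality score** | Each reference vector records a quality score that is used for weighted matching |
+| **Source traceability** | The source of each embedding is recorded, which simplifies updates and auditing |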
+
+---
+
+## 架構設計
+
+### 資料庫 Schema
+
+**identities 表核心字段**:
+
+```sql
+CREATE TABLE identities (
+    identity_id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    name                TEXT NOT NULL,
+    identity_type       VARCHAR(30) NOT NULL,
+    
+    -- 參考向量 (centroid 或最佳代表)
+    face_embedding      VECTOR(512),             -- ArcFace centroid
+    voice_embedding     VECTOR(192),             -- ECAPA-TDNN centroid
+    identity_embedding  VECTOR(768),             -- CLIP ViT-L/14 centroid
+    
+    -- 1對多參考向量存儲
+    reference_data      JSONB DEFAULT '{}',      -- 多角度/多場景/多版本
+    
+    created_at          TIMESTAMPTZ DEFAULT NOW(),
+    updated_at          TIMESTAMPTZ DEFAULT NOW()
+);
+```
+
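+**Design rationale**:
+- The VECTOR columns such as `face_embedding` store the **centroid** (mean vector) or the best representative vector
+- The `reference_data` JSONB column stores **all reference vectors** (multiple angles, scenes, and versions)
+- Two matching modes are available:
+  - **Fast matching**: compare against the centroid (suited to low-latency use)
+  - **Robust matching**: run 1-to-many matching against `reference_data` (suited to high-precision use)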
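
The difference between the two modes can be seen with toy vectors. This is a sketch only: `match_identity` and its `mode` parameter are hypothetical, the robust path uses the simplest 1-to-many rule (highest similarity over all references), and 2-dim vectors stand in for 512-dim ArcFace embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_identity(detected, identity, mode="fast", threshold=0.85):
    """Compare against the centroid ('fast') or every reference vector ('robust')."""
    if mode == "fast":
        score = cosine_similarity(detected, identity["face_embedding"])
    elif mode == "robust":
        refs = identity["reference_data"].get("face_embeddings", [])
        score = max((cosine_similarity(detected, r["embedding"]) for r in refs), default=0.0)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {"score": score, "is_match": score >= threshold}

# Toy identity: a frontal and a profile reference, plus their mean as the centroid
identity = {
    "face_embedding": [0.5, 0.5],
    "reference_data": {
        "face_embeddings": [
            {"embedding": [1.0, 0.0], "angle": "frontal"},
            {"embedding": [0.0, 1.0], "angle": "profile_left"},
        ]
    },
}
detected = [0.0, 1.0]  # looks exactly like the profile reference
fast = match_identity(detected, identity, mode="fast")
robust = match_identity(detected, identity, mode="robust")
print(fast["is_match"], robust["is_match"])
```

In this contrived case the centroid sits between the two references, so the fast path scores about 0.71 and misses, while the robust path finds the matching profile vector. Real embeddings of the same person are far closer together, so the gap is usually smaller.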
+
+---
+
+## reference_data JSONB 結構
+
+### 完整結構
+
+```json
+{
+  "face_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],
+      "source": "tmdb_images",
+      "image_url": "https://image.tmdb.org/t/p/original/xxx.jpg",
+      "angle": "frontal",
+      "quality_score": 0.95,
+      "created_at": "2026-04-28T10:00:00Z"
+    },
+    {
+      "embedding": [0.3, 0.4, ...],
+      "source": "tmdb_images",
+      "image_url": "https://image.tmdb.org/t/p/original/yyy.jpg",
+      "angle": "profile_left",
+      "quality_score": 0.88,
+      "created_at": "2026-04-28T10:05:00Z"
+    }
+  ],
+  "voice_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],
+      "source": "video_segment",
+      "file_uuid": "vid_001",
+      "timestamp_start": 120.5,
+      "timestamp_end": 135.2,
+      "quality_score": 0.88,
+      "created_at": "2026-04-28T11:00:00Z"
+    }
+  ],
+  "identity_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],
+      "source": "logo_image",
+      "image_url": "https://www.accusys.com.tw/wp-content/uploads/2023/03/Accusys-Orange-2017.png",
+      "context": "brand_logo",
+      "created_at": "2026-04-28T12:00:00Z"
+    }
+  ],
+  "sound_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],
+      "source": "audio_segment",
+      "file_uuid": "vid_001",
+      "timestamp_start": 10.0,
+      "timestamp_end": 15.0,
+      "sound_type": "animal_dog_bark",
+      "created_at": "2026-04-28T13:00:00Z"
+    }
+  ],
+  "image_urls": [
+    "https://image.tmdb.org/t/p/original/xxx.jpg",
+    "https://www.accusys.com.tw/wp-content/uploads/2023/03/Accusys-Orange-2017.png"
+  ]
+}
+```
+
+### 欄位說明
+
+#### face_embeddings (人臉向量)
+
+| 欄位 | 類型 | 必填 | 說明 |
+|------|------|------|------|
+| embedding | Array[512] | Yes | 512-dim ArcFace 向量 |
+| source | String | Yes | 來源: tmdb_profile, tmdb_images, manual_upload, auto_detection |
+| image_url | String | Yes | 圖片 URL |
+| angle | String | No | 人臉角度: frontal, profile_left, profile_right, three_quarter |
+| quality_score | Float | No | 質量評分 (0.0-1.0) |
+| created_at | String | Yes | 建立時間 (ISO 8601) |
+
+#### voice_embeddings (聲紋向量)
+
+| 欄位 | 類型 | 必填 | 說明 |
+|------|------|------|------|
+| embedding | Array[192] | Yes | 192-dim ECAPA-TDNN 向量 |
+| source | String | Yes | 來源: video_segment, audio_file |
+| file_uuid | String | Yes | 檔案 UUID |
+| timestamp_start | Float | Yes | 開始時間 (秒) |
+| timestamp_end | Float | Yes | 結束時間 (秒) |
+| quality_score | Float | No | 質量評分 (0.0-1.0) |
+| created_at | String | Yes | 建立時間 (ISO 8601) |
+
+#### identity_embeddings (身份向量 - Logo/Symbol/Object)
+
+| 欄位 | 類型 | 必填 | 說明 |
+|------|------|------|------|
+| embedding | Array[768] | Yes | 768-dim CLIP ViT-L/14 向量 |
+| source | String | Yes | 來源: logo_image, symbol_image, object_image, concept_image |
+| image_url | String | Yes | 圖片 URL |
+| context | String | No | 識別場景: brand_logo, symbol, object, concept |
+| created_at | String | Yes | 建立時間 (ISO 8601) |
+
+#### sound_embeddings (聲音向量 - Phase 5+)
+
+| 欄位 | 類型 | 必填 | 說明 |
+|------|------|------|------|
+| embedding | Array[TBD] | Yes | TBD (動物叫聲、雷雨、槍炮、樂器) |
+| source | String | Yes | 來源: audio_segment |
+| file_uuid | String | Yes | 檔案 UUID |
+| timestamp_start | Float | Yes | 開始時間 (秒) |
+| timestamp_end | Float | Yes | 結束時間 (秒) |
+| sound_type | String | Yes | 聲音類型: animal_dog_bark, environmental_thunder, weapon_gunshot, musical_guitar |
+| created_at | String | Yes | 建立時間 (ISO 8601) |
+
+---
+
+## 匹配算法
+
+### 1對多匹配策略
+
+#### 策略 1: 最佳匹配 (Best Match)
+
+```python
+def best_match(detected_embedding, reference_embeddings):
+    """
+    策略 1: 取所有參考向量中的最高相似度
+    
+    適用場景:
+    - 快速匹配
+    - 低延遲需求
+    """
+    similarities = [
+        cosine_similarity(detected_embedding, ref["embedding"])
+        for ref in reference_embeddings
+    ]
+    return max(similarities)
+```
+
+#### 策略 2: 投票機制 (Voting)
+
+```python
+def voting_match(detected_embedding, reference_embeddings, threshold=0.85):
+    """
+    Strategy 2: count the reference vectors whose similarity meets the threshold
+    
+    Suitable for:
+    - High robustness requirements
+    - Multi-angle coverage
+    """
+    similarities = [
+        cosine_similarity(detected_embedding, ref["embedding"])
+        for ref in reference_embeddings
+    ]
+    
+    votes = sum(1 for sim in similarities if sim >= threshold)
+    # Guard against an empty reference list to avoid division by zero
+    vote_ratio = votes / len(similarities) if similarities else 0.0
+    
+    return {
+        "votes": votes,
+        "vote_ratio": vote_ratio,
+        "is_match": vote_ratio >= 0.5  # at least half of the reference vectors agree
+    }
+```
+
+#### 策略 3: 加權平均 (Weighted Average)
+
+```python
+def weighted_match(detected_embedding, reference_embeddings):
+    """
+    策略 3: 根據質量評分加權計算相似度
+    
+    適用場景:
+    - 參考向量質量不均
+    - 需要考慮質量評分
+    """
+    similarities = [
+        cosine_similarity(detected_embedding, ref["embedding"])
+        for ref in reference_embeddings
+    ]
+    
+    weights = [
+        ref.get("quality_score", 1.0)
+        for ref in reference_embeddings
+    ]
+    
+    weighted_sim = sum(sim * w for sim, w in zip(similarities, weights)) / sum(weights)
+    
+    return {
+        "weighted_similarity": weighted_sim,
+        "is_match": weighted_sim >= 0.85
+    }
+```
+
+#### 策略 4: 綜合評分 (Combined)
+
+```python
+def combined_match(detected_embedding, reference_embeddings, threshold=0.85):
+    """
+    策略 4: 綜合評分 (最佳匹配 + 投票 + 加權平均)
+    
+    適用場景:
+    - 最高精度需求
+    - 重要場景識別
+    """
+    best_match_score = best_match(detected_embedding, reference_embeddings)
+    voting_result = voting_match(detected_embedding, reference_embeddings, threshold)
+    weighted_result = weighted_match(detected_embedding, reference_embeddings)
+    
+    # 綜合評分: 50% 最佳匹配 + 30% 投票比率 + 20% 加權平均
+    final_score = (
+        best_match_score * 0.5 +
+        voting_result["vote_ratio"] * 0.3 +
+        weighted_result["weighted_similarity"] * 0.2
+    )
+    
+    return {
+        "best_match": best_match_score,
+        "vote_ratio": voting_result["vote_ratio"],
+        "weighted_similarity": weighted_result["weighted_similarity"],
+        "final_score": final_score,
+        "is_match": final_score >= threshold
+    }
+```
+
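+### Choosing a Matching Strategy
+
+| Scenario | Recommended strategy | Notes |
+|------|---------|------|
+| **Real-time search** | Strategy 1 (Best Match) | Low latency, fast matching |
+| **Batch processing** | Strategy 4 (Combined) | Highest precision, combined score |
+| **Low-confidence cases** | Strategy 2 (Voting) | Voting improves robustness |
+| **Uneven reference quality** | Strategy 3 (Weighted) | Weighted average that accounts for quality scores |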
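
To see how the strategies can disagree, it helps to run the Strategy 4 formula on a few precomputed similarities. The sketch below applies the 50/30/20 weights from `combined_match`; the similarity and quality values are made up for illustration, and `combine_scores` is a hypothetical helper that skips the embedding step.

```python
def combine_scores(similarities, quality_scores, threshold=0.85):
    """Combine precomputed similarities using the 50/30/20 weights of Strategy 4."""
    best = max(similarities)
    vote_ratio = sum(1 for s in similarities if s >= threshold) / len(similarities)
    weighted = sum(s * q for s, q in zip(similarities, quality_scores)) / sum(quality_scores)
    final = best * 0.5 + vote_ratio * 0.3 + weighted * 0.2
    return {
        "best_match": best,
        "vote_ratio": vote_ratio,
        "weighted_similarity": weighted,
        "final_score": final,
        "is_match": final >= threshold,
    }

# Three references: two strong matches and one weak, lower-quality one
scores = combine_scores([0.92, 0.88, 0.70], [0.95, 0.88, 0.60])
for name, value in scores.items():
    print(name, value)
```

With these inputs the best match (0.92) and the weighted average both clear 0.85, and two of three references vote yes, yet the combined score lands below the threshold. The vote ratio is a fraction rather than a similarity, so unless every reference agrees it pulls the combined score down; the threshold for Strategy 4 may need separate tuning.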
+
+---
+
+## TMDB 整合流程
+
+### 1對多參考向量提取
+
+```python
+def tmdb_identity_integration(tmdb_person_id, identity_name):
+    """
+    TMDB 整合流程:
+    1. 下載多張人臉照片 (TMDB /person/:id/images 端點)
+    2. 提取每張照片的 ArcFace embedding
+    3. 存儲到 reference_data JSONB
+    4. 計算 centroid 存儲到 face_embedding
+    """
+    
+    # Step 1: 獲取 TMDB 人物照片列表
+    images = tmdb_api.get_person_images(tmdb_person_id)
+    
+    # Step 2: 下載並提取 embedding
+    face_embeddings = []
+    for image in images:
+        # 下載圖片
+        image_url = f"https://image.tmdb.org/t/p/original/{image['file_path']}"
+        image_data = download_image(image_url)
+        
+        # 提取 ArcFace embedding
+        embedding = insightface.extract_embedding(image_data)
+        
+        # 評估人臉角度和質量
+        angle = detect_face_angle(image_data)
+        quality_score = evaluate_face_quality(image_data)
+        
+        # 存儲到 reference_data
+        face_embeddings.append({
+            "embedding": embedding.tolist(),
+            "source": "tmdb_images",
+            "image_url": image_url,
+            "angle": angle,
+            "quality_score": quality_score,
+            "created_at": datetime.now().isoformat()
+        })
+    
+    # Step 3: 存儲到 identities 表
+    identity = {
+        "identity_id": generate_uuid(),
+        "name": identity_name,
+        "identity_type": "people",
+        "source": "tmdb",
+        "tmdb_id": tmdb_person_id,
+        "reference_data": {
+            "face_embeddings": face_embeddings,
+            "image_urls": [img["image_url"] for img in face_embeddings]
+        }
+    }
+    
+    # Step 4: 計算 centroid
+    centroid = calculate_centroid([e["embedding"] for e in face_embeddings])
+    identity["face_embedding"] = centroid
+    
+    # 存儲到資料庫
+    db.insert_identity(identity)
+    
+    return identity
+```
+
+### Centroid 計算
+
+```python
+def calculate_centroid(embeddings):
+    """
+    計算多個 embedding 的中心向量
+    
+    方法: 平均值
+    """
+    import numpy as np
+    
+    embeddings_array = np.array(embeddings)
+    centroid = np.mean(embeddings_array, axis=0)
+    
+    return centroid.tolist()
+```
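
One detail the mean-based centroid leaves open is normalization: the average of several unit-length embeddings is generally shorter than unit length. Cosine similarity is unaffected, but inner-product or Euclidean comparisons are. The sketch below is a pure-Python variant that re-normalizes the mean; whether the stored centroid should be normalized is an assumption, not something this design specifies.

```python
import math

def normalized_centroid(embeddings):
    """Mean of the embeddings, rescaled to unit L2 norm."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # A zero mean cannot be normalized; return it unchanged
    return [x / norm for x in mean] if norm else mean

# Two orthogonal unit vectors: the plain mean has norm ~0.71, the result has norm 1
centroid = normalized_centroid([[1.0, 0.0], [0.0, 1.0]])
print(centroid)
```

The explicit checks for an empty list and mismatched dimensions surface bad input early, which is useful when some TMDB images yield no detectable face.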
+
+---
+
+## Logo/Symbol Identity 整合
+
+### CLIP ViT-L/14 Embedding 提取
+
+```python
+def logo_identity_integration(logo_name, logo_url):
+    """
+    Logo Identity 整合流程:
+    1. 下載 Logo 圖片
+    2. 提取 CLIP ViT-L/14 embedding (768-dim)
+    3. 存儲到 reference_data JSONB
+    4. 存儲到 identity_embedding 字段
+    """
+    
+    # Step 1: 下載圖片
+    image_data = download_image(logo_url)
+    
+    # Step 2: 提取 CLIP embedding
+    embedding = clip_model.extract_embedding(image_data)
+    
+    # Step 3: 存儲到 reference_data
+    identity_embedding_data = {
+        "embedding": embedding.tolist(),
+        "source": "logo_image",
+        "image_url": logo_url,
+        "context": "brand_logo",
+        "created_at": datetime.now().isoformat()
+    }
+    
+    # Step 4: 存儲到 identities 表
+    identity = {
+        "identity_id": generate_uuid(),
+        "name": logo_name,
+        "identity_type": "logo",
+        "source": "manual",
+        "reference_data": {
+            "identity_embeddings": [identity_embedding_data],
+            "image_urls": [logo_url]
+        },
+        "identity_embedding": embedding.tolist()
+    }
+    
+    # 存儲到資料庫
+    db.insert_identity(identity)
+    
+    return identity
+```
+
+### 範例: Accusys Logo
+
+```python
+# 註冊 Accusys Logo Identity
+accusys_logo = logo_identity_integration(
+    logo_name="Accusys Storage Logo",
+    logo_url="https://www.accusys.com.tw/wp-content/uploads/2023/03/Accusys-Orange-2017.png"
+)
+
+# 測試匹配
+detected_logo_embedding = clip_model.extract_embedding(video_frame)
+match_result = combined_match(
+    detected_embedding=detected_logo_embedding,
+    reference_embeddings=accusys_logo["reference_data"]["identity_embeddings"],
+    threshold=0.85
+)
+
+print(f"Match result: {match_result['is_match']}")
+print(f"Final score: {match_result['final_score']}")
+```
+
+---
+
+## 實作計畫
+
+### Phase 1: 資料庫 Migration
+
+- [ ] Migration 023: identities 表添加 reference_data JSONB + identity_embedding VECTOR(768)
+- [ ] 索引配置: identity_embedding 向量索引 (ivfflat 或 hnsw)
+- [ ] 測試資料建立
+
+### Phase 2: TMDB 整合實作
+
+- [ ] TMDB /person/:id/images API 串接
+- [ ] 多張照片下載邏輯
+- [ ] ArcFace embedding 提取（多角度）
+- [ ] reference_data JSONB 存儲
+- [ ] Centroid 計算邏輯
+
+### Phase 3: Logo/Symbol Identity 實作
+
+- [ ] CLIP ViT-L/14 模型集成（MPS 支持）
+- [ ] Logo/Symbol 檢測（OWL-ViT）
+- [ ] identity_embedding 提取
+- [ ] reference_data JSONB 存儲
+- [ ] 匹配算法實作
+
+### Phase 4: 匹配算法實作
+
+- [ ] Strategy 1: Best Match
+- [ ] Strategy 2: Voting
+- [ ] Strategy 3: Weighted Average
+- [ ] Strategy 4: Combined
+- [ ] API 端點設計
+
+### Phase 5: 声音识别扩展 (待辦事項)
+
+- [ ] sound_embeddings 定義
+- [ ] 動物叫聲 embedding 提取
+- [ ] 雷雨聲 embedding 提取
+- [ ] 槍炮聲 embedding 提取
+- [ ] 樂器聲 embedding 提取
+
+---
+
+## 待辦事項
+
+| 項目 | 優先級 | 說明 |
+|------|--------|------|
+| Migration 023 | 高 | Phase 1 |
+| TMDB 整合實作 | 高 | Phase 2 |
+| Logo/Symbol Identity | 中 | Phase 3 |
+| 匹配算法實作 | 中 | Phase 4 |
+| 声音识别扩展 | 低 | Phase 5+ (待辦事項) |
+
+---
+
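+## Constraints
+
+- This is a new architecture and requires a database migration
+- CLIP ViT-L/14 can run on CPU, but MPS or CUDA is recommended for practical throughput
+- TMDB integration requires a TMDB API key
+- Sound recognition is deferred to Phase 5+ as a to-do item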
+
+---
+
+## Related Documents
+
+- `docs_v1.0/ARCHITECTURE/MOMENTRY_CORE_ARCHITECTURE_V2.md` - Core architecture design
+- `docs_v1.0/IMPLEMENTATION/FILE_IDENTITY_API_DESIGN.md` - API design
+- `docs_v1.0/ARCHITECTURE/CLIP_EMBEDDING_BENCHMARK_PLAN.md` - CLIP test plan
+- `docs_v1.0/STANDARDS/DOCS_STANDARD.md` - Document creation standard
+
+---
+
+## Version Information
+
+- Version: V1.0
+- Created: 2026-04-28
+- Last updated: 2026-04-28
@@ -2,18 +2,20 @@
 document_type: "architecture_design"
 service: "MOMENTRY_CORE"
 title: "Job Worker 實作計畫"
-date: "2026-03-24"
-version: "V1.0"
+date: "2026-04-27"
+version: "V1.2"
 status: "active"
 owner: "Warren"
 created_by: "OpenCode"
 tags:
  - "實作計畫"
  - "worker"
+  - "processing_status"
 ai_query_hints:
  - "查詢 Job Worker 實作計畫 的內容"
  - "Job Worker 實作計畫 的主要目的是什麼？"
  - "如何操作或實施 Job Worker 實作計畫？"
+  - "processing_status 字段設計"
 ---

 # Job Worker Implementation Plan
@@ -22,7 +24,7 @@ ai_query_hints:
 |------|------|
 | Created by | Warren / OpenCode |
 | Created on | 2026-03-24 |
-| Document version | V1.1 |
+| Document version | V1.2 |
 | Status | ✅ Implemented |

 ---
@@ -33,6 +35,7 @@ ai_query_hints:
 |------|------|------|--------|
 | V1.0 | 2026-03-24 | Created the implementation plan | OpenCode |
 | V1.1 | 2026-03-25 | Implementation complete; status updated | OpenCode |
+| V1.2 | 2026-04-27 | Added the processing_status field design notes | OpenCode |

 ---

@@ -689,6 +692,117 @@ export REDIS_URL=redis://:accusys@localhost:6379
 | `completed` | All processing complete |
 | `failed` | Processing failed |

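+### B.1 processing_status column of the videos table
+
+| Value | Description | When it applies |
+|------|------|----------|
+| `REGISTERED` | Registered | Newly registered video; processing not yet triggered |
+| `PENDING` | Waiting | Processing triggered; waiting for job assignment |
+| `PROBING` | Probing | ffprobe analysis running |
+| `ASR` | ASR in progress | ASR job running |
+| `OCR` | OCR in progress | OCR job running |
+| `YOLO` | YOLO in progress | YOLO job running |
+| `FACE` | Face detection in progress | Face job running |
+| `POSE` | Pose estimation in progress | Pose job running |
+| `CUT` | Chunking in progress | Cut job running |
+| `ASRX` | Speaker diarization in progress | ASRX job running |
+| `COMPLETED` | Completed | All processing complete |
+| `FAILED` | Failed | Processing failed |
+| `PAUSED` | Paused | Paused at a checkpoint (resumable processing) |
+| `RESUMING` | Resuming | Resuming from a checkpoint |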
+
+#### B.1.1 Relationship between status and processing_status
+
+| status | processing_status | Description |
+|--------|-------------------|------|
+| `pending` | `REGISTERED` | Newly registered; the Portal shows "Registered" (blue) |
+| `processing` | `PENDING` | Triggered; the Portal shows "Waiting" (yellow) |
+| `processing` | `PROBING`/`ASR`/... | A processor is running; the Portal shows the processor name (indigo) |
+| `completed` | `COMPLETED` | Done; the Portal shows "Completed" (green) |
+| `failed` | `FAILED` | Failed; the Portal shows "Processing failed" (red) |
+
+#### B.1.2 Portal display priority
+
+The Portal uses `processing_status` (detailed state) first and falls back to `status` (basic state).
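The fallback rule can be sketched in Python as follows. This is a minimal illustration, not the Portal's actual code: the function name `portal_label`, the English label strings, the labels used for the basic `status` fallback, and the `gray` default are all assumptions. Only the state/colour pairs come from the table in B.1.1, and `processing_status` is treated as a plain string as in that table.

```python
from typing import Optional, Tuple

# Detailed states and colours taken from the B.1.1 table; label text is illustrative.
DETAILED_LABELS = {
    "REGISTERED": ("Registered", "blue"),
    "PENDING": ("Waiting", "yellow"),
    "COMPLETED": ("Completed", "green"),
    "FAILED": ("Processing failed", "red"),
}

# Processor states are displayed using the processor name itself (indigo).
PROCESSOR_STATES = {"PROBING", "ASR", "OCR", "YOLO", "FACE", "POSE", "CUT", "ASRX"}

# Hypothetical labels for the basic `status` column, used only as a fallback.
BASIC_LABELS = {
    "pending": ("Registered", "blue"),
    "processing": ("Processing", "indigo"),
    "completed": ("Completed", "green"),
    "failed": ("Processing failed", "red"),
}


def portal_label(status: str, processing_status: Optional[str]) -> Tuple[str, str]:
    """Return (label, colour), preferring the detailed processing_status."""
    if processing_status in DETAILED_LABELS:
        return DETAILED_LABELS[processing_status]
    if processing_status in PROCESSOR_STATES:
        return (processing_status, "indigo")
    # Fallback: detailed state is missing or unknown, so use the basic status.
    return BASIC_LABELS.get(status, (status, "gray"))


print(portal_label("processing", "YOLO"))
print(portal_label("failed", None))
```

States that B.1.1 does not list, such as `PAUSED` or `RESUMING`, fall through to the basic `status` in this sketch.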
+
+#### B.1.3 processing_status JSONB structure (from V1.2)
+
+Starting with V1.2, `processing_status` is stored as **JSONB**, which supports multi-level progress tracking.
+
+For the full specification, see `REFERENCE/PROCESSING_STATUS_JSONB_SPEC.md`
+
+##### Main JSONB fields
+
+| Field | Type | Description |
+|------|------|------|
+| `phase` | String | Current phase (PROCESSING, COMPLETED, FAILED) |
+| `active_processors` | Array[String] | Processors currently running (uppercase) |
+| `total_frames` | Integer | Total number of frames in the video |
+| `processing_summary` | Object | Overview of per-processor completion status |
+| `pre_chunks_summary` | Object | Statistics for the pre_chunks table (by processor) |
+| `chunks_summary` | Object | Statistics for the chunks table (by Rule) |
+| `agents` | Object | Agent task status (5W1H, Translation) |
+| `vectorization_summary` | Object | Vectorization statistics |
+| `progress` | Object | Detailed progress per processor |
+
+##### JSONB example (in progress)
+
+```json
+{
+  "phase": "PROCESSING",
+  "active_processors": ["YOLO", "OCR"],
+  "total_frames": 412343,
+  "progress": {
+    "YOLO": {
+      "current_frame": 25000,
+      "percentage": 6.0,
+      "status": "running"
+    }
+  }
+}
+```
+
+##### JSONB example (completed)
+
+```json
+{
+  "phase": "COMPLETED",
+  "active_processors": [],
+  "pre_chunks_summary": {
+    "total_records": 25000,
+    "by_processor": {
+      "asr": {"records": 1466},
+      "yolo": {"records": 11000}
+    }
+  },
+  "chunks_summary": {
+    "total_chunks": 2798,
+    "by_rule": {
+      "rule_1": {"chunks_count": 1466},
+      "rule_3": {"chunks_count": 1332}
+    }
+  },
+  "agents": {
+    "5w1h": {"status": "completed"}
+  }
+}
+```
+
+##### SQL query examples
+
+```sql
+-- Get the phase
+SELECT processing_status->>'phase' FROM videos WHERE uuid = 'xxx';
+
+-- Get active_processors
+SELECT processing_status->'active_processors' FROM videos WHERE uuid = 'xxx';
+
+-- Get the pre_chunks statistics
+SELECT processing_status->'pre_chunks_summary'->>'total_records' FROM videos;
+```
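On the application side, the same JSONB document can be read as an ordinary dictionary. The sketch below is an assumption-laden illustration rather than project code: `summarize_processing_status` is a hypothetical helper, the percentage is derived from `current_frame / total_frames` instead of trusting the stored `percentage` value, and the consistency check simply compares `total_chunks` with the sum of the `by_rule` counts shown in the examples above.

```python
import json


def summarize_processing_status(raw_json: str) -> dict:
    """Summarise a processing_status JSONB document passed as a JSON string."""
    doc = json.loads(raw_json)
    total_frames = doc.get("total_frames") or 0

    # Derive a percentage per processor from the frame counters.
    percent = {}
    for name, info in doc.get("progress", {}).items():
        current = info.get("current_frame", 0)
        percent[name] = round(100.0 * current / total_frames, 2) if total_frames else None

    # Check that total_chunks equals the sum of the per-rule counts, when present.
    chunks = doc.get("chunks_summary", {})
    by_rule_total = sum(
        rule.get("chunks_count", 0) for rule in chunks.get("by_rule", {}).values()
    )
    consistent = "total_chunks" not in chunks or chunks["total_chunks"] == by_rule_total

    return {
        "phase": doc.get("phase"),
        "active_processors": doc.get("active_processors", []),
        "progress_percent": percent,
        "chunks_consistent": consistent,
    }


example = json.dumps({
    "phase": "PROCESSING",
    "active_processors": ["YOLO", "OCR"],
    "total_frames": 412343,
    "progress": {"YOLO": {"current_frame": 25000, "status": "running"}},
})
summary = summarize_processing_status(example)
print(summary)
```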
+
+---
+
 ### C. status column of the processor_results table

 | Value | Description |
@@ -546,4 +546,4 @@ switch_mcp status
  
 年度節省:
  $216 USD ✅
-```
+```
@@ -442,4 +442,4 @@ tests/
 *版本: 1.0.0*
 *創建日期: 2026-03-27*
 *負責人: Warren (Technical Lead)*
-*狀態: 審核中*
+*狀態: 審核中*
@@ -36,14 +36,18 @@ Identity ──[出現在]──→ File

 Anything that can be named is an Identity:

-| Type | Description | Examples |
-|------|------|------|
-| people | People | Actors, public figures, fictional characters |
-| object | Objects | Vehicles, buildings, props |
-| brand | Brands | LV, Hello Kitty, Nike |
-| logo | Logos | LV logo, Nike swoosh |
-| concept | Concepts | Love, freedom, technology |
-| scene | Scenes | Indoor, outdoor, street |
+| Type | Description | Examples | Reference vectors |
+|------|------|------|----------|
+| people | People | Actors, public figures, fictional characters | face_embedding (512), voice_embedding (192) |
+| logo | Logos | LV logo, Nike swoosh, Accusys Logo | identity_embedding (768) |
+| symbol | Symbols | Traffic signs, brand symbols | identity_embedding (768) |
+| object | Objects | Vehicles, buildings, props | identity_embedding (768) |
+| brand | Brands | LV, Hello Kitty, Nike | identity_embedding (768) |
+| concept | Concepts | Love, freedom, technology | identity_embedding (768) |
+| scene | Scenes | Indoor, outdoor, street | identity_embedding (768) |
+| sound | Sounds | Animal calls, thunderstorms, gunfire, musical instruments | sound_embedding (TBD) |
+| animal | Animals | Dogs, cats, birds | identity_embedding (768) + sound_embedding (TBD) |
+| environmental | Ambient sounds | Rain, wind, ocean waves | sound_embedding (TBD) |

 ### 2.2 Special Design for People Identity

@@ -87,12 +91,68 @@ CREATE TABLE identities (
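    -- Reference vectors (used for automatic matching)
    face_embedding  VECTOR(512),             -- Reference face vector (ArcFace)
    voice_embedding VECTOR(192),             -- Reference voiceprint vector (ECAPA-TDNN)
+    identity_embedding VECTOR(768),          -- Identity vector (CLIP ViT-L/14) for logo/symbol/object
+    
+    -- One-to-many reference vector storage (multiple angles/scenes/versions)
+    reference_data  JSONB,                   -- Stores multiple embeddings; structure described below
    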
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW()
 );
 ```

+#### reference_data JSONB structure
+
+```json
+{
+  "face_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],          // 512-dim ArcFace
+      "source": "tmdb_profile",              // tmdb_profile, tmdb_images, manual_upload, auto_detection
+      "image_url": "https://...",             // source image URL
+      "angle": "frontal",                    // frontal, profile_left, profile_right, three_quarter
+      "quality_score": 0.95,                 // face quality score
+      "created_at": "2026-04-28T10:00:00Z"
+    }
+  ],
+  "voice_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],          // 192-dim ECAPA-TDNN
+      "source": "video_segment",
+      "file_uuid": "xxx",
+      "timestamp_start": 120.5,
+      "timestamp_end": 135.2,
+      "quality_score": 0.88,
+      "created_at": "2026-04-28T10:00:00Z"
+    }
+  ],
+  "identity_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],          // 768-dim CLIP ViT-L/14
+      "source": "logo_image",                // logo_image, symbol_image, object_image
+      "image_url": "https://...",
+      "context": "brand_logo",               // brand_logo, symbol, object, concept
+      "created_at": "2026-04-28T10:00:00Z"
+    }
+  ],
+  "sound_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],          // TBD (animals, thunderstorms, gunfire, musical instruments)
+      "source": "audio_segment",
+      "file_uuid": "xxx",
+      "timestamp_start": 10.0,
+      "timestamp_end": 15.0,
+      "sound_type": "animal_dog_bark",       // animal_dog_bark, environmental_thunder, weapon_gunshot, musical_guitar
+      "created_at": "2026-04-28T10:00:00Z"
+    }
+  ],
+  "image_urls": [
+    "https://www.accusys.com.tw/wp-content/uploads/2023/03/Accusys-Orange-2017.png",
+    "https://image.tmdb.org/t/p/original/xxx.jpg"
+  ]
+}
+```
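Because every entry carries a fixed-size vector, a small check before writing `reference_data` can catch dimension mistakes early. The following is only a sketch: `validate_reference_data` and `EXPECTED_DIMS` are hypothetical names, the dimensions are the ones stated in the structure above (512 ArcFace, 192 ECAPA-TDNN, 768 CLIP ViT-L/14), and `sound_embeddings` is skipped because its dimension is still TBD. It operates on the parsed dictionary, since the annotated listing above uses `//` comments and `...` placeholders and is not itself valid JSON.

```python
# Expected vector length per reference_data key (sound_embeddings is TBD, so not checked).
EXPECTED_DIMS = {
    "face_embeddings": 512,
    "voice_embeddings": 192,
    "identity_embeddings": 768,
}


def validate_reference_data(reference_data: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    for key, dim in EXPECTED_DIMS.items():
        for index, entry in enumerate(reference_data.get(key, [])):
            embedding = entry.get("embedding")
            size = len(embedding) if isinstance(embedding, list) else 0
            if size != dim:
                errors.append(f"{key}[{index}]: expected {dim} values, got {size}")
    return errors


good = {"face_embeddings": [{"embedding": [0.0] * 512, "source": "tmdb_images"}]}
bad = {"voice_embeddings": [{"embedding": [0.0] * 10, "source": "video_segment"}]}
print(validate_reference_data(good))
print(validate_reference_data(bad))
```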
+
 ---

 ## 3. File 設計
@@ -270,23 +330,92 @@ TMDB API → 電影資訊 + 演員名單 → 自動建立 Identity → 關聯到
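  - The system automatically fetches from the TMDB API:
    - Cast list + character names
    - Actor face photo (profile_path)
+     - Multiple photos per actor (TMDB /person/:id/images endpoint)
    - Movie metadata

 2. **Create the Identity**:
   - Automatically create or update the Identity (actor)
-   - Store the TMDB ID + face photo URL
+   - Store the TMDB ID + multiple face photo URLs
   - Link it to the File (this movie)

-3. **Extract the reference vector**:
-   - Download the TMDB face photo
-   - Extract face_embedding (512-dim)
-   - Store it in the identities table
+3. **Extract reference vectors (one-to-many)**:
+   - Download multiple TMDB face photos (different angles, in-costume looks)
+   - Extract a face_embedding (512-dim ArcFace) from each photo
+   - Store the embeddings in reference_data JSONB: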
+     ```json
+     {
+       "face_embeddings": [
+         {
+           "embedding": [...],
+           "source": "tmdb_images",
+           "image_url": "https://image.tmdb.org/t/p/original/xxx.jpg",
+           "angle": "frontal",
+           "quality_score": 0.95
+         },
+         {
+           "embedding": [...],
+           "source": "tmdb_images",
+           "image_url": "https://image.tmdb.org/t/p/original/yyy.jpg",
+           "angle": "profile_left",
+           "quality_score": 0.88
+         }
+       ]
+     }
+     ```
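+   - Compute the centroid (mean vector) and store it in the face_embedding field

 4. **Subsequent AI recognition**:
   - The system detects Faces in the File
-   - Automatically match them to an existing Identity
+   - Automatically match them to an existing Identity (using the one-to-many matching algorithm)
   - Update the file_identities table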
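The centroid step in item 3 can be sketched as below. The document does not say how the centroid is normalised, so this example makes an explicit assumption: each embedding is L2-normalised, the vectors are averaged, and the mean is normalised again so that it can be compared with cosine similarity. `face_centroid` and `l2_normalize` are hypothetical helper names, and the two-dimensional vectors in the usage lines stand in for real 512-dim ArcFace embeddings.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; reject zero vectors."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def face_centroid(face_embeddings: List[dict]) -> List[float]:
    """Mean of the L2-normalised embeddings, re-normalised (an assumed convention)."""
    if not face_embeddings:
        raise ValueError("no embeddings to average")
    vectors = [l2_normalize(entry["embedding"]) for entry in face_embeddings]
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("embeddings have inconsistent dimensions")
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return l2_normalize(mean)


# Toy 2-dim vectors in place of 512-dim ArcFace embeddings.
refs = [{"embedding": [1.0, 0.0]}, {"embedding": [0.0, 2.0]}]
centroid = face_centroid(refs)
print(centroid)
```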

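+#### 6.2.1 One-to-many matching algorithm
+
+```python
+import numpy as np
+
+
+def cosine_similarity(a, b):
+    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
+    a = np.asarray(a, dtype=float)
+    b = np.asarray(b, dtype=float)
+    denom = np.linalg.norm(a) * np.linalg.norm(b)
+    return float(a @ b / denom) if denom else 0.0
+
+
+def match_face_to_identity(detected_embedding, identity_reference_data, threshold=0.85):
+    """
+    One-to-many matching: compare a detected face against all of an
+    Identity's reference vectors.
+
+    Strategies:
+    1. Best match: highest similarity across all reference vectors
+    2. Voting: share of reference vectors at or above the threshold
+    3. Weighted average: similarity weighted by quality score
+    """
+    face_embeddings = identity_reference_data.get("face_embeddings", [])
+
+    if not face_embeddings:
+        return None
+
+    # Strategy 1: best match
+    similarities = [
+        cosine_similarity(detected_embedding, ref["embedding"])
+        for ref in face_embeddings
+    ]
+    best_match = max(similarities)
+
+    # Strategy 2: voting
+    votes = sum(1 for sim in similarities if sim >= threshold)
+    vote_ratio = votes / len(similarities)
+
+    # Strategy 3: quality-weighted average
+    # (falls back to a plain mean if every quality_score is 0)
+    weights = [ref.get("quality_score", 1.0) for ref in face_embeddings]
+    total_weight = sum(weights)
+    if total_weight > 0:
+        weighted_sim = sum(s * w for s, w in zip(similarities, weights)) / total_weight
+    else:
+        weighted_sim = sum(similarities) / len(similarities)
+
+    # Combined score
+    final_score = best_match * 0.5 + vote_ratio * 0.3 + weighted_sim * 0.2
+
+    return {
+        "best_match": best_match,
+        "vote_ratio": vote_ratio,
+        "weighted_sim": weighted_sim,
+        "final_score": final_score,
+        "is_match": final_score >= threshold
+    }
+```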
+
 ### 6.3 TMDB API Endpoints

 | Endpoint | Description |
@@ -539,3 +668,4 @@ GET /api/v1/identities/search?q=張&type=people&category=P-001
 | Version | Date | Purpose | Author |
 |------|------|------|--------|
 | V1.0 | 2026-04-25 | Brand-new design (File + Identity + Category) | OpenCode |
+| V1.1 | 2026-04-28 | Added identity_embedding (768-dim CLIP) and reference_data JSONB (one-to-many reference vectors); extended identity_type (logo/symbol/sound/animal/environmental); integrated TMDB multi-angle faces | OpenCode |
@@ -389,4 +389,4 @@ Momentry Core 的監控架構設計提供：
 3. **數據驅動**：基於數據的決策與優化
 4. **持續改進**：不斷優化監控策略與工具

-通過完善的監控體系，確保系統穩定運行，快速發現並解決問題，為用戶提供高質量的服務。
+通過完善的監控體系，確保系統穩定運行，快速發現並解決問題，為用戶提供高質量的服務。
@@ -189,4 +189,4 @@ async fn metrics_handler() -> impl IntoResponse {
 ---

 **最後更新**: 2026-04-22  
-**部署時間**: 10-30 分鐘
+**部署時間**: 10-30 分鐘
@@ -130,8 +130,8 @@ graph TD
 ### 3.1 Chunk 定義 (Video Chunk)
 **定義**: 特定視頻文件 (`uuid`) 內，從 `start_frame` 到 `end_frame` 之間的**連續畫面**。
 **存儲**:
-*   **PostgreSQL**: 權威主數據 (Metadata, Relations, Complex Queries).
-*   **Qdrant**: 向量檢索與 Payload 過濾 (Fast Retrieval).
+* **PostgreSQL**: 權威主數據 (Metadata, Relations, Complex Queries).
+* **Qdrant**: 向量檢索與 Payload 過濾 (Fast Retrieval).

 ### 3.2 數據庫 Schema (PostgreSQL)

@@ -201,7 +201,7 @@ CREATE TABLE talents (
 -- 劇中角色庫 (Character)
 CREATE TABLE characters (
    id              BIGSERIAL PRIMARY KEY,
-    video_uuid      TEXT NOT NULL,
+    file_uuid       TEXT NOT NULL,
    name            TEXT NOT NULL,                -- 角色名
    language_track  TEXT DEFAULT 'original',      -- 語言軌道 (dub_zh_tw, dub_en)
    is_voice_only   BOOLEAN DEFAULT FALSE,        -- 無臉角色 (動畫/旁白/AI)
@@ -229,7 +229,7 @@ CREATE TABLE identity_bindings (

 ```json
 {
-  "uuid": "384b0ff44aaaa1f1",
+  "uuid": "384b0ff44aaaa1f14cb2cd63b3fea966",
  "chunk_id": "chunk_001",
  "start_frame": 100,
  "end_frame": 200,
@@ -255,48 +255,48 @@ CREATE TABLE identity_bindings (
 ## 4. 搜尋維度 (5W1H + Context + Sports)

 ### 4.1 人 (Person / Who)
-*   **身份解析**: `speaker_X` / `face_Y` -> `talent` -> `character`.
-*   **屬性過濾**: 性別、年齡、體型、五官、服裝 (VLM/Heuristics).
-*   **聲紋檢索**: 上傳音頻片段 -> Cosine Similarity (ECAPA-TDNN 192-dim).
+* **身份解析**: `speaker_X` / `face_Y` -> `talent` -> `character`.
+* **屬性過濾**: 性別、年齡、體型、五官、服裝 (VLM/Heuristics).
+* **聲紋檢索**: 上傳音頻片段 -> Cosine Similarity (ECAPA-TDNN 192-dim).

 ### 4.2 事 (Event / What)
-*   **語音語義**: ASR 文本向量檢索.
-*   **視覺行為**: Pose Analyzer 標籤 (打架、擁抱、揮手).
-*   **融合事件**: `gunfight`, `romantic_scene`, `interview` (多信號規則融合).
+* **語音語義**: ASR 文本向量檢索.
+* **視覺行為**: Pose Analyzer 標籤 (打架、擁抱、揮手).
+* **融合事件**: `gunfight`, `romantic_scene`, `interview` (多信號規則融合).

 ### 4.3 時 (Time / When)
-*   **精確幀**: `start_frame`, `end_frame`.
-*   **相對時間**: "最後 5 分鐘".
+* **精確幀**: `start_frame`, `end_frame`.
+* **相對時間**: "最後 5 分鐘".

 ### 4.4 地 (Location / Where)
-*   **場景語義**: Places365 -> 宏觀/語義/原始三層映射 (e.g., `beach` -> `outdoor`).
-*   **天氣/環境**: `rainy`, `sunny`, `night` (Context Inference).
+* **場景語義**: Places365 -> 宏觀/語義/原始三層映射 (e.g., `beach` -> `outdoor`).
+* **天氣/環境**: `rainy`, `sunny`, `night` (Context Inference).

 ### 4.5 物 (Object / Which)
-*   **YOLO 物件**: `car`, `gun`, `dog`.
-*   **音頻物件**: `siren`, `barking`.
+* **YOLO 物件**: `car`, `gun`, `dog`.
+* **音頻物件**: `siren`, `barking`.

 ### 4.6 上下文 (Context)
-*   **季節**: `winter` (雪/圍巾), `summer` (泳衣/太陽眼鏡).
-*   **節慶**: `christmas` (聖誕樹/鈴鐺聲), `cny` (鞭炮/紅燈籠).
+* **季節**: `winter` (雪/圍巾), `summer` (泳衣/太陽眼鏡).
+* **節慶**: `christmas` (聖誕樹/鈴鐺聲), `cny` (鞭炮/紅燈籠).

 ### 4.7 運動 (Sports)
-*   **球類**: 棒球 (球棒/打擊聲/揮棒), 籃球 (運球聲/投籃), 足球 (哨音/踢球).
-*   **水上/冰上運動 (詳細特徵)**:
-    *   **🏊 游泳 (Swimming)**:
-        *   *場景*: `swimming_pool`, `water`.
-        *   *物件*: `goggles`, `swim_cap`, `lane_rope`.
-        *   *動作*: `freestyle_stroke` (自由式), `breaststroke` (蛙式), `butterfly` (蝶式), `backstroke` (仰式).
-        *   *音頻*: `water_splash` (水花聲), `rhythmic_breathing` (規律換氣聲).
-    *   **🤿 跳水 (Diving)**:
-        *   *場景*: `diving_board`, `platform_10m`.
-        *   *動作序列*: `takeoff` (起跳) → `aerial_twist` (空中翻轉) → `entry` (入水).
-        *   *音頻*: `high_pitch_whistle` (哨音) → `massive_splash` (巨大入水聲).
-    *   **⛸️ 滑冰 (Ice Skating)**:
-        *   *場景*: `ice_rink`, `winter`.
-        *   *物件*: `ice_skates`, `barrier`.
-        *   *動作*: `gliding` (滑行), `spinning` (旋轉), `jumping` (跳躍).
-        *   *音頻*: `blade_on_ice` (冰刀摩擦聲), `classical_music` (花滑配樂).
+* **球類**: 棒球 (球棒/打擊聲/揮棒), 籃球 (運球聲/投籃), 足球 (哨音/踢球).
+* **水上/冰上運動 (詳細特徵)**:
+  * **🏊 游泳 (Swimming)**:
+    * *場景*: `swimming_pool`, `water`.
+    * *物件*: `goggles`, `swim_cap`, `lane_rope`.
+    * *動作*: `freestyle_stroke` (自由式), `breaststroke` (蛙式), `butterfly` (蝶式), `backstroke` (仰式).
+    * *音頻*: `water_splash` (水花聲), `rhythmic_breathing` (規律換氣聲).
+  * **🤿 跳水 (Diving)**:
+    * *場景*: `diving_board`, `platform_10m`.
+    * *動作序列*: `takeoff` (起跳) → `aerial_twist` (空中翻轉) → `entry` (入水).
+    * *音頻*: `high_pitch_whistle` (哨音) → `massive_splash` (巨大入水聲).
+  * **⛸️ 滑冰 (Ice Skating)**:
+    * *場景*: `ice_rink`, `winter`.
+    * *物件*: `ice_skates`, `barrier`.
+    * *動作*: `gliding` (滑行), `spinning` (旋轉), `jumping` (跳躍).
+    * *音頻*: `blade_on_ice` (冰刀摩擦聲), `classical_music` (花滑配樂).

 ---

@@ -328,53 +328,53 @@ CREATE TABLE identity_bindings (

 ### 5.3 混合查詢 (Hybrid Query)

-1.  **解析身份 (Who)**:
-    *   查詢 `identity_bindings`，找到符合 "穿西裝男人" 的機器 ID (`face_5`).
-2.  **構建 SQL (PostgreSQL)**:
+1. **解析身份 (Who)**:
+    * 查詢 `identity_bindings`，找到符合 "穿西裝男人" 的機器 ID (`face_5`).
+2. **構建 SQL (PostgreSQL)**:
    ```sql
    SELECT chunk_id, start_frame, end_frame FROM chunks
-    WHERE uuid = '384b0ff44aaaa1f1'
+    WHERE uuid = '384b0ff44aaaa1f14cb2cd63b3fea966'
    AND 'face_5' = ANY(face_ids)
    AND scene_semantic @> ARRAY['office']
    AND action_tags @> ARRAY['arguing', 'shouting']
    AND audio_events @> ARRAY['dog_bark'];
    ```
-3.  **構建 Vector Search (Qdrant)**:
-    *   如果 SQL 結果為空或用戶語意模糊，切換至 Qdrant Payload Filter + Vector Similarity.
-4.  **返回結果**:
-    *   Chunk 列表，包含精確的 `start_frame`, `end_frame`.
+3. **構建 Vector Search (Qdrant)**:
+    * 如果 SQL 結果為空或用戶語意模糊，切換至 Qdrant Payload Filter + Vector Similarity.
+4. **返回結果**:
+    * Chunk 列表，包含精確的 `start_frame`, `end_frame`.

 ---

 ## 6. 實施路線圖 (Implementation Roadmap)

 ### Phase 1: 基礎設施與 Schema (第 1 週)
- [ ] 執行 PostgreSQL Schema V5 更新 (Chunks, Talents, Castings, Bindings, Sports).
- [ ] 建立 Qdrant Collection (`momentry_chunks`)，配置 Multi-Vector 和 Payload 索引.
- [ ] 編寫 `scene_hierarchy_processor.py` (場景映射層).
- [ ] 編寫 `scene_mapping.json`.
+* [ ] 執行 PostgreSQL Schema V5 更新 (Chunks, Talents, Castings, Bindings, Sports).
+* [ ] 建立 Qdrant Collection (`momentry_chunks`)，配置 Multi-Vector 和 Payload 索引.
+* [ ] 編寫 `scene_hierarchy_processor.py` (場景映射層).
+* [ ] 編寫 `scene_mapping.json`.

 ### Phase 2: 信號提取模組 (第 2-3 週)
- [ ] 部署 `audio_event_processor.py` (PANNs/YAMNet).
- [ ] 部署 `pose_analyzer_processor.py` (基礎規則：站/坐/揮手/打鬥/泳姿).
- [ ] 部署 `context_inference_processor.py` (季節/節慶/天氣推斷).
- [ ] 部署 `sports_classifier_processor.py` (運動分類規則引擎).
- [ ] 確保所有處理器的輸出能正確映射並寫入 `chunks` 表.
+* [ ] 部署 `audio_event_processor.py` (PANNs/YAMNet).
+* [ ] 部署 `pose_analyzer_processor.py` (基礎規則：站/坐/揮手/打鬥/泳姿).
+* [ ] 部署 `context_inference_processor.py` (季節/節慶/天氣推斷).
+* [ ] 部署 `sports_classifier_processor.py` (運動分類規則引擎).
+* [ ] 確保所有處理器的輸出能正確映射並寫入 `chunks` 表.

 ### Phase 3: 身份綁定系統 (第 4 週)
- [ ] 部署 `voice_embedding_extractor.py` (聲紋提取與比對).
- [ ] 實現 `identity_resolver.py`：將機器 ID 綁定到 `talents` 和 `characters`.
- [ ] 提供 API: `POST /api/v1/person/bind`.
+* [ ] 部署 `voice_embedding_extractor.py` (聲紋提取與比對).
+* [ ] 實現 `identity_resolver.py`：將機器 ID 綁定到 `talents` 和 `characters`.
+* [ ] 提供 API: `POST /api/v1/person/bind`.

 ### Phase 4: 搜尋引擎整合 (第 5 週)
- [ ] 開發 `search_processor.py` (LLM Parser + SQL Builder).
- [ ] 實現 `POST /api/v1/search/smart` 端點.
- [ ] 測試複雜查詢 (人+事+時+地+物+上下文+運動).
+* [ ] 開發 `search_processor.py` (LLM Parser + SQL Builder).
+* [ ] 實現 `POST /api/v1/search/smart` 端點.
+* [ ] 測試複雜查詢 (人+事+時+地+物+上下文+運動).

 ### Phase 5: 優化與前端對接 (第 6 週)
- [ ] 性能優化 (索引調整、查詢緩存).
- [ ] 前端搜尋介面展示多維度過濾條件.
- [ ] 前端視頻播放器跳轉至精確 `start_frame`.
+* [ ] 性能優化 (索引調整、查詢緩存).
+* [ ] 前端搜尋介面展示多維度過濾條件.
+* [ ] 前端視頻播放器跳轉至精確 `start_frame`.

 ---

@@ -434,24 +434,24 @@ class ParallelScheduler:
        self.max_workers = max_workers
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers)
    
-    async def schedule_processing(self, video_uuid):
+    async def schedule_processing(self, file_uuid):
        """調度處理任務"""
        # Phase 1: 上傳時即時處理
        fast_tasks = [
-            self.executor.submit(self.run_scene, video_uuid),
-            self.executor.submit(self.run_face, video_uuid),
-            self.executor.submit(self.run_cut, video_uuid)
+            self.executor.submit(self.run_scene, file_uuid),
+            self.executor.submit(self.run_face, file_uuid),
+            self.executor.submit(self.run_cut, file_uuid)
        ]
        
        # 等待上傳完成
-        await self.wait_for_upload_complete(video_uuid)
+        await self.wait_for_upload_complete(file_uuid)
        
        # Phase 2: 上傳完成後處理
        slow_tasks = [
-            self.executor.submit(self.run_asr, video_uuid),
-            self.executor.submit(self.run_ocr, video_uuid),
-            self.executor.submit(self.run_yolo, video_uuid),
-            self.executor.submit(self.run_pose, video_uuid)
+            self.executor.submit(self.run_asr, file_uuid),
+            self.executor.submit(self.run_ocr, file_uuid),
+            self.executor.submit(self.run_yolo, file_uuid),
+            self.executor.submit(self.run_pose, file_uuid)
        ]
        
        # 收集結果
@@ -488,11 +488,11 @@ from fastapi import WebSocket
 class ProgressWebSocket:
    """即時進度推送"""
    
-    async def broadcast_progress(self, video_uuid, processor, progress):
+    async def broadcast_progress(self, file_uuid, processor, progress):
        """廣播處理進度"""
        message = {
            "type": "progress",
-            "video_uuid": video_uuid,
+            "file_uuid": file_uuid,
            "processor": processor,
            "progress": progress,
            "timestamp": time.time()
@@ -500,11 +500,11 @@ class ProgressWebSocket:
        
        await self.websocket.send_json(message)
    
-    async def broadcast_result(self, video_uuid, processor, result):
+    async def broadcast_result(self, file_uuid, processor, result):
        """廣播處理結果"""
        message = {
            "type": "result",
-            "video_uuid": video_uuid,
+            "file_uuid": file_uuid,
            "processor": processor,
            "result": result,
            "timestamp": time.time()
@@ -607,20 +607,20 @@ class PriorityProcessor:
        "low": ["pose"]                               # 可選
    }
    
-    async def process_by_priority(self, video_uuid):
+    async def process_by_priority(self, file_uuid):
        # 高優先級：立即處理
        for processor in self.PRIORITY["high"]:
-            await self.run(processor, video_uuid)
+            await self.run(processor, file_uuid)
        
        # 中優先級：並行處理
        await asyncio.gather(*[
-            self.run(p, video_uuid)
+            self.run(p, file_uuid)
            for p in self.PRIORITY["medium"]
        ])
        
        # 低優先級：背景處理
        for processor in self.PRIORITY["low"]:
-            asyncio.create_task(self.run(processor, video_uuid))
+            asyncio.create_task(self.run(processor, file_uuid))
 ```

 ### 3. 快取預載入
@@ -706,4 +706,4 @@ class PreloadManager:
  - 上傳期間: 快速結果即時顯示
  - 上傳完成: 1-3 分鐘後完整結果
  - 用戶體驗: 良好（有即時反饋）
-```
+```
@@ -1,6 +1,6 @@
 # Parent Chunk 覆蓋率分析

-> **日期**: 2026-04-14 | **影片 UUID**: 384b0ff44aaaa1f1
+> **日期**: 2026-04-14 | **影片 UUID**: 384b0ff44aaaa1f14cb2cd63b3fea966

 ---

@@ -300,4 +300,4 @@ Momentry Core 的效能與可擴展性設計遵循以下原則：
 3. **數據驅動**：建立完整的監控體系，基於實際數據進行決策
 4. **平衡策略**：在效能、成本、複雜度之間找到最佳平衡點

-通過實施上述策略，Momentry Core 能夠支持從小型部署到大型企業級應用的各種場景，提供穩定、高效、可擴展的視頻內容分析服務。
+通過實施上述策略，Momentry Core 能夠支持從小型部署到大型企業級應用的各種場景，提供穩定、高效、可擴展的視頻內容分析服務。
@@ -34,7 +34,7 @@
 │  ├─ face_id (外键)                           │
 │  ├─ speaker_id (字符串)                      │
 │  ├─ confidence (关联置信度)                   │
-│  └─ video_uuid (来源视频)                     │
+│  └─ file_uuid (来源视频)                      │
 └─────────────────────────────────────────────┘
    ↓
 ┌─────────────────────────────────────────────┐
@@ -67,7 +67,7 @@ CREATE TABLE person_identities (
    speaker_id VARCHAR(64),  -- SPEAKER_00, SPEAKER_01, etc.
    
    -- 关联信息
-    video_uuid VARCHAR(255) NOT NULL,
+    file_uuid VARCHAR(255) NOT NULL,
    confidence DOUBLE PRECISION DEFAULT 0.0,
    
    -- 元数据
@@ -86,10 +86,10 @@ CREATE TABLE person_identities (
    is_confirmed BOOLEAN DEFAULT FALSE,  -- 用户确认的身份
    
    -- 约束
-    CONSTRAINT unique_person_identity UNIQUE (video_uuid, face_identity_id, speaker_id)
+    CONSTRAINT unique_person_identity UNIQUE (file_uuid, face_identity_id, speaker_id)
 );

-CREATE INDEX idx_person_identities_video_uuid ON person_identities(video_uuid);
+CREATE INDEX idx_person_identities_file_uuid ON person_identities(file_uuid);
 CREATE INDEX idx_person_identities_face ON person_identities(face_identity_id);
 CREATE INDEX idx_person_identities_speaker ON person_identities(speaker_id);
 CREATE INDEX idx_person_identities_name ON person_identities(name);
@@ -103,7 +103,7 @@ CREATE TABLE person_appearances (
    person_id VARCHAR(255) NOT NULL REFERENCES person_identities(person_id) ON DELETE CASCADE,
    
    -- 出场信息
-    video_uuid VARCHAR(255) NOT NULL,
+    file_uuid VARCHAR(255) NOT NULL,
    start_time DOUBLE PRECISION NOT NULL,
    end_time DOUBLE PRECISION NOT NULL,
    duration DOUBLE PRECISION NOT NULL,
@@ -120,8 +120,8 @@ CREATE TABLE person_appearances (
 );

 CREATE INDEX idx_person_appearances_person ON person_appearances(person_id);
-CREATE INDEX idx_person_appearances_video ON person_appearances(video_uuid);
-CREATE INDEX idx_person_appearances_time ON person_appearances(video_uuid, start_time, end_time);
+CREATE INDEX idx_person_appearances_video ON person_appearances(file_uuid);
+CREATE INDEX idx_person_appearances_time ON person_appearances(file_uuid, start_time, end_time);
 ```

 ### 3. 增强 chunks 表
@@ -300,7 +300,7 @@ POST /api/v1/person/identify
 Content-Type: application/json

 {
-  "video_uuid": "abc123",
+  "file_uuid": "abc123",
  "auto_match": true,
  "match_threshold": 0.5
 }
@@ -325,7 +325,7 @@ Response:
 ### 2. 查询人物出场时间轴

 ```http
-GET /api/v1/person/:person_id/timeline?video_uuid=abc123
+GET /api/v1/person/:person_id/timeline?file_uuid=abc123

 Response:
 {
@@ -471,12 +471,12 @@ pub async fn batch_insert_person_appearances(
    for appearance in appearances {
        sqlx::query(r#"
            INSERT INTO person_appearances (
-                person_id, video_uuid, start_time, end_time, 
+                person_id, file_uuid, start_time, end_time, 
                duration, confidence, metadata
            ) VALUES ($1, $2, $3, $4, $5, $6, $7)
        "#)
        .bind(&appearance.person_id)
-        .bind(&appearance.video_uuid)
+        .bind(&appearance.file_uuid)
        .bind(appearance.start_time)
        .bind(appearance.end_time)
        .bind(appearance.duration)
@@ -496,13 +496,13 @@ pub async fn batch_insert_person_appearances(
 ```sql
 -- 为常用查询添加复合索引
 CREATE INDEX idx_person_appearances_video_time 
-ON person_appearances(video_uuid, start_time, end_time);
+ON person_appearances(file_uuid, start_time, end_time);

 CREATE INDEX idx_person_identities_video_face 
-ON person_identities(video_uuid, face_identity_id);
+ON person_identities(file_uuid, face_identity_id);

 CREATE INDEX idx_person_identities_video_speaker 
-ON person_identities(video_uuid, speaker_id);
+ON person_identities(file_uuid, speaker_id);
 ```

 ### 3. 缓存策略
@@ -512,9 +512,9 @@ ON person_identities(video_uuid, speaker_id);
 pub async fn get_person_timeline_cached(
    redis: &RedisClient,
    person_id: &str,
-    video_uuid: &str,
+    file_uuid: &str,
 ) -> Result<Vec<PersonAppearance>> {
-    let cache_key = format!("person_timeline:{}:{}", video_uuid, person_id);
+    let cache_key = format!("person_timeline:{}:{}", file_uuid, person_id);
    
    // 尝试从缓存获取
    if let Some(cached) = redis.get(&cache_key).await? {
@@ -522,7 +522,7 @@ pub async fn get_person_timeline_cached(
    }
    
    // 从数据库查询
-    let timeline = query_person_timeline_from_db(person_id, video_uuid).await?;
+    let timeline = query_person_timeline_from_db(person_id, file_uuid).await?;
    
    // 缓存结果（5分钟）
    redis.set_ex(&cache_key, &serde_json::to_string(&timeline)?, 300).await?;
@@ -552,8 +552,8 @@ if confidence < MIN_MATCH_CONFIDENCE {
 // 检查是否已存在相同关联
 let existing = sqlx::query!(
    "SELECT id FROM person_identities 
-     WHERE video_uuid = $1 AND face_identity_id = $2 AND speaker_id = $3",
-    video_uuid, face_id, speaker_id
+     WHERE file_uuid = $1 AND face_identity_id = $2 AND speaker_id = $3",
+    file_uuid, face_id, speaker_id
 )
 .fetch_optional(db.pool())
 .await?;
@@ -616,4 +616,4 @@ lazy_static! {
 - [InsightFace Documentation](https://github.com/deepinsight/insightface)
 - [WhisperX Speaker Diarization](https://github.com/m-bain/whisperX)
 - [PostgreSQL pgvector](https://github.com/pgvector/pgvector)
- [DBSCAN Clustering Algorithm](https://scikit-learn.org/stable/modules/clustering.html#dbscan)
+- [DBSCAN Clustering Algorithm](https://scikit-learn.org/stable/modules/clustering.html#dbscan)
@@ -31,7 +31,7 @@ curl -X POST http://localhost:3002/api/v1/person/identify \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key" \
  -d '{
-    "video_uuid": "your_video_uuid",
+    "file_uuid": "your_file_uuid",
    "auto_match": true,
    "match_threshold": 0.5
  }'
@@ -60,7 +60,7 @@ curl -X POST http://localhost:3002/api/v1/person/identify \
 查询某个人物在视频中的出场时间：

 ```bash
-curl -X GET "http://localhost:3002/api/v1/person/person_abc123/timeline?video_uuid=your_video_uuid" \
+curl -X GET "http://localhost:3002/api/v1/person/person_abc123/timeline?file_uuid=your_file_uuid" \
  -H "X-API-Key: your_api_key"
 ```

@@ -152,7 +152,7 @@ curl -X GET http://localhost:3002/api/v1/chunks/sentence_0012/persons \
 | person_id | VARCHAR(255) | 人物唯一标识 |
 | face_identity_id | INTEGER | 关联的人脸身份 ID |
 | speaker_id | VARCHAR(64) | 说话人 ID（SPEAKER_00, SPEAKER_01...） |
-| video_uuid | VARCHAR(255) | 来源视频 UUID |
+| file_uuid | VARCHAR(255) | 来源视频 UUID |
 | name | VARCHAR(255) | 人物姓名（手动标注） |
 | confidence | DOUBLE PRECISION | 关联置信度 |
 | appearance_count | INTEGER | 出场次数 |
@@ -164,7 +164,7 @@ curl -X GET http://localhost:3002/api/v1/chunks/sentence_0012/persons \
 | 字段 | 类型 | 描述 |
 |------|------|------|
 | person_id | VARCHAR(255) | 关联的人物身份 ID |
-| video_uuid | VARCHAR(255) | 视频 UUID |
+| file_uuid | VARCHAR(255) | 视频 UUID |
 | start_time | DOUBLE PRECISION | 开始时间（秒） |
 | end_time | DOUBLE PRECISION | 结束时间（秒） |
 | duration | DOUBLE PRECISION | 持续时间（秒） |
@@ -225,11 +225,11 @@ const MIN_CONFIDENCE: f64 = 0.6;
 ```sql
 -- 时间范围查询
 CREATE INDEX idx_person_appearances_time 
-ON person_appearances(video_uuid, start_time, end_time);
+ON person_appearances(file_uuid, start_time, end_time);

 -- 人物查询
-CREATE INDEX idx_person_identities_video_uuid 
-ON person_identities(video_uuid);
+CREATE INDEX idx_person_identities_file_uuid 
+ON person_identities(file_uuid);

 -- 说话人查询
 CREATE INDEX idx_person_identities_speaker 
@@ -259,7 +259,7 @@ for video in /path/to/videos/*.mp4; do
  curl -X POST http://localhost:3002/api/v1/person/identify \
    -H "Content-Type: application/json" \
    -H "X-API-Key: your_api_key" \
-    -d "{\"video_uuid\": \"$uuid\", \"auto_match\": true}"
+    -d "{\"file_uuid\": \"$uuid\", \"auto_match\": true}"
 done
 ```

@@ -289,7 +289,7 @@ curl -X PATCH http://localhost:3002/api/v1/person/person_xxx \
 ```bash
 curl -X POST http://localhost:3002/api/v1/person/identify \
  -H "Content-Type: application/json" \
-  -d '{"video_uuid": "xxx", "match_threshold": 0.3}'
+  -d '{"file_uuid": "xxx", "match_threshold": 0.3}'
 ```

 ### 问题 2：人物身份重复
@@ -313,7 +313,7 @@ SELECT merge_person_identities(
 **解决**：
 1. 确认索引已创建：`\d person_appearances`
 2. 使用 EXPLAIN 分析查询
-3. 考虑分区表（按 video_uuid）
+3. 考虑分区表（按 file_uuid）

 ## 性能优化

@@ -343,7 +343,7 @@ pub async fn batch_insert_appearances(

 ```rust
 // 使用 Redis 缓存时间轴查询
-let cache_key = format!("person_timeline:{}:{}", video_uuid, person_id);
+let cache_key = format!("person_timeline:{}:{}", file_uuid, person_id);

 if let Some(cached) = redis.get(&cache_key).await? {
    return Ok(serde_json::from_str(&cached)?);
@@ -392,4 +392,4 @@ lazy_static! {
 - [InsightFace Documentation](https://github.com/deepinsight/insightface)
 - [WhisperX Speaker Diarization](https://github.com/m-bain/whisperX)
 - [PostgreSQL pgvector](https://github.com/pgvector/pgvector)
- [完整架构设计文档](./PERSON_IDENTITY_INTEGRATION.md)
+- [完整架构设计文档](./PERSON_IDENTITY_INTEGRATION.md)
@@ -112,11 +112,11 @@ CREATE TABLE assets (
 ```

 ### 2.2 核心流程
-1.  **上傳/偵測**: SFTPGo 觸發 Webhook 或用戶透過 API 上傳。
-2.  **探針分析**: `ffprobe` 提取解析度、幀率、音軌、編碼、時長。
-3.  **智能預處理**: 呼叫 `Smart Thumbnail` 處理器，跳過片頭黑屏，提取正片首幀。
-4.  **分類標記**: 根據探針結果自動標記類型（如 `duration > 300s` 標記為 `long_form`）。
-5.  **入隊**: 狀態轉為 `PENDING`，寫入 Redis 任務隊列 `queue:processing`。
+1. **上傳/偵測**: SFTPGo 觸發 Webhook 或用戶透過 API 上傳。
+2. **探針分析**: `ffprobe` 提取解析度、幀率、音軌、編碼、時長。
+3. **智能預處理**: 呼叫 `Smart Thumbnail` 處理器，跳過片頭黑屏，提取正片首幀。
+4. **分類標記**: 根據探針結果自動標記類型（如 `duration > 300s` 標記為 `long_form`）。
+5. **入隊**: 狀態轉為 `PENDING`，寫入 Redis 任務隊列 `queue:processing`。

 ---

@@ -173,10 +173,10 @@ LIMIT 1;
 | `chunks.json` | Pre-Chunk | `chunks` + `parent_chunks` | 語意搜尋、父子關聯檢索 |

 ### 4.3 向量索引建立
-1.  提取文本內容 (ASR + OCR + Chunk Summary)。
-2.  呼叫 `embedding_engine` 服務 (`nomic-embed-text-v2-moe`) 生成 768-dim 向量。
-3.  寫入 Qdrant Collection (`momentry_rule1`, `rule2`, `rule3`)。
-4.  狀態更新至 `READY`，觸發 Webhook 通知使用者。
+1. 提取文本內容 (ASR + OCR + Chunk Summary)。
+2. 呼叫 `embedding_engine` 服務 (`nomic-embed-text-v2-moe`) 生成 768-dim 向量。
+3. 寫入 Qdrant Collection (`momentry_rule1`, `rule2`, `rule3`)。
+4. 狀態更新至 `READY`，觸發 Webhook 通知使用者。

 ---

@@ -0,0 +1,392 @@
+# Pose-based Identity Matching Optimization Plan
+
+> Planning date: 2026-04-28
+> Plan version: V1.0
+> Based on experiment: Pose-filtered Matching Test
+
+---
+
+## 优化目标
+
+### 核心目标
+
+| 目标 | 当前状态 | 目标状态 |
+|------|---------|---------|
+| **Match Ratio** | 45.16% (阈值 0.85) | **60%+** |
+| **Angle Coverage** | {three_quarter, profile_left, profile_right} | **{frontal, three_quarter, profile_left, profile_right}** |
+| **Angle-specific Similarity** | profile_right: 0.08 ❌ | **> 0.85** |
+| **自动化程度** | 手动选择参考向量 | **自动多角度注册** |
+
+---
+
+## 问题分析
+
+### 当前实验结果
+
+| Angle | Avg Similarity | Frames | Match Ratio | 问题 |
+|-------|----------------|--------|-------------|------|
+| **three_quarter** | 0.67 | 27 (87%) | 48% | 主要角度，覆盖良好 |
+| **profile_left** | 0.97 ✅ | 3 (10%) | 100% | 参考向量匹配度高 |
+| **profile_right** | 0.08 ❌ | 1 (3%) | 0% | **缺少参考向量** |
+| **frontal** | - | 0 | - | **未检测到** |
+
+### 问题根因
+
+| 问题 | 原因 | 解决方案 |
+|------|------|---------|
+| **profile_right 相似度低** | 缺少该角度参考向量 | 自动选择 profile_right 帧注册 |
+| **frontal 未检测到** | 视频中没有正面人脸 | 需要补充 frontal 参考向量 |
+| **角度分类粗糙** | 仅用 ratio threshold | 增加 landmarks geometry 分析 |
+| **手动选择参考向量** | 需人工干预 | 实现自动多角度选择 |
+
+---
+
+## 优化方案设计
+
+### Phase 1: 角度分类算法优化
+
+**目标**: 提高角度分类准确性
+
+**改进点**:
+- 当前: 仅用 `nose_to_eye / eye_width` ratio
+- 改进: 增加 landmarks geometry 特征
+
+**具体改进**:
+
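+| Feature | Current | New |
+|------|------|------|
+| **Ratio** | ✅ | Keep |
+| **Eye Slope** | ❌ | Slope of the line joining the eyes (indicates in-plane head tilt, i.e. roll) |
+| **Nose Position** | ❌ | Offset of the nose from the midpoint between the eyes |
+| **Mouth Symmetry** | ❌ | Symmetry of the mouth corners (indicates a profile view) |
+| **3D Landmarks** | ❌ | Use 3D_68 landmarks (if available) |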
+
+**实施任务**:
+1. 实现 `calculate_pose_angle_v2()` 函数
+2. 添加多特征综合评分
+3. 输出更精确的 angle 分类
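The multi-feature classification planned for Phase 1 could look roughly like the sketch below. It is not the project's `calculate_pose_angle_v2()`: the landmark dictionary layout, the function names, and the definition of `pose_ratio` (horizontal nose offset divided by eye distance) are all assumptions made for illustration. The 0.4 and 0.6 cut-offs are the ratio ranges listed in the Phase 2 coverage table of this plan.

```python
import math


def pose_features(landmarks):
    """Compute simple geometric pose features from 2D landmarks.

    `landmarks` is a hypothetical dict with (x, y) tuples under the keys
    "left_eye", "right_eye" and "nose".
    """
    (lx, ly), (rx, ry) = landmarks["left_eye"], landmarks["right_eye"]
    nx, _ = landmarks["nose"]
    eye_width = math.hypot(rx - lx, ry - ly)
    if eye_width == 0:
        raise ValueError("eye landmarks coincide")
    eye_center_x = (lx + rx) / 2
    nose_offset = nx - eye_center_x           # negative: nose left of center
    eye_slope = (ry - ly) / (rx - lx) if rx != lx else math.inf
    return {
        "pose_ratio": abs(nose_offset) / eye_width,  # assumed definition
        "nose_offset": nose_offset,
        "eye_slope": eye_slope,
    }


def classify_angle(features):
    """Map pose features to one of the four angle labels used in this plan."""
    ratio = features["pose_ratio"]
    if ratio < 0.4:
        return "frontal"
    if ratio <= 0.6:
        return "three_quarter"
    return "profile_left" if features["nose_offset"] < 0 else "profile_right"
```

A weighted score over several features (mouth symmetry, 3D landmarks) would replace the plain threshold chain once those features are available.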
+
+---
+
+### Phase 2: 自动多角度参考向量选择
+
+**目标**: 自动选择覆盖所有角度的参考向量
+
+**算法设计**:
+
+```
+输入: face.json (所有帧人脸)
+输出: 4-10 个高质量参考向量（覆盖所有角度）
+
+步骤:
+1. 计算每帧人脸的 pose angle
+2. 按 angle 分组
+3. 每组按 quality_score 排序
+4. 每组选择 Top 1-2 个
+5. 总数限制 10 个
+```
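The five steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming each face record already carries an `angle` label and a `quality_score`; the function name and record layout are hypothetical and do not describe the real `select_face_reference_vectors.py`.

```python
from collections import defaultdict


def select_reference_faces(faces, per_angle=2, max_total=10):
    """Pick the best faces per angle, then cap the total number."""
    # Steps 1-2: group faces by their (precomputed) pose angle.
    groups = defaultdict(list)
    for face in faces:
        groups[face["angle"]].append(face)

    # Steps 3-4: rank each group by quality and keep the top entries.
    selected = []
    for angle in sorted(groups):
        ranked = sorted(groups[angle], key=lambda f: f["quality_score"], reverse=True)
        selected.extend(ranked[:per_angle])

    # Step 5: enforce the global limit, keeping the highest-quality faces.
    selected.sort(key=lambda f: f["quality_score"], reverse=True)
    selected = selected[:max_total]

    # Summarize how many references each angle ended up with.
    coverage = {}
    for face in selected:
        coverage[face["angle"]] = coverage.get(face["angle"], 0) + 1
    return selected, coverage
```

The per-angle quality minimums from the coverage table are left out here; they could be applied as a filter before the grouping step. Note that a plain quality-based cap can drop a weakly covered angle, so a production version would likely reserve one slot per angle first.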
+
+**角度覆盖策略**:
+
+| Angle | 目标数量 | 选择策略 |
+|-------|---------|---------|
+| **frontal** | 1-2 | ratio < 0.4, quality > 0.85 |
+| **three_quarter** | 2-3 | ratio 0.4-0.6, quality > 0.80 |
+| **profile_left** | 1-2 | nose left of center, quality > 0.75 |
+| **profile_right** | 1-2 | nose right of center, quality > 0.75 |
+
+**实施任务**:
+1. 改进 `select_face_reference_vectors.py`
+2. 实现自动角度分组
+3. 确保最少 4 个角度覆盖
+4. 生成 angle_coverage_report
+
+---
+
+### Phase 3: Identity 注册优化
+
+**目标**: 注册时自动存储 pose angle
+
+**当前问题**: reference_data 中 angle 多为 "unknown"
+
+**改进**:
+- 计算 pose angle 并存储到 reference_data
+- 存储 pose_ratio 供后续过滤使用
+
+**reference_data 结构优化**:
+
+```json
+{
+  "face_embeddings": [
+    {
+      "embedding": [512-dim],
+      "angle": "three_quarter",
+      "pose_ratio": 0.542,
+      "eye_slope": 0.12,
+      "nose_offset": -5.3,
+      "quality_score": 0.92,
+      "source": "video_detection",
+      "frame": "210",
+      "created_at": "2026-04-28T..."
+    }
+  ],
+  "angle_coverage": {
+    "frontal": 2,
+    "three_quarter": 3,
+    "profile_left": 1,
+    "profile_right": 1
+  },
+  "best_angle": "three_quarter",
+  "total_references": 7
+}
+```
+
+**实施任务**:
+1. 更新 reference_data JSON schema
+2. 注册时计算 pose features
+3. 生成 angle_coverage 统计
+
+---
+
+### Phase 4: Pose-filtered Matching 优化
+
+**目标**: 改进匹配策略
+
+**当前问题**:
+- 找不到同角度向量时，fallback 不够智能
+- 阈值固定，未考虑角度差异
+
+**改进策略**:
+
+| 场景 | 当前策略 | 改进策略 |
+|------|---------|---------|
+| **有同角度向量** | 使用同角度 | 保持 ✅ |
+| **无同角度向量** | 使用 three_quarter | **使用 closest angle** |
+| **阈值固定** | 0.85 | **角度自适应阈值** |
+
+**角度自适应阈值**:
+
+| Angle | Threshold | 说明 |
+|-------|-----------|------|
+| **frontal** | 0.90 | 最高质量 |
+| **three_quarter** | 0.85 | 标准 |
+| **profile_left/right** | 0.80 | 更宽容（角度差异大） |
+
+**Closest Angle Fallback**:
+
+```python
+angle_similarity = {
+    'frontal': {'frontal': 1.0, 'three_quarter': 0.8, 'profile': 0.5},
+    'three_quarter': {'frontal': 0.8, 'three_quarter': 1.0, 'profile': 0.7},
+    'profile': {'frontal': 0.5, 'three_quarter': 0.7, 'profile': 1.0},
+}
+
+# Fallback order
+if detected_angle == 'profile_right':
+    fallback_order = ['profile_right', 'profile_left', 'three_quarter', 'frontal']
+```
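A sketch of how the closest-angle fallback and the angle-adaptive thresholds might fit together is shown below. The threshold values come from the table above and the `profile_right` order from the snippet above; the remaining fallback orders are derived from the similarity matrix and are assumptions, as are the function names and the choice to apply the detected angle's threshold even when a fallback reference is used.

```python
# Thresholds from the angle-adaptive threshold table in this plan.
ANGLE_THRESHOLDS = {
    "frontal": 0.90,
    "three_quarter": 0.85,
    "profile_left": 0.80,
    "profile_right": 0.80,
}

# Closest-first fallback orders (assumed, except for profile_right).
FALLBACK_ORDER = {
    "frontal": ["frontal", "three_quarter", "profile_left", "profile_right"],
    "three_quarter": ["three_quarter", "frontal", "profile_left", "profile_right"],
    "profile_left": ["profile_left", "profile_right", "three_quarter", "frontal"],
    "profile_right": ["profile_right", "profile_left", "three_quarter", "frontal"],
}


def pick_references(detected_angle, references):
    """Return (angle_used, refs) for the closest angle that has references."""
    by_angle = {}
    for ref in references:
        by_angle.setdefault(ref["angle"], []).append(ref)
    for angle in FALLBACK_ORDER.get(detected_angle, [detected_angle]):
        if by_angle.get(angle):
            return angle, by_angle[angle]
    return None, []


def is_match(similarity, detected_angle, default_threshold=0.85):
    """Compare a similarity score against the threshold for the detected angle."""
    return similarity >= ANGLE_THRESHOLDS.get(detected_angle, default_threshold)
```

With only `three_quarter` and `profile_left` references registered, a `profile_right` detection would be matched against the `profile_left` vectors rather than falling straight back to `three_quarter`.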
+
+**实施任务**:
+1. 实现 `strategy_pose_filtered_v2()`
+2. 添加角度自适应阈值
+3. 实现 closest angle fallback
+4. 添加 angle_similarity 矩阵
+
+---
+
+### Phase 5: 生产流程整合
+
+**目标**: 整合到 Momentry Core 生产流程
+
+**整合点**:
+
+| 流程 | 整合内容 |
+|------|---------|
+| **Face Processor** | 输出 pose angle 到 face.json |
+| **Identity Registration API** | 自动多角度参考向量选择 |
+| **Identity Matching API** | Pose-filtered matching |
+| **Portal UI** | 显示 angle_coverage |
+
+**API 设计**:
+
+```
+POST /api/v1/identities/:id/register-reference-vectors
+Body: {
+  "file_uuid": "xxx",
+  "face_json_path": "output/xxx.face.json",
+  "auto_select": true,
+  "min_angles": 4,
+  "max_vectors": 10
+}
+
+Response: {
+  "uuid": "xxx",
+  "reference_count": 7,
+  "angle_coverage": {...},
+  "quality_avg": 0.89
+}
+```
+
+---
+
+## 实施计划
+
+### 阶段划分
+
+| Phase | 任务 | 优先级 | 预计时间 |
+|-------|------|--------|---------|
+| **Phase 1** | 角度分类算法优化 | 高 | 1天 |
+| **Phase 2** | 自动多角度参考向量选择 | 高 | 1天 |
+| **Phase 3** | Identity 注册优化 | 中 | 0.5天 |
+| **Phase 4** | Pose-filtered Matching 优化 | 中 | 1天 |
+| **Phase 5** | 生产流程整合 | 低 | 2天 |
+
+**总计**: 5.5天
+
+---
+
+### Phase 1 详细任务
+
+| 任务 | 说明 | 文件 |
+|------|------|------|
+| Task 1.1 | 实现 `calculate_pose_angle_v2()` | `scripts/utils/pose_analyzer.py` |
+| Task 1.2 | 添加多特征计算 | 同上 |
+| Task 1.3 | 单元测试 | `tests/test_pose_analyzer.py` |
+| Task 1.4 | 验证角度分类准确性 | 测试脚本 |
+
+**验证指标**:
+- Angle 分类准确率 > 90%
+- 特征计算速度 < 0.01s/face
+
+---
+
+### Phase 2 详细任务
+
+| 任务 | 说明 | 文件 |
+|------|------|------|
+| Task 2.1 | 实现角度分组算法 | `scripts/select_face_reference_vectors_v2.py` |
+| Task 2.2 | 实现每角度 Top-K 选择 | 同上 |
+| Task 2.3 | 确保最少角度覆盖 | 同上 |
+| Task 2.4 | 生成 angle_coverage_report | 同上 |
+| Task 2.5 | 批量测试（多个视频） | 测试脚本 |
+
+**验证指标**:
+- Angle 覆盖 ≥ 4
+- 参考向量数量 4-10
+- 质量 avg > 0.85
+
+---
+
+### Phase 3 详细任务
+
+| 任务 | 说明 | 文件 |
+|------|------|------|
+| Task 3.1 | 更新 reference_data schema | 设计文档 |
+| Task 3.2 | 注册脚本集成 pose features | `scripts/register_identity_with_pose.py` |
+| Task 3.3 | 数据库测试 | 测试脚本 |
+
+**验证指标**:
+- reference_data 包含 pose features ✅
+- angle_coverage 统计准确 ✅
+
+---
+
+### Phase 4 详细任务
+
+| 任务 | 说明 | 文件 |
+|------|------|------|
+| Task 4.1 | 实现 `strategy_pose_filtered_v2()` | `scripts/match_face_with_pose_v2.py` |
+| Task 4.2 | 实现角度自适应阈值 | 同上 |
+| Task 4.3 | 实现 closest angle fallback | 同上 |
+| Task 4.4 | 批量测试对比 | 测试脚本 |
+
+**验证指标**:
+- Match Ratio > 60% (阈值 0.85)
+- profile_right 相似度 > 0.85
+- Fallback 有效
+
+---
+
+### Phase 5 详细任务
+
+| 任务 | 说明 | 文件 |
+|------|------|------|
+| Task 5.1 | Face Processor 输出 pose angle | `scripts/face_processor.py` |
+| Task 5.2 | Identity Registration API | `src/api/identity.rs` |
+| Task 5.3 | Identity Matching API | 同上 |
+| Task 5.4 | Portal UI 组件 | Vue components |
+| Task 5.5 | 整合测试 | E2E 测试 |
+
+**验证指标**:
+- API 响应正常 ✅
+- UI 显示 angle_coverage ✅
+- E2E 流程成功 ✅
+
+---
+
+## 预期成果
+
+### 定量指标
+
+| 指标 | 当前 | Phase 4后 | Phase 5后 |
+|------|------|----------|----------|
+| **Match Ratio (阈值 0.85)** | 45.16% | **60%+** | 65%+ |
+| **Angle Coverage** | 2-3 | **4+** | 4+ |
+| **profile_right Similarity** | 0.08 | **0.85+** | 0.85+ |
+| **自动化程度** | 手动 | 半自动 | **全自动** |
+
+### 定性改进
+
+| 改进 | 说明 |
+|------|------|
+| **鲁棒性** | 多角度覆盖，减少角度差异影响 |
+| **准确性** | 角度分类更精确，匹配更可靠 |
+| **自动化** | 从手动选择到自动注册 |
+| **可追溯** | pose features 存储可追溯 |
+
+---
+
+## 验证方案
+
+### 单元测试
+
+| 测试 | 说明 |
+|------|------|
+| `test_pose_analyzer` | 角度分类准确性 |
+| `test_reference_selector_v2` | 多角度选择逻辑 |
+| `test_pose_filtered_matching_v2` | 匹配策略有效性 |
+
+### 集成测试
+
+| 测试 | 说明 |
+|------|------|
+| `test_identity_registration_with_pose` | 注册流程 |
+| `test_batch_matching` | 批量匹配效果 |
+| `test_angle_coverage` | 角度覆盖验证 |
+
+### E2E 测试
+
+| 测试 | 说明 |
+|------|------|
+| `test_full_pipeline` | 从 Face Processor 到 Matching |
+| `test_api_integration` | API 端到端 |
+
+---
+
+## 风险与缓解
+
+| 风险 | 影响 | 缓解措施 |
+|------|------|---------|
+| **缺少 frontal 帧** | frontal 角度无参考向量 | 使用 closest angle fallback |
+| **角度分类错误** | 匹配失败 | 多特征综合评分 |
+| **计算成本增加** | 性能下降 | 预计算 pose features |
+| **阈值设置不当** | 匹配率波动 | 角度自适应阈值 |
+
+---
+
+## 版本信息
+
+- 规划版本: V1.0
+- 规划日期: 2026-04-28
+- 规划状态: ✅ 完成
+- 下一步: **Phase 1 实施**
@@ -2,8 +2,8 @@
 document_type: "architecture_design"
 service: "MOMENTRY_CORE"
 title: "Video Processing Pipeline - 處理流程"
-date: "2026-03-22"
-version: "V1.0"
+date: "2026-04-27"
+version: "V1.2"
 status: "active"
 owner: "Warren"
 created_by: "OpenCode"
@@ -12,10 +12,12 @@ tags:
  - "video"
  - "pipeline"
  - "處理流程"
+  - "processing_status"
 ai_query_hints:
  - "查詢 Video Processing Pipeline - 處理流程 的內容"
  - "Video Processing Pipeline - 處理流程 的主要目的是什麼？"
  - "如何操作或實施 Video Processing Pipeline - 處理流程？"
+  - "processing_status 字段與 status 的關係"
 ---

 # Video Processing Pipeline - 處理流程
@@ -24,7 +26,7 @@ ai_query_hints:
 |------|------|
 | 建立者 | Warren |
 | 建立時間 | 2026-03-22 |
-| 文件版本 | V1.1 |
+| 文件版本 | V1.2 |

 ---

@@ -34,6 +36,7 @@ ai_query_hints:
 |------|------|------|--------|-----------|
 | V1.0 | 2026-03-22 | 創建文件 | Warren | OpenCode |
 | V1.1 | 2026-03-26 | 更新流程圖文字 (media_url→file_path) | OpenCode | deepseek-reasoner |
+| V1.2 | 2026-04-27 | 添加 processing_status 字段說明 | OpenCode | GLM-5 |

 ---

@@ -265,9 +268,16 @@ let query_vector = embedder.embed_query("搜索查詢").await?;
 ### PostgreSQL 狀態欄位

 ```sql
--- 影片處理狀態
+-- 影片處理狀態（基本狀態）
 videos.status: 'pending' | 'processing' | 'completed' | 'failed'

+-- 影片處理狀態（詳細狀態）
+videos.processing_status: 'REGISTERED' | 'PENDING' | 'PROBING' | 'ASR' | 'OCR' | 'YOLO' | 'FACE' | 'POSE' | 'CUT' | 'ASRX' | 'COMPLETED' | 'FAILED' | 'PAUSED' | 'RESUMING'
+
+-- 說明：
+-- status：基本狀態，用於 API 查詢過濾（is_processed=true → status='completed'）
+-- processing_status：詳細狀態，用於 Portal 顯示和作業追蹤
+
 -- 檔案處理狀態
 videos.fs_json: true/false
 videos.fs_chunks: true/false
@@ -307,6 +317,46 @@ curl http://localhost:3002/api/v1/progress/{uuid}
 }
 ```

+### Agent 進度追蹤（V1.2 起）
+
+從 V1.2 起，Agent 任務透過 `processing_status` JSONB 的 `agents` 字段追蹤。
+
+#### Agent 進度字段
+
+| Agent | JSONB path | Description |
+|-------|-----------|------|
+| 5W1H | `processing_status->'agents'->'5w1h'` | Scene summary agent |
+| Translation | `processing_status->'agents'->'translation'` | Translation agent |
+
+#### Agent 狀態結構
+
+```json
+{
+  "agents": {
+    "5w1h": {
+      "status": "running",
+      "scenes_processed": 5,
+      "scenes_total": 1332,
+      "progress_pct": 0.4,
+      "started_at": "2026-04-27T05:45:00Z"
+    }
+  }
+}
+```
+
+#### SQL 查詢 Agent 進度
+
+```sql
+SELECT 
+  uuid,
+  processing_status->'agents'->'5w1h'->>'status' as status,
+  processing_status->'agents'->'5w1h'->>'scenes_processed' as processed
+FROM videos 
+WHERE processing_status->'agents'->'5w1h'->>'status' = 'running';
+```
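On the client side, the same structure can be read after the JSONB value has been decoded. The sketch below is an illustration only: the function name is hypothetical, and it assumes `progress_pct` is a percentage rounded to one decimal, which is what the sample values (5 of 1332 scenes, 0.4) suggest.

```python
def agent_progress(processing_status, agent):
    """Return the progress percentage (one decimal) for an agent, or None."""
    info = processing_status.get("agents", {}).get(agent)
    if not info or not info.get("scenes_total"):
        return None  # agent not started or total unknown
    return round(100 * info["scenes_processed"] / info["scenes_total"], 1)


status = {
    "agents": {
        "5w1h": {"status": "running", "scenes_processed": 5, "scenes_total": 1332}
    }
}
print(agent_progress(status, "5w1h"))         # 0.4
print(agent_progress(status, "translation"))  # None
```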
+
+詳細規範請參考: `REFERENCE/PROCESSING_STATUS_JSONB_SPEC.md`
+
 ---

 ## 下一步
@@ -162,4 +162,4 @@ A: 調整 Qdrant 向量索引參數，優化嵌入模型，添加緩存層。

 **更新時間**: 2026-04-22  
 **適用對象**: 新團隊成員、開發者、架構師  
-**建議閱讀時間**: 5 分鐘
+**建議閱讀時間**: 5 分鐘
@@ -64,7 +64,7 @@ ai_query_hints:

 ### 2.2 開發標準

-#### Python 處理器標準：
+#### Python 處理器標準
 ```python
 # 1. 必要的導入
 import json
@@ -79,7 +79,7 @@ parser.add_argument("--output", required=True, help="Output path")
 args = parser.parse_args()

 # 3. 主處理邏輯
-def process_video(video_uuid, output_path):
+def process_video(file_uuid, output_path):
    # 處理邏輯
    result = {
        "status": "success",
@@ -107,31 +107,31 @@ if __name__ == "__main__":

 ### 3.1 測試類型

-#### 單元測試：
+#### 單元測試
 - 測試處理器核心邏輯
 - 驗證輸入輸出格式
 - 測試錯誤處理

-#### 集成測試：
+#### 集成測試
 - 測試與其他組件的集成
 - 驗證數據流完整
 - 測試性能表現

-#### 回歸測試：
+#### 回歸測試
 - 確保新版本不破壞現有功能
 - 測試兼容性
 - 驗證性能改進

 ### 3.2 測試數據

-#### 測試視頻：
+#### 測試視頻
 | 類型 | 用途 | 示例 |
 |------|------|------|
 | 短視頻（<1分鐘） | 快速測試 | test_video.mp4 |
 | 中等視頻（1-5分鐘） | 功能測試 | demo_video.mp4 |
 | 長視頻（>10分鐘） | 性能測試 | long_video.mp4 |

-#### 測試環境：
+#### 測試環境
 1. **本地開發環境**：快速迭代
 2. **測試服務器**：集成測試
 3. **生產模擬環境**：性能測試
@@ -187,25 +187,25 @@ INSERT INTO processors (

 ### 5.1 調度與執行

-#### 任務調度流程：
+#### 任務調度流程
 ```
 1. 任務創建 → 2. 處理器選擇 → 3. 資源分配
   → 4. 執行監控 → 5. 結果收集 → 6. 狀態更新
 ```

-#### 執行監控：
+#### 執行監控
 1. **進程監控**：監控處理器進程狀態
 2. **資源監控**：監控 CPU、內存、GPU 使用
 3. **性能監控**：監控處理速度和進度

 ### 5.2 錯誤處理與恢復

-#### 錯誤類型：
+#### 錯誤類型
 1. **可恢復錯誤**：臨時性問題，可重試
 2. **配置錯誤**：配置問題，需要修復
 3. **系統錯誤**：系統級問題，需要干預

-#### 重試策略：
+#### 重試策略
 ```rust
 // Rust 中的重試機制示例
 let result = run_with_retry(
@@ -221,7 +221,7 @@ let result = run_with_retry(

 ### 5.3 性能優化

-#### 優化策略：
+#### 優化策略
 1. **並行處理**：同時處理多個視頻
 2. **批處理**：批量處理相關任務
 3. **緩存優化**：重用計算結果
@@ -233,13 +233,13 @@ let result = run_with_retry(

 ### 6.1 日常維護

-#### 監控項目：
+#### 監控項目
 1. **處理器狀態**：運行狀態、健康狀態
 2. **性能指標**：處理速度、成功率
 3. **資源使用**：CPU、內存、存儲
 4. **錯誤率**：各種錯誤的發生頻率

-#### 維護任務：
+#### 維護任務
 1. **日誌分析**：定期分析處理器日誌
 2. **性能調優**：根據監控數據進行調優
 3. **安全更新**：更新依賴庫修復安全漏洞
@@ -247,13 +247,13 @@ let result = run_with_retry(

 ### 6.2 版本升級

-#### 升級流程：
+#### 升級流程
 1. **兼容性檢查**：檢查新版本與現有系統的兼容性
 2. **回滾計劃**：制定升級失敗時的回滾計劃
 3. **分階段部署**：分階段逐步升級
 4. **驗證測試**：升級後進行全面測試

-#### 版本兼容性矩陣：
+#### 版本兼容性矩陣
 | 處理器版本 | 系統版本 | 模型版本 | 狀態 |
 |------------|----------|----------|------|
 | v1.0.x | v0.1.0 | insightface==0.7.3 | ✅ 兼容 |
@@ -361,4 +361,4 @@ let result = run_with_retry(
 |------|------|----------|--------|
 | V1.0 | 2026-04-22 | 創建處理器生命週期管理文檔 | OpenCode |

-**最後更新日期**: 2026-04-22
+**最後更新日期**: 2026-04-22
@@ -305,10 +305,10 @@ match processor.execution_type.as_str() {
 處理器在執行時，需要查詢「服務註冊中心」來獲取依賴資源的配置。

 **流程範例**:
-1.  排程器啟動 `asr_processor.py`。
-2.  Python 腳本查詢本地配置檔 (由排程器生成，內容來自 `services` 表)。
-3.  腳本獲取 Ollama 的 `endpoint` 與 `model_name`。
-4.  腳本執行 Embedding 任務。
+1. 排程器啟動 `asr_processor.py`。
+2. Python 腳本查詢本地配置檔 (由排程器生成，內容來自 `services` 表)。
+3. 腳本獲取 Ollama 的 `endpoint` 與 `model_name`。
+4. 腳本執行 Embedding 任務。

 這樣實現了**處理器與基礎設施配置的解耦**。

@@ -22,8 +22,8 @@
 所有 Processor (YOLO, ASR...) 和 Agent (Translation, Summary...) 啟動時應主動註冊。

 ### 1.1 註冊時機
-*   **Processor**: 在 Python 腳本啟動時，呼叫 HTTP Endpoint 註冊。
-*   **Agent**: 在服務啟動時呼叫 HTTP Endpoint 註冊。
+* **Processor**: 在 Python 腳本啟動時，呼叫 HTTP Endpoint 註冊。
+* **Agent**: 在服務啟動時呼叫 HTTP Endpoint 註冊。

 ---

@@ -48,7 +48,7 @@
 }
 ```

-*   **resource_id**: 建議格式 `{type}_{name}_{uuid}`，例如 `processor_yolo_a1b2c3`。
+* **resource_id**: 建議格式 `{type}_{name}_{uuid}`，例如 `processor_yolo_a1b2c3`。

 ### 2.3 Response

@@ -74,23 +74,23 @@
 ```json
 {
  "status": "idle | busy | error",
-  "job_uuid": "current_video_uuid", 
+  "job_uuid": "current_file_uuid", 
  "progress": 0.45, 
  "last_frame_index": 12500
 }
 ```

-*   **progress**: 0.0 到 1.0 之間的浮點數。
-*   **job_uuid**: 當前正在處理的任務 ID。
+* **progress**: 0.0 到 1.0 之間的浮點數。
+* **job_uuid**: 當前正在處理的任務 ID。

 ---

 ## 4. 監控用途

 系統後台 (Portal Dashboard) 可透過查詢 Registry 實現：
-1.  **即時儀表板**: 顯示目前有幾個 Processor 在運行 (`busy` 數量)。
-2.  **進度條**: 透過 `last_frame_index` 與影片總幀數計算百分比。
-3.  **健康檢查**: 若資源超過 60 秒未發送心跳，標記為 `offline`。
+1. **即時儀表板**: 顯示目前有幾個 Processor 在運行 (`busy` 數量)。
+2. **進度條**: 透過 `last_frame_index` 與影片總幀數計算百分比。
+3. **健康檢查**: 若資源超過 60 秒未發送心跳，標記為 `offline`。
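The progress and health-check rules above can be expressed as two small helpers. This is a sketch under assumptions: the resource record is a hypothetical dict with `status` and a timezone-aware `last_heartbeat`, and the function names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_TIMEOUT = timedelta(seconds=60)  # the 60-second rule from the list above


def effective_status(resource, now):
    """Report 'offline' when the last heartbeat is older than the timeout."""
    if now - resource["last_heartbeat"] > HEARTBEAT_TIMEOUT:
        return "offline"
    return resource["status"]


def frame_progress(last_frame_index, total_frames):
    """Fraction of frames processed, clamped to the 0.0-1.0 range."""
    if total_frames <= 0:
        return 0.0
    return max(0.0, min(1.0, last_frame_index / total_frames))


now = datetime.now(timezone.utc)
resource = {"status": "busy", "last_heartbeat": now - timedelta(seconds=75)}
print(effective_status(resource, now))  # offline
```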

 ---

@@ -116,5 +116,5 @@ deregister_resource(&resource_id).await;

 ## 版本資訊

-- 版本: V1.0
-- 建立日期: 2026-04-25
+* 版本: V1.0
+* 建立日期: 2026-04-25
@@ -67,18 +67,18 @@

 ## 3. 資源生命週期 (Resource Lifecycle)

-1.  **註冊 (Registration)**:
-    *   組件啟動時向 **Resource Registry** 報到，聲明其 ID、類型和能力。
-    *   *範例*: Agent 啟動，註冊 `resource_type: "agent"`, `capabilities: ["summarize_text"]`。
-2.  **發現 (Discovery)**:
-    *   調度器 (Scheduler) 根據任務需求查詢 Registry 尋找合適的資源。
-    *   *範例*: 任務需要「語音轉文字」，查詢 `capabilities: ["audio_to_text"]`。
-3.  **分配與執行 (Allocation & Execution)**:
-    *   狀態變為 `busy`，接收任務並執行。
-4.  **健康檢查 (Health Monitoring)**:
-    *   Registry 定期 Ping 資源。若無回應，標記為 `offline`。
-5.  **登出 (Deregistration)**:
-    *   組件關閉或崩潰時從 Registry 移除。
+1. **註冊 (Registration)**:
+    * 組件啟動時向 **Resource Registry** 報到，聲明其 ID、類型和能力。
+    * *範例*: Agent 啟動，註冊 `resource_type: "agent"`, `capabilities: ["summarize_text"]`。
+2. **發現 (Discovery)**:
+    * 調度器 (Scheduler) 根據任務需求查詢 Registry 尋找合適的資源。
+    * *範例*: 任務需要「語音轉文字」，查詢 `capabilities: ["audio_to_text"]`。
+3. **分配與執行 (Allocation & Execution)**:
+    * 狀態變為 `busy`，接收任務並執行。
+4. **健康檢查 (Health Monitoring)**:
+    * Registry 定期 Ping 資源。若無回應，標記為 `offline`。
+5. **登出 (Deregistration)**:
+    * 組件關閉或崩潰時從 Registry 移除。

 ---

@@ -127,36 +127,36 @@ CREATE INDEX idx_res_caps ON resources USING GIN(capabilities);
 ## 5. 實作建議

 ### 5.1 Processor 實作 (確定性)
-*   通常由 Python 腳本或 Rust 二進制執行。
-*   啟動時呼叫 `POST /resources/register`，宣告如 `["video_to_frames", "detect_objects"]`。
+* 通常由 Python 腳本或 Rust 二進制執行。
+* 啟動時呼叫 `POST /resources/register`，宣告如 `["video_to_frames", "detect_objects"]`。

 ### 5.2 Agent 實作 (機率性)
-*   通常封裝為具備 LLM Context 的服務。
-*   啟動時呼叫 `POST /resources/register`，宣告如 `["summarize_text", "extract_5w1h"]`。
-*   **重點**: 在 `metadata` 中記錄使用的 LLM 模型名稱 (e.g., `gpt-4o`, `llama3`)。
+* 通常封裝為具備 LLM Context 的服務。
+* 啟動時呼叫 `POST /resources/register`，宣告如 `["summarize_text", "extract_5w1h"]`。
+* **重點**: 在 `metadata` 中記錄使用的 LLM 模型名稱 (e.g., `gpt-4o`, `llama3`)。

 ### 5.3 Service 實作 (基礎設施)
-*   通常由 Docker Compose 或 Systemd 管理。
-*   可透過 Sidecar 或定期腳本進行註冊與心跳更新。
+* 通常由 Docker Compose 或 Systemd 管理。
+* 可透過 Sidecar 或定期腳本進行註冊與心跳更新。

 ---

 ## 6. 與其他架構的關係

-*   **Job/Task Scheduler**: 任務調度器依賴 Resource Registry 來尋找誰能執行任務。
-*   **Configuration Management**: 資源的詳細參數 (如 API Key, Threshold) 應存在 Config 中心，Registry 僅儲存引用或摘要。
-*   **Monitoring**: Prometheus/Grafana 應抓取 Registry 狀態來展示系統資源健康度儀表板。
+* **Job/Task Scheduler**: 任務調度器依賴 Resource Registry 來尋找誰能執行任務。
+* **Configuration Management**: 資源的詳細參數 (如 API Key, Threshold) 應存在 Config 中心，Registry 僅儲存引用或摘要。
+* **Monitoring**: Prometheus/Grafana 應抓取 Registry 狀態來展示系統資源健康度儀表板。

 ## 7. 關聯文檔

 本目錄整合了原有的 Processor 與 Service 架構，並納入新的 Agent 架構：
-- `PROCESSOR_REGISTRY_ARCHITECTURE.md` - 舊版處理器註冊設計 (已整合)。
-- `SERVICE_REGISTRY_ARCHITECTURE.md` - 舊版服務註冊設計 (已整合)。
-- `PROCESSOR_LIFECYCLE.md` - 處理器生命週期 (資源生命週期的子集)。
+* `PROCESSOR_REGISTRY_ARCHITECTURE.md` - 舊版處理器註冊設計 (已整合)。
+* `SERVICE_REGISTRY_ARCHITECTURE.md` - 舊版服務註冊設計 (已整合)。
+* `PROCESSOR_LIFECYCLE.md` - 處理器生命週期 (資源生命週期的子集)。

 ---

 ## 版本資訊

-- 版本: V1.0
-- 建立日期: 2026-04-25
+* 版本: V1.0
+* 建立日期: 2026-04-25
@@ -134,7 +134,7 @@ const job = await response.json();

 // 狀態檢查
 if (job.status === 'completed') {
-  return [{ json: { done: true, video_uuid: job.video_uuid } }];
+  return [{ json: { done: true, file_uuid: job.file_uuid } }];
 } else {
  return [{ json: { done: false, status: job.status } }];
 }
@@ -385,13 +385,13 @@ add_shortcode('momentry_search', function($atts) {
        $html .= '<ul>';

        foreach ($results['results'] as $result) {
-            $video_uuid = $result['uuid'];
+            $file_uuid = $result['uuid'];
            $start = $result['start_time'] ?? 0;
            $end = $result['end_time'] ?? 0;
            $text = $result['text'] ?? '無文字描述';
            
            $html .= '<li>';
-            $html .= '<a href="/player?uuid=' . esc_attr($video_uuid) . 
+            $html .= '<a href="/player?uuid=' . esc_attr($file_uuid) . 
                     '&start=' . esc_attr($start) . 
                     '&end=' . esc_attr($end) . '">';
            $html .= '播放 ' . $start . 's - ' . $end . 's';
@@ -162,4 +162,4 @@ Momentry Core 的安全架構設計遵循業界最佳實踐，包括：
 3. **持續監控**：實時監控安全事件，快速響應
 4. **合規要求**：符合 GDPR、CCPA 等隱私法規

-通過上述安全措施，確保系統在提供強大功能的同時，保持高度的安全性與合規性。
+通過上述安全措施，確保系統在提供強大功能的同時，保持高度的安全性與合規性。
@@ -11,9 +11,9 @@
 本設計文檔旨在定義 Momentry Core 的**多維度自然語言搜尋 (Multi-Dimensional Semantic Search)** 系統架構與實施規範。該系統旨在突破傳統關鍵詞匹配的限制，通過解析用戶的「人事時地物」(5W1H) 意圖，結合多模態數據 (ASR, YOLO, Pose, Scene, Face)，實現高精度的語義檢索。

 ### 1.1 設計原則
-1.  **模組化 (Modularity)**: 搜尋功能作為獨立的 `Search Processor` 模塊，依賴但不侵入其他數據生產模塊 (如 Pose, ASR)。
-2.  **多模態融合 (Multi-Modal Fusion)**: 結合結構化數據 (SQL 過濾) 與非結構化向量數據 (Vector 檢索)。
-3.  **本地優先 (Local First)**: 核心解析與檢索邏輯盡可能在本地完成，僅 LLM 意圖解析可調用雲端或本地 LLM。
+1. **模組化 (Modularity)**: 搜尋功能作為獨立的 `Search Processor` 模塊，依賴但不侵入其他數據生產模塊 (如 Pose, ASR)。
+2. **多模態融合 (Multi-Modal Fusion)**: 結合結構化數據 (SQL 過濾) 與非結構化向量數據 (Vector 檢索)。
+3. **本地優先 (Local First)**: 核心解析與檢索邏輯盡可能在本地完成，僅 LLM 意圖解析可調用雲端或本地 LLM。

 ---

@@ -40,26 +40,26 @@
 ### 2.2 事 (Event / Action / What)
 基於 `ASR` (語音語義) 和 `Pose Analyzer` (行為語義)。

-*   **語音內容**: "他在解釋量子力學" -> 向量檢索 ASR 文本。
-*   **視覺行為**: "他在跑步", "兩人在擁抱" -> 檢索 `pose_analysis` 標籤或向量。
+* **語音內容**: "他在解釋量子力學" -> 向量檢索 ASR 文本。
+* **視覺行為**: "他在跑步", "兩人在擁抱" -> 檢索 `pose_analysis` 標籤或向量。

 ### 2.3 時 (Time / When)
 基於 `chunks` 的時間戳。

-*   **絕對時間**: `10:05 - 10:15`。
-*   **相對時間**: "最後 5 分鐘", "剛開始"。
+* **絕對時間**: `10:05 - 10:15`。
+* **相對時間**: "最後 5 分鐘", "剛開始"。

 ### 2.4 地 (Location / Where)
 基於 `Scene` (Places365) 分類結果。

-*   **標籤**: "beach", "office", "living_room"。
-*   **映射**: 用戶說 "戶外" -> 映射為 `["beach", "forest", "street", ...]`。
+* **標籤**: "beach", "office", "living_room"。
+* **映射**: 用戶說 "戶外" -> 映射為 `["beach", "forest", "street", ...]`。

 ### 2.5 物 (Object / Which)
 基於 `YOLO` (物件檢測) 和 `OCR` (文字識別)。

-*   **物件**: `car`, `dog`, `knife`。
-*   **文字**: 路牌、標題中的關鍵詞。
+* **物件**: `car`, `dog`, `knife`。
+* **文字**: 路牌、標題中的關鍵詞。

 ---

@@ -96,8 +96,8 @@ graph TD
 ```

 ### 3.2 模組職責
-1.  **Pose Analyzer Processor**: 負責讀取 Pose 座標與 YOLO 數據，生成行為標籤 (Tags)，寫入數據庫。
-2.  **Search Processor**: 負責將自然語言轉為查詢語句並執行檢索。
+1. **Pose Analyzer Processor**: 負責讀取 Pose 座標與 YOLO 數據，生成行為標籤 (Tags)，寫入數據庫。
+2. **Search Processor**: 負責將自然語言轉為查詢語句並執行檢索。

 ---

@@ -188,12 +188,12 @@ WHERE

 ### 6.2 語義檢索 (Vector)
 針對模糊描述 (What) 使用向量相似度。
-*   將 "shouting at someone" 編碼為向量。
-*   在 Qdrant 中檢索與此向量相似的 `chunks` (基於 ASR 語義) 或 `pose_events` (基於動作語義)。
+* 將 "shouting at someone" 編碼為向量。
+* 在 Qdrant 中檢索與此向量相似的 `chunks` (基於 ASR 語義) 或 `pose_events` (基於動作語義)。

 ### 6.3 結果融合 (Re-ranking)
-*   取 SQL 過濾結果與 Vector 檢索結果的交集。
-*   若無交集，優先展示滿足 Filter (Who/Where) 的結果，按 Vector 分數排序。
+* 取 SQL 過濾結果與 Vector 檢索結果的交集。
+* 若無交集，優先展示滿足 Filter (Who/Where) 的結果，按 Vector 分數排序。
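The fusion rule in 6.3 might be sketched as follows. The inputs are assumptions: `filter_ids` stands for the chunk IDs that passed the SQL filters and `vector_hits` for a mapping of chunk ID to similarity score from the vector search; neither name comes from the codebase.

```python
def fuse_results(filter_ids, vector_hits):
    """Combine SQL-filtered chunk IDs with vector search scores.

    Prefers the intersection of both result sets; when it is empty,
    falls back to the filter results alone.
    """
    both = [cid for cid in vector_hits if cid in filter_ids]
    candidates = both if both else list(filter_ids)
    # Sort by vector score (highest first); IDs without a score count as 0.0.
    # The ID itself breaks ties so the order is deterministic.
    return sorted(candidates, key=lambda cid: (-vector_hits.get(cid, 0.0), cid))


print(fuse_results({"a", "b", "c"}, {"b": 0.9, "c": 0.7, "d": 0.95}))  # ['b', 'c']
```

In the fallback case none of the filter results has a vector score, so the order degenerates to the tie-breaker; a secondary signal such as timestamp could be used there instead.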

 ---

@@ -202,39 +202,39 @@ WHERE
 這是支持「事 (Event)」和「人 (Person Action)」維度的核心前置模塊。

 ### 7.1 處理流程
-1.  **輸入**: 原始 `pose.json` (座標) + `yolo.json` (物體框)。
-2.  **特徵工程**:
-    *   計算關節角度 (Angle): 手肘、膝蓋。
-    *   計算速度 (Velocity): 手腕、身體中心點位移。
-    *   計算交互 (Interaction): 人手框與 YOLO 物體框 IoU。
-3.  **規則分類 (Rule-based)**:
-    *   手部高於頭頂 -> `hands_up`。
-    *   雙手交叉於胸前 -> `arms_crossed`。
-    *   快速靠近另一人 -> `approaching`。
-4.  **輸出**: 更新 `chunks` 表的 `action_tags` 和 `person_identities` 表的 `attributes`。
+1. **輸入**: 原始 `pose.json` (座標) + `yolo.json` (物體框)。
+2. **特徵工程**:
+    * 計算關節角度 (Angle): 手肘、膝蓋。
+    * 計算速度 (Velocity): 手腕、身體中心點位移。
+    * 計算交互 (Interaction): 人手框與 YOLO 物體框 IoU。
+3. **規則分類 (Rule-based)**:
+    * 手部高於頭頂 -> `hands_up`。
+    * 雙手交叉於胸前 -> `arms_crossed`。
+    * 快速靠近另一人 -> `approaching`。
+4. **輸出**: 更新 `chunks` 表的 `action_tags` 和 `person_identities` 表的 `attributes`。
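Two pieces of this flow, the joint-angle feature and the `hands_up` rule, are sketched below. The keypoint names and dictionary layout are hypothetical, the nose is used as a stand-in for the top of the head, and image coordinates are assumed to have the y axis pointing down (so "above" means a smaller y).

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("joint coincides with a neighbouring point")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def action_tags(keypoints):
    """Rule-based tags from a dict of keypoint name -> (x, y)."""
    tags = []
    head_y = keypoints["nose"][1]
    wrist_ys = [keypoints["left_wrist"][1], keypoints["right_wrist"][1]]
    if min(wrist_ys) < head_y:  # at least one wrist is above the head proxy
        tags.append("hands_up")
    return tags
```

For example, `joint_angle(shoulder, elbow, wrist)` gives the elbow angle, with a value near 180 for a straight arm. Rules such as `arms_crossed` or `approaching` would need additional keypoints or multi-frame velocity and are not covered here.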

 ---

 ## 8. 實施路線圖

 ### Phase 1: 基礎設施 (Day 1-2)
-*   [ ] 更新數據庫 Schema (增加 `attributes`, `action_tags` 等字段與索引)。
-*   [ ] 創建 `scripts/pose_analyzer_processor.py` (基礎規則版：站/坐/臥/手勢)。
-*   [ ] 運行 Pose Analyzer 對現有數據進行標記。
+* [ ] 更新數據庫 Schema (增加 `attributes`, `action_tags` 等字段與索引)。
+* [ ] 創建 `scripts/pose_analyzer_processor.py` (基礎規則版：站/坐/臥/手勢)。
+* [ ] 運行 Pose Analyzer 對現有數據進行標記。

 ### Phase 2: 搜尋解析器 (Day 3-4)
-*   [ ] 創建 `scripts/search_processor.py`。
-*   [ ] 實現 LLM Intent Parser (Qwen3.6-plus)。
-*   [ ] 實現 Query Translator (生成動態 SQL)。
+* [ ] 創建 `scripts/search_processor.py`。
+* [ ] 實現 LLM Intent Parser (Qwen3.6-plus)。
+* [ ] 實現 Query Translator (生成動態 SQL)。

 ### Phase 3: 執行與整合 (Day 5-6)
-*   [ ] 實現 Search Executor (PostgreSQL 查詢邏輯)。
-*   [ ] 開發 `POST /api/v1/search/smart` API。
-*   [ ] 前端對接與測試。
+* [ ] 實現 Search Executor (PostgreSQL 查詢邏輯)。
+* [ ] 開發 `POST /api/v1/search/smart` API。
+* [ ] 前端對接與測試。

 ### Phase 4: 優化 (Day 7+)
-*   [ ] 引入向量檢索 (Vector Search) 支持模糊語義。
-*   [ ] 優化 Pose 分析算法 (引入 ST-GCN 等輕量模型)。
+* [ ] 引入向量檢索 (Vector Search) 支持模糊語義。
+* [ ] 優化 Pose 分析算法 (引入 ST-GCN 等輕量模型)。

 ---

@@ -0,0 +1,408 @@
+---
+document_type: "extension_design"
+title: "声音识别扩展设计 (Phase 5+)"
+service: "MOMENTRY_CORE"
+date: "2026-04-28"
+status: "planning"
+current_state: "draft"
+owner: "Warren"
+created_by: "OpenCode"
+created_at: "2026-04-28"
+version: "V1.0"
+tags:
+  - "sound_recognition"
+  - "audio_embedding"
+  - "animal_sound"
+  - "environmental_sound"
+  - "weapon_sound"
+  - "musical_instrument"
+  - "phase_5"
+related_documents:
+  - "IDENTITY_REFERENCE_VECTOR_DESIGN.md"
+  - "MOMENTRY_CORE_ARCHITECTURE_V2.md"
+ai_query_hints:
+  - "查詢声音识别扩展设计"
+  - "查詢動物叫聲 embedding"
+  - "查詢雷雨聲 embedding"
+  - "查詢槍炮聲 embedding"
+  - "查詢樂器聲 embedding"
+---
+
+# 声音识别扩展设计 (Phase 5+)
+
+| 項目 | 內容 |
+|------|------|
+| 建立者 | OpenCode |
+| 建立時間 | 2026-04-28 |
+| 文件版本 | V1.0 |
+| 状态 | Phase 5+ 待辦事項 |
+
+---
+
+## 版本歷史
+
+| 版本 | 日期 | 目的 | 操作人 | 工具/模型 |
+|------|------|------|--------|-----------|
+| V1.0 | 2026-04-28 | 創建声音识别扩展设计（Phase 5+） | OpenCode | OpenCode |
+
+---
+
+## 概述
+
+本文檔定義 Momentry Core Identity 系統的 **声音识别扩展设计**，屬於 **Phase 5+ 待辦事項**。
+
+核心理念：**将声音作为 Identity 进行识别和注册，支持动物叫聲、雷雨聲、槍炮聲、樂器聲等。**
+
+---
+
+## 设计目标
+
+### 核心目标
+
+| 目標 | 說明 |
+|------|------|
+| **声音 Identity** | 将声音作为 Identity 进行注册和管理 |
+| **声音 Embedding** | 提取声音的 embedding vector |
+| **声音匹配** | 在音频中识别特定声音的出现 |
+| **1对多参考向量** | 同一声音可存储多个 embedding（不同样本、不同质量） |
+| **声音分类** | 支持多種声音类型（动物、环境、武器、樂器） |
+
+### 适用场景
+
+| 场景 | 说明 |
+|------|------|
+| **电影/视频分析** | 识别电影中的枪声、雷声、狗叫声等 |
+| **环境监控** | 监控特定环境声音（雷雨、警报等） |
+| **音频搜索** | 搜索包含特定声音的音频片段 |
+| **声音数据库** | 建立声音 Identity 数据库（动物叫声库、乐器声音库） |
+
+---
+
+## 声音类型分类
+
+### identity_type 扩展
+
+```sql
+-- identities table: extended identity_type values
+identity_type VARCHAR(30) -- new types: sound, animal, environmental, weapon, musical
+```
+
+### 声音类型定义
+
+| identity_type | 说明 | 子类型 | 示例 |
+|---------------|------|--------|------|
+| **sound** | 通用声音 | TBD | 各种声音 |
+| **animal** | 动物叫声 | animal_dog_bark, animal_cat_meow, animal_bird_chirp | 狗叫声、猫叫声、鸟叫声 |
+| **environmental** | 环境音 | environmental_thunder, environmental_rain, environmental_wind | 雷声、雨声、风声 |
+| **weapon** | 武器声 | weapon_gunshot, weapon_explosion, weapon_siren | 枪声、爆炸声、警报声 |
+| **musical** | 乐器声 | musical_guitar, musical_piano, musical_drums | 吉他声、钢琴声、鼓声 |
+
+---
+
+## reference_data JSONB 结构
+
+### sound_embeddings 结构
+
+```json
+{
+  "sound_embeddings": [
+    {
+      "embedding": [0.1, 0.2, ...],          // TBD (声音 embedding 维度)
+      "source": "audio_segment",
+      "file_uuid": "vid_001",
+      "timestamp_start": 10.0,
+      "timestamp_end": 15.0,
+      "sound_type": "animal_dog_bark",
+      "quality_score": 0.95,
+      "sample_rate": 44100,
+      "duration": 5.0,
+      "created_at": "2026-04-28T13:00:00Z"
+    },
+    {
+      "embedding": [0.3, 0.4, ...],
+      "source": "audio_segment",
+      "file_uuid": "vid_002",
+      "timestamp_start": 20.0,
+      "timestamp_end": 25.0,
+      "sound_type": "animal_dog_bark",
+      "quality_score": 0.88,
+      "sample_rate": 44100,
+      "duration": 5.0,
+      "created_at": "2026-04-28T14:00:00Z"
+    }
+  ],
+  "audio_urls": [
+    "https://cdn.xxx.com/sounds/dog_bark_001.wav",
+    "https://cdn.xxx.com/sounds/dog_bark_002.wav"
+  ]
+}
+```
+
+### 字段说明
+
+| 字段 | 类型 | 必填 | 说明 |
+|------|------|------|------|
+| embedding | Array[TBD] | Yes | 声音 embedding vector（维度 TBD） |
+| source | String | Yes | 来源: audio_segment, audio_file, manual_upload |
+| file_uuid | String | Yes | 档案 UUID |
+| timestamp_start | Float | Yes | 开始时间（秒） |
+| timestamp_end | Float | Yes | 结束时间（秒） |
+| sound_type | String | Yes | 声音类型（见上表） |
+| quality_score | Float | No | 质量评分（0.0-1.0） |
+| sample_rate | Integer | No | 音频采样率 |
+| duration | Float | No | 音频时长（秒） |
+| created_at | String | Yes | 建立时间（ISO 8601） |
+
+---
+
+## 声音 Embedding 模型选择
+
+### 待评估模型
+
+| 模型 | 维度 | 说明 | 适用场景 |
+|------|------|------|----------|
+| **PANNs** | TBD | AudioSet 预训练模型 | 通用声音识别 |
+| **YAMNet** | 1024-dim | TensorFlow 音频分类模型 | 通用声音分类 |
+| **VGGish** | 128-dim | YouTube-8M 音频模型 | 音频特征提取 |
+| **Audio Spectrogram Transformer** | TBD | 基于 Transformer 的音频模型 | 音频理解 |
+| **CLAP** | 512-dim | Contrastive Language-Audio Pretraining | 文本-音频匹配 |
+
+### 模型评估指标
+
+| 指标 | 说明 |
+|------|------|
+| **Embedding 维度** | 维度大小影响存储和计算效率 |
+| **识别准确率** | 声音识别准确率 |
+| **提取速度** | Embedding 提取速度 |
+| **模型大小** | 模型文件大小 |
+| **GPU 支持** | 是否支持 MPS/CUDA |
+
+---
+
+## 声音 Identity 注册流程
+
+### 示例: 注册狗叫声 Identity
+
+```python
+def register_animal_sound_identity(sound_name, sound_type, audio_files):
+    """
+    声音 Identity 注册流程:
+    1. 提取多个音频样本的 embedding
+    2. 存储到 reference_data JSONB
+    3. 注册到 identities 表
+    """
+    
+    # Step 1: 提取 embedding
+    sound_embeddings = []
+    for audio_file in audio_files:
+        # 加载音频
+        audio_data = load_audio(audio_file)
+        
+        # 提取 embedding
+        embedding = audio_model.extract_embedding(audio_data)
+        
+        # 评估质量
+        quality_score = evaluate_audio_quality(audio_data)
+        
+        # 存储到 reference_data
+        sound_embeddings.append({
+            "embedding": embedding.tolist(),
+            "source": "audio_file",
+            "sound_type": sound_type,
+            "quality_score": quality_score,
+            "sample_rate": audio_data["sample_rate"],
+            "duration": audio_data["duration"],
+            "created_at": datetime.now().isoformat()
+        })
+    
+    # Step 2: 注册 Identity
+    identity = {
+        "identity_id": generate_uuid(),
+        "name": sound_name,
+        "identity_type": "animal",
+        "source": "manual",
+        "reference_data": {
+            "sound_embeddings": sound_embeddings,
+            "audio_urls": [audio_file.url for audio_file in audio_files]
+        }
+    }
+    
+    # Step 3: 计算 centroid
+    centroid = calculate_centroid([e["embedding"] for e in sound_embeddings])
+    identity["sound_embedding"] = centroid
+    
+    # 存储到資料庫
+    db.insert_identity(identity)
+    
+    return identity
+```
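The registration flow calls `calculate_centroid` without defining it. One plausible reading is the element-wise mean of the reference embeddings, sketched below with plain Python lists; this is an assumption about the helper, not its actual implementation.

```python
def calculate_centroid(embeddings):
    """Element-wise mean of equally sized embedding vectors."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("embeddings must share the same dimension")
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


print(calculate_centroid([[1.0, 0.0], [0.0, 1.0]]))  # [0.5, 0.5]
```

Because cosine similarity ignores vector length, the centroid does not need to be re-normalized when it is only compared by cosine distance.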
+
+---
+
+## 声音匹配流程
+
+### 示例: 在视频中识别狗叫声
+
+```python
+def detect_animal_sound(file_uuid, sound_identity, threshold=0.85):
+    """
+    声音匹配流程:
+    1. 提取视频音频段落的 embedding
+    2. 与 Identity 的 sound_embeddings 进行匹配
+    3. 返回匹配结果
+    """
+    
+    # Step 1: 提取视频音频段落
+    audio_segments = extract_audio_segments(file_uuid, segment_duration=5.0)
+    
+    # Step 2: 匹配
+    results = []
+    for segment in audio_segments:
+        # 提取段落 embedding
+        segment_embedding = audio_model.extract_embedding(segment)
+        
+        # 1对多匹配
+        match_result = combined_match(
+            detected_embedding=segment_embedding,
+            reference_embeddings=sound_identity["reference_data"]["sound_embeddings"],
+            threshold=threshold
+        )
+        
+        if match_result["is_match"]:
+            results.append({
+                "timestamp_start": segment["timestamp_start"],
+                "timestamp_end": segment["timestamp_end"],
+                "match_score": match_result["final_score"],
+                "sound_type": sound_identity["name"]
+            })
+    
+    return results
+```
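`combined_match` is used above but not specified in this document. The sketch below shows one simple one-to-many strategy, taking the best cosine similarity over all reference embeddings; the scoring rule is an assumption, and the real design (see the reference-vector design document) may weight references by quality or combine them differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equally sized vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_match(detected_embedding, reference_embeddings, threshold=0.85):
    """Match one detected embedding against many references (best score wins)."""
    scores = [
        cosine_similarity(detected_embedding, ref["embedding"])
        for ref in reference_embeddings
    ]
    final_score = max(scores, default=0.0)
    return {"is_match": final_score >= threshold, "final_score": final_score}
```

Taking the maximum means a single good reference sample is enough for a match, which suits the one-to-many storage model but can raise false positives if a noisy sample was registered.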
+
+---
+
+## 数据库设计
+
+### identities 表扩展
+
+```sql
+-- Migration TBD: identities 表添加 sound_embedding
+ALTER TABLE identities ADD COLUMN sound_embedding VECTOR(TBD);
+
+-- 索引配置
+CREATE INDEX idx_identities_sound_embedding ON identities 
+ USING ivfflat (sound_embedding vector_cosine_ops) 
+ WITH (lists = 100);
+```
+
+### sound_type 分类表（可选）
+
+```sql
+CREATE TABLE sound_types (
+    sound_type_code    VARCHAR(50) PRIMARY KEY,  -- animal_dog_bark
+    sound_type_name    TEXT NOT NULL,            -- 狗叫声
+    category           VARCHAR(20),              -- animal, environmental, weapon, musical
+    description        TEXT,
+    created_at         TIMESTAMPTZ DEFAULT NOW()
+);
+```
+
+---
+
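+## Implementation Plan
+
+### Phase 5.1: Model evaluation and selection
+
+- [ ] Evaluate candidate models such as PANNs, YAMNet, VGGish, and CLAP
+- [ ] Decide the embedding dimension
+- [ ] Confirm GPU support (MPS/CUDA)
+- [ ] Run performance benchmarks
+
+### Phase 5.2: Database extension
+
+- [ ] Migration TBD: add sound_embedding VECTOR(TBD) to the identities table
+- [ ] Create the sound_types classification table
+- [ ] Create test data
+
+### Phase 5.3: Sound Identity registration
+
+- [ ] Sound embedding extraction script
+- [ ] reference_data JSONB storage
+- [ ] Identity registration API
+
+### Phase 5.4: Sound matching
+
+- [ ] Audio segment extraction script
+- [ ] One-to-many matching algorithm implementation
+- [ ] Store match results in pre_chunks
+
+### Phase 5.5: Frontend integration
+
+- [ ] Sound Identity management UI
+- [ ] Display of sound match results
+- [ ] Sound search
+
+---
+
+## To-Do Items
+
+| Item | Priority | Notes |
+|------|--------|------|
+| Model evaluation and selection | High | Phase 5.1 |
+| Database extension | High | Phase 5.2 |
+| Sound Identity registration | Medium | Phase 5.3 |
+| Sound matching | Medium | Phase 5.4 |
+| Frontend integration | Low | Phase 5.5 |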
+
+---
+
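+## Technical Challenges
+
+### Challenge 1: Choosing the embedding dimension
+
+| Issue | Description |
+|------|------|
+| **Dimension too high** | High storage cost and low computational efficiency |
+| **Dimension too low** | Information loss and lower recognition accuracy |
+| **Approach** | Evaluate several models and pick a balanced dimension (128-512 dim recommended) |
+
+### Challenge 2: Sound sample quality
+
+| Issue | Description |
+|------|------|
+| **Noise interference** | Background noise degrades embedding quality |
+| **Inconsistent sample rates** | Audio sources differ in sample rate |
+| **Approach** | One-to-many reference vectors plus a quality scoring mechanism |
+
+### Challenge 3: Recognizing overlapping sounds
+
+| Issue | Description |
+|------|------|
+| **Overlapping sounds** | Several sounds occur at the same time |
+| **Approach** | Audio source separation plus matching against multiple Identities |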
+
+---
+
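+## Constraints
+
+- This design is a Phase 5+ to-do item and is outside the current implementation scope
+- The sound embedding dimension is TBD and depends on model evaluation
+- Sound recognition accuracy depends on model performance
+- GPU support (MPS/CUDA) is required
+
+---
+
+## Related Documents
+
+- `docs_v1.0/ARCHITECTURE/IDENTITY_REFERENCE_VECTOR_DESIGN.md` - One-to-many reference vector design
+- `docs_v1.0/ARCHITECTURE/MOMENTRY_CORE_ARCHITECTURE_V2.md` - Core architecture design
+- `docs_v1.0/IMPLEMENTATION/FILE_IDENTITY_API_DESIGN.md` - API design
+
+---
+
+## Version Information
+
+- Version: V1.0
+- Created: 2026-04-28
+- Last updated: 2026-04-28
+- Status: Phase 5+ to-do item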
@@ -174,8 +174,6 @@ Momentry Core 需要將連續視頻轉化為可檢索的知識單元。需要一

 ### TDR-003: Programming Language Selection

-
-
 | Item | Content |
 |------|------|
 | **Decision title** | Use Rust as the core development language |
@@ -188,29 +186,21 @@ Momentry Core 需要將連續視頻轉化為可檢索的知識單元。需要一

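 #### 3.2 Options Evaluated

-
-
 **Option A: Python**
 - Rich ecosystem with mature AI libraries
 - Fast development
 - Lower performance, however, and poorly suited to high concurrency

-
-
 **Option B: Go**
 - Good performance and strong concurrency support
 - Simple and easy to learn
 - Ecosystem judged less rich than Rust's

-
-
 **Option C: Rust (chosen)**
 - High performance, close to C++
 - Memory safe, with no GC
 - Strong type system and error handling

-
-
 **Option D: Java/Kotlin**
 - Enterprise-grade ecosystem
 - Good performance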
@@ -241,20 +231,14 @@ Momentry Core 需要將連續視頻轉化為可檢索的知識單元。需要一
 - ✅ Python is used for AI model processing
 - ✅ Rust and Python are bridged through subprocess calls

-
-
 #### 3.6 Related Links
 - Codebase: `src/` directory
 - [RUST_DEVELOPMENT.md](../REFERENCE/RUST_DEVELOPMENT.md)

-
-
 ---

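 ### TDR-004: Chunking Rule Analysis and Future Planning

-
-
 | Item | Content |
 |------|------|
 | **Decision title** | Design rationale and implementation plan for visual, scene, and summary chunks |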
@@ -264,111 +248,73 @@ Momentry Core 需要將連續視頻轉化為可檢索的知識單元。需要一

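 #### 4.1 Purpose of the Visual Chunk

-
 **Core value**:
 1. **Object-level search**: supports searching by "what was seen"
 2. **Cross-modal bridging**: connects visual content with speech and text
 3. **Foundation for scene understanding**: scenes are understood through combinations of objects

-
-
 **Benefits**:
 - Enables a "visual-first" search experience
 - Supports video analysis based on object appearances
 - Provides the base data for scene analysis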

-
-
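 #### 4.2 Purpose of the Scene Chunk

-
-
 **Core value**:
 1. **Semantic aggregation**: groups related sentences and objects into meaningful scenes
 2. **Context preservation**: keeps dialogue and actions coherent
 3. **Efficient retrieval**: jumps directly to a scene rather than a single sentence

-
-
 **Benefits**:
 - Supports semantic-level search (for example "meeting conversation" or "argument scene")
 - Preserves the full context
 - Provides the basis for story summaries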

-
-
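 #### 4.3 Purpose of the Summary Chunk

-
-
-
 **Core value**:
 1. **High-level understanding**: gives an overall summary of the video
 2. **5W1H structuring**: extracts the key information
 3. **Narrative compression**: condenses a long video into a summary that can be grasped quickly

-
-
-
 **Benefits**:
 - Users can learn what a video contains without watching all of it
 - Provides clear, structured information
 - Supports quick assessment and comparison of video content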

-
-
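 #### 4.4 Implementation Priorities and Challenges

-
 **Implementation priorities**:
 1. ✅ **Rule 1 (sentence level)** - implemented
 2. ⚠️ **Rule 3 (scene level)** - partially implemented (based on CUT data)
 3. ❌ **Rule 2 (visual level)** - not yet implemented
 4. ❌ **Rule 4 (summary level)** - not yet implemented

-
-
-
 **Technical challenges**:
 1. **Visual chunks**: balancing object detection accuracy against performance
 2. **Scene chunks**: identifying scene boundaries intelligently
 3. **Summary chunks**: quality and consistency of LLM summaries
 4. **Data fusion**: integrating multimodal information effectively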

-
-
-
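 #### 4.5 Migration Plan

-
-
-
 **Short term (1-2 months)**:
 - Complete Rule 3 (scene-level chunking)
 - Integrate Places365 scene classification
 - Improve scene recognition based on visual and speech data

-
-
 **Medium term (3-6 months)**:
 - Implement Rule 2 (visual chunks)
 - Integrate YOLO object detection
 - Build an object label index

-
-
 **Long term (6-12 months)**:
 - Implement Rule 4 (summary chunks)
 - Integrate LLM summary generation
 - Implement 5W1H structured extraction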

-
-
-
 #### 4.6 Related Links

-
-
 - [CHUNKING_ARCHITECTURE.md](./chunking/CHUNKING_ARCHITECTURE.md)
 - Rule 1 implementation: `src/core/chunk/rule1_ingest.rs`
 - Rule 3 implementation: `src/core/chunk/rule3_ingest.rs`
@@ -377,12 +323,8 @@ Momentry Core 需要將連續視頻轉化為可檢索的知識單元。需要一

 ## 3. Design vs. Implementation Gap Analysis

-
-
 ### Design Goals vs. Actual Implementation

-
-
 #### Gap 1: chunk_type definition

 | Design document | Actual code | Status analysis |
@@ -393,13 +335,8 @@ Momentry Core 需要將連續視頻轉化為可檢索的知識單元。需要一
 | `summary` | Not implemented | ❌ Designed feature missing |
 | - | `"time"`, `"trace"`, `"story"` | 🔄 Extra types present only in the code |

-
-
-
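 #### Gap 2: Chunking rule implementation

-
-
 | Rule | Design description | Implementation status | Issue analysis |
 |------|----------|----------|----------|
 | Rule 1 | Sentence-level retrieval | ✅ Implemented | Fully functional |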
@@ -407,13 +344,8 @@ Momentry Core 需要將連續視頻轉化為可檢索的知識單元。需要一
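 | Rule 3 | Scene-level retrieval | ⚠️ Partially implemented | Based only on CUT data; scene classification is missing |
 | Rule 4 | Summary-level retrieval | ❌ Not implemented | LLM integration and structured summaries are missing |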

-
-
-
 #### Gap 3: Database structure

-
-
 | Design goal | Current implementation | Analysis |
 |----------|----------|------|
 | Generic chunk structure | Basic structure implemented | ✅ |
@@ -421,248 +353,141 @@ Momentry Core 需要將連續視頻轉化為可檢索的知識單元。需要一
 | Scene aggregation table | Partially implemented | ⚠️ |
 | Summary generation table | Not implemented | ❌ |

-
-
-
 ---

-
-
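 ## 4. Recommended Implementation Path and Plan

-
-
-
 ### Priority 1: Complete the existing implementation

-
-
 **Short-term goals (1-2 weeks)**:

-
-
 1. **Unify the `chunk_type` enum**:
    - Update the `ChunkType` enum in `src/core/chunk/types.rs`
    - Make sure it matches the string values stored in the database

-
-
-
 2. **Extend the Rule 3 implementation**:
    - Integrate the Places365 model for scene classification
    - Detect scene boundaries by combining visual and speech data
    - Create the full structure of the `chunks_rule3` table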

-
-
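 ### Priority 2: Implement visual chunks

-
-
 **Medium-term goals (1-2 months)**:

-
-
 1. **YOLO integration**:
    - Create the `yolo_processor.py` script
    - Implement keyframe-based object detection
    - Standardize object labels and build the index

-
 2. **Visual chunk generation**:
    - Create the `visual_ingest.rs` processor
    - Implement object aggregation and label generation
    - Create the `chunks_rule2` table structure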

-
-
-
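 ### Priority 3: Implement summary chunks

-
-
-
 **Long-term goals (3-6 months)**:

-
-
 1. **LLM integration**:
    - Integrate Gemma4 or a comparable LLM
    - Implement video content summary generation
    - Extract 5W1H structured information

-
-
 2. **Summary chunk generation**:
    - Create the `summary_ingest.rs` processor
    - Implement narrative compression across scenes
    - Create the `chunks_rule4` table structure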

-
-
-
 ---

-
-
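 ## 5. Summary of Key Decisions

-
-
-
-
-
-
 ### Decision 1: Layered architecture

-
-
 **Design goals**:
 - Four-layer chunk architecture: sentence → visual → scene → summary
 - Multi-granularity retrieval: understanding at different levels, from detail to the whole

-
-
 **Current implementation**:
 - Sentence-level chunks (Rule 1) are fully implemented
 - Scene-level chunks (Rule 3) are partially implemented
 - Visual and summary chunks are not implemented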

-
-
-
-
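 ### Decision 2: Hybrid database architecture

-
-
 **Design goals**:
 - PostgreSQL: primary data store
 - Redis: cache and queues
 - MongoDB: document cache
 - Qdrant: vector search

-
-
 **Current implementation**:
 - ✅ All databases are integrated
 - ✅ The databases work together
 - ⚠️ Data consistency management needs improvement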

-
-
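 ### Decision 3: Technology stack

-
-
-
 **Design goals**:
 - Rust: core system language
 - Python: AI model processing
 - Axum: web framework
 - Tokio: async runtime

-
-
-
 **Current implementation**:
 - ✅ The Rust core system is fully implemented
 - ✅ Python AI models are integrated
 - ✅ Axum + Tokio run stably
 - ⚠️ The efficiency of the Python-Rust bridge needs optimization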

-
-
-
-
-
 ---

-
-
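 ## 6. Future Improvements

-
-
-
-
 ### Short-term improvements (1-2 months)

-
-
 1. **Unify API design**:
    - Standardize pagination parameters across all list APIs
    - Unify the response structure
    - Improve error handling and documentation

-
-
 2. **Optimize performance**:
    - Improve database query efficiency
    - Optimize Python subprocess calls
    - Improve concurrent processing capacity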

-
-
-
-
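 ### Medium-term improvements (3-6 months)

-
-
-
 1. **Complete the chunking rules**:
    - Implement visual chunks (Rule 2)
    - Implement summary chunks (Rule 4)
    - Complete scene chunks (Rule 3)

-
-
-
 2. **Extend functionality**:
    - Support more video formats
    - Integrate more AI models
    - Offer more analysis dimensions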

-
-
-
-
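 ### Long-term improvements (6-12 months)

-
-
-
-
 1. **System architecture upgrade**:
    - Microservice architecture
    - Cloud-native deployment support
    - Large-scale video processing capacity

-
-
 2. **Platform development**:
    - Multi-tenant support
    - Extensible plugin architecture
-  Cloud collaboration workflows
-
-
+- Cloud collaboration workflows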

 ---

-
-
-
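 ## 7. Change Log

-
-
 | Version | Date | Main changes | Author |
 |------|------|----------|--------|
 | V1.0 | 2026-04-22 | Created the technical decision record | OpenCode |
 | V1.1 | 2026-04-22 | Added the design vs. implementation gap analysis | OpenCode |
 | V1.2 | 2026-04-22 | Refined the implementation plan and improvement directions | OpenCode |

-
-
-**Last updated**: 2026-04-22
+**Last updated**: 2026-04-22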
@@ -306,4 +306,4 @@ python3 scripts/check_architecture_docs.py --check-terminology

 **Document version**: V1.0  
 **Last updated**: 2026-04-22  
-**Maintainer**: OpenCode
+**Maintainer**: OpenCode
@@ -278,17 +278,17 @@ pub async fn register(
    }
    
     // Associate user_id with the video
-    let video_uuid = state.db.create_video(req, Some(ctx.user_id)).await?;
+    let file_uuid = state.db.create_video(req, Some(ctx.user_id)).await?;
    
     // Create the processing job (with user_id)
    state.db.create_monitor_job(
        job_type: "auto_ingestion",
-        video_uuid,
+        file_uuid,
        user_id: Some(ctx.user_id),
        processors: vec!["asr", "cut", "yolo", "ocr", "face", "pose"],
    ).await?;
    
-    Ok(Json(RegisterResponse { uuid: video_uuid }))
+    Ok(Json(RegisterResponse { uuid: file_uuid }))
 }
 ```

@@ -96,19 +96,19 @@ ADD COLUMN audio_visual_confidence FLOAT; -- 融合置信度
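 How does the system precisely compute the association between a speaker and a face?

 ### 3.1 Algorithm Steps
-1.  **Time slicing**: Divide the video into time windows of `1 second` each.
-2.  **Label mapping**:
+1. **Time slicing**: Divide the video into time windows of `1 second` each.
+2. **Label mapping**:
    - If ASRX produced output for that second, mark it as `ActiveSpeaker = SPEAKER_XX`.
    - If the Face Processor detected a face in that second, mark it as `ActiveFace = FACE_YY` (choose the face with the highest confidence and the largest area).
-3.  **Co-occurrence matrix**: Count the number of seconds in which each `(SPEAKER_XX, FACE_YY)` pair appears together.
-4.  **Compute the overlap ratio**:
+3. **Co-occurrence matrix**: Count the number of seconds in which each `(SPEAKER_XX, FACE_YY)` pair appears together.
+4. **Compute the overlap ratio**:
   ```math
   Overlap(S_x, F_y) = \frac{\text{Count}(S_x \cap F_y)}{\text{Count}(S_x)}
   ```
-5.  **Decision**:
-   - If `Overlap > 0.60` → create a strong association (High Confidence).
-   - If `0.30 <= Overlap <= 0.60` → create a suggested association (Medium Confidence).
-   - If `Overlap < 0.30` → ignore (likely a voice-over or a group scene).
+5. **Decision**:
+- If `Overlap > 0.60` → create a strong association (High Confidence).
+- If `0.30 <= Overlap <= 0.60` → create a suggested association (Medium Confidence).
+- If `Overlap < 0.30` → ignore (likely a voice-over or a group scene).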

 ### 3.2 Pseudocode Example
 ```python
@@ -149,10 +149,10 @@ graph TD
 ```

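 ### 4.1 Execution Timing
-1.  Both the `ASRX` and `Face` processors have finished.
-2.  `audio_visual_binding_worker` is triggered.
-3.  `speaker_face_mapping.json` is produced.
-4.  Results are written to the database and the `person_identities` table is updated.
+1. Both the `ASRX` and `Face` processors have finished.
+2. `audio_visual_binding_worker` is triggered.
+3. `speaker_face_mapping.json` is produced.
+4. Results are written to the database and the `person_identities` table is updated.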

 ---

@@ -71,11 +71,11 @@ graph LR
 ```

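 ### 1.1 Key Steps
-1.  **Metadata parsing**: Extract the movie title and year from the filename or `ffprobe` output.
-2.  **TMDB lookup**: Call the API to fetch the top cast (usually the first 10-15 members) and their photo URLs.
-3.  **Photo download and feature extraction**: Download the actor photos and generate face embeddings (512-dim).
-4.  **Vector comparison**: Compare the actor photo vectors against the **Face Cluster Centroids** detected in the video using cosine similarity.
-5.  **Identity resolution**: If the similarity exceeds a threshold (e.g. 0.6), automatically create a global identity and label it.
+1. **Metadata parsing**: Extract the movie title and year from the filename or `ffprobe` output.
+2. **TMDB lookup**: Call the API to fetch the top cast (usually the first 10-15 members) and their photo URLs.
+3. **Photo download and feature extraction**: Download the actor photos and generate face embeddings (512-dim).
+4. **Vector comparison**: Compare the actor photo vectors against the **Face Cluster Centroids** detected in the video using cosine similarity.
+5. **Identity resolution**: If the similarity exceeds a threshold (e.g. 0.6), automatically create a global identity and label it.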

 ---

@@ -148,17 +148,17 @@ CREATE INDEX idx_person_global ON person_identities(global_person_id);

 How does the system decide that a face on screen is "Cary Grant"?

-1.  **Reference set preparation (Reference Set)**:
-    *   Fetch the actor photo URLs from TMDB.
-    *   Download the photos and extract the vector $V_{actor}$ with InsightFace.
-2.  **Target set (Target Set)**:
-    *   Take the centroid vector $V_{cluster}$ of each cluster from the video's Face Processor.
-3.  **Compute similarity**:
-    *   $Score = 1 - \text{CosineDistance}(V_{actor}, V_{cluster})$
-4.  **Decision thresholds**:
-    *   **High Confidence (> 0.70)**: confirm the identity automatically (Auto-Confirm).
-    *   **Medium Confidence (0.55 - 0.70)**: mark as a "Suggestion" that requires manual confirmation.
-    *   **Low Confidence (< 0.55)**: ignore and keep as "Unknown Cluster".
+1. **Reference set preparation (Reference Set)**:
+    - Fetch the actor photo URLs from TMDB.
+    - Download the photos and extract the vector $V_{actor}$ with InsightFace.
+2. **Target set (Target Set)**:
+    - Take the centroid vector $V_{cluster}$ of each cluster from the video's Face Processor.
+3. **Compute similarity**:
+    - $Score = 1 - \text{CosineDistance}(V_{actor}, V_{cluster})$
+4. **Decision thresholds**:
+    - **High Confidence (> 0.70)**: confirm the identity automatically (Auto-Confirm).
+    - **Medium Confidence (0.55 - 0.70)**: mark as a "Suggestion" that requires manual confirmation.
+    - **Low Confidence (< 0.55)**: ignore and keep as "Unknown Cluster".
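
The scoring and threshold rules above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function names `cosine_score` and `classify_match` and the returned labels are hypothetical. The 0.70 and 0.55 thresholds are taken from the list above, and real embeddings would come from InsightFace rather than hand-written lists.

```python
import math
from typing import Sequence

AUTO_CONFIRM_THRESHOLD = 0.70  # above this: High Confidence
SUGGESTION_THRESHOLD = 0.55    # from here up to 0.70: Medium Confidence


def cosine_score(v_actor: Sequence[float], v_cluster: Sequence[float]) -> float:
    """Score = 1 - cosine distance, which equals the cosine similarity."""
    if len(v_actor) != len(v_cluster):
        raise ValueError("embedding dimensions differ")
    dot = sum(a * c for a, c in zip(v_actor, v_cluster))
    norm_actor = math.sqrt(sum(a * a for a in v_actor))
    norm_cluster = math.sqrt(sum(c * c for c in v_cluster))
    if norm_actor == 0.0 or norm_cluster == 0.0:
        return 0.0
    return dot / (norm_actor * norm_cluster)


def classify_match(score: float) -> str:
    """Map a similarity score to one of the three decision bands."""
    if score > AUTO_CONFIRM_THRESHOLD:
        return "auto_confirm"
    if score >= SUGGESTION_THRESHOLD:
        return "suggestion"
    return "unknown_cluster"
```

A score of exactly 0.70 falls in the medium band here, matching the "> 0.70" wording for High Confidence.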

 ### 3.3 Role Name Association (Role Mapping)

@@ -179,21 +179,21 @@ TMDB 返回的結構包含 `character` 字段：

 This flow is packaged as a standalone **Post-Face-Processing Job**.

-1.  **Trigger**: `face_processor` finishes and produces `face_clusters`.
-2.  **Action**: The system checks that `asset_type == 'movie'` and that `title` exists.
-3.  **Execution**: Run `tmdb_cast_ingestion.py`.
-    *   Query TMDB.
-    *   Download images -> compute vectors -> store in `global_person_identities` (if not already present).
-    *   Run the comparison -> update `person_identities`.
-4.  **Output**: The database is populated with real names and role names, for use by Rule 3/4 chunking.
+1. **Trigger**: `face_processor` finishes and produces `face_clusters`.
+2. **Action**: The system checks that `asset_type == 'movie'` and that `title` exists.
+3. **Execution**: Run `tmdb_cast_ingestion.py`.
+    - Query TMDB.
+    - Download images -> compute vectors -> store in `global_person_identities` (if not already present).
+    - Run the comparison -> update `person_identities`.
+4. **Output**: The database is populated with real names and role names, for use by Rule 3/4 chunking.

 ---

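 ## 5. Fault Tolerance and Error Handling

-   **Movie not found**: If an ambiguous filename yields no TMDB result, skip this step and keep the original Face Cluster ID.
-   **No actor photo**: If an actor has no photo on TMDB, vector comparison is not possible; only the name is recorded (if the ASR output mentions it).
-   **Several actors match one role**: If a face matches multiple actors at once (very rare), take the one with the highest confidence and list the others as candidates.
+- **Movie not found**: If an ambiguous filename yields no TMDB result, skip this step and keep the original Face Cluster ID.
+- **No actor photo**: If an actor has no photo on TMDB, vector comparison is not possible; only the name is recorded (if the ASR output mentions it).
+- **Several actors match one role**: If a face matches multiple actors at once (very rare), take the one with the highest confidence and list the others as candidates.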

 ---