fix: ASRX duplication, TKG edges, trace ingest, and add pipeline progress publishing

- ASRX handler no longer stores duplicate 'asr' pre_chunks
- Pre_chunks storage made idempotent (delete-before-insert)
- Rule 1 + trace_ingest changed to query 'asrx' not 'asr'
- Trace chunks removed (dynamic from TKG/Qdrant)
- TKG scroll_face_points fixed: trace_id >= 1 (not == 1)
- TKG AsrxSegmentEntry: start/end -> start_time/end_time (match ASRX JSON)
- Unregister error handling: log instead of silent discard
- Add publish_pipeline_progress calls at each pipeline stage
  (processors, rule1, face_trace, identity_agent, TKG, rule2, completion)
Accusys
2026-07-02 10:43:46 +08:00
parent d791d138f2
commit 3eabd45882
65 changed files with 9477 additions and 3852 deletions
@@ -0,0 +1,545 @@
<!-- module: progress -->
<!-- description: Real-time progress tracking for processing pipeline, TKG build, and identity agent -->
<!-- depends: 01_auth, 03_register, 05_process -->
# Progress Tracking — API Workspace Module
## Overview
The progress tracking system provides real-time visibility into all processing stages:
| System | Redis Key | Coverage |
|--------|-----------|----------|
| **Processor Progress** | `{prefix}progress:{file_uuid}` | 7 main processors (cut, asr, asrx, ocr, face, pose, appearance) |
| **TKG Progress** | `{prefix}progress:{file_uuid}:tkg` | 18 TKG build phases (9 node types + 8 edge types + face_tracing) |
| **Agent Progress** | `{prefix}progress:{file_uuid}:agent` | 5 Identity Agent phases |
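The key layout in the table can be sketched as a small helper. This is a minimal illustration, not the service's own code: the function name `progress_keys` and the example prefix `momentry:` are assumptions made for the sketch.

```python
def progress_keys(prefix: str, file_uuid: str) -> dict[str, str]:
    """Build the three progress keys for one file, following the table above.

    `prefix` stands in for the {prefix} placeholder; its real value is
    deployment-specific and not stated in this document.
    """
    base = f"{prefix}progress:{file_uuid}"
    return {
        "processors": base,        # processor progress
        "tkg": f"{base}:tkg",      # TKG build progress
        "agent": f"{base}:agent",  # Identity Agent progress
    }


# Hypothetical prefix and shortened UUID, for illustration only.
keys = progress_keys("momentry:", "3a6c1865")
print(keys["tkg"])  # momentry:progress:3a6c1865:tkg
```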
---
## `POST /api/v1/progress/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get real-time processing progress including processor status, TKG build phases, and identity agent phases.
### Example
```bash
curl -s -X POST "$API/api/v1/progress/$FILE_UUID" \
-H "X-API-Key: $KEY" | jq '.'
```
### Response (200)
```json
{
"file_uuid": "3a6c1865...",
"overall_progress": 71,
"cpu_percent": 45.2,
"gpu_percent": 30.1,
"memory_percent": 62.4,
"processors": [
{"name": "asr", "status": "complete", "progress": 100, "current": 0, "total": 0, "message": "done"},
{"name": "face", "status": "complete", "progress": 100, "current": 0, "total": 0, "message": "done"},
{"name": "pose", "status": "complete", "progress": 100, "current": 0, "total": 0, "message": "done"}
],
"tkg_progress": {
"file_uuid": "3a6c1865...",
"phase": "mutual_gaze_edges",
"phase_index": 13,
"total_phases": 18,
"phase_progress": 0.8,
"overall_progress": 0.72,
"stats": {
"total_faces": 1250,
"traced_faces": 1250,
"total_traces": 45,
"face_track_nodes": 45,
"gaze_track_nodes": 45,
"lip_track_nodes": 12,
"text_region_nodes": 8,
"appearance_nodes": 38,
"accessory_nodes": 5,
"object_nodes": 156,
"hand_nodes": 22,
"speaker_nodes": 14,
"co_occurrence_edges": 890,
"speaker_face_edges": 120,
"face_face_edges": 234,
"mutual_gaze_edges": 67,
"total_nodes": 345,
"total_edges": 1311
},
"message": "67 mutual gaze edges",
"updated_at": "2026-07-02T10:30:00Z"
},
"agent_progress": {
"file_uuid": "3a6c1865...",
"phase": "completed",
"phase_index": 5,
"total_phases": 5,
"phase_progress": 1.0,
"overall_progress": 1.0,
"stats": {
"total_faces": 1250,
"total_traces": 45,
"clusters": 18,
"identities_created": 18,
"tmdb_matches": 5,
"speaker_bindings": 12,
"confirmations": 18
},
"message": "Identity Agent processing completed",
"updated_at": "2026-07-02T10:28:00Z"
}
}
```
### Field Descriptions
#### Top Level
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
| `overall_progress` | integer | Overall processor progress (0-100) |
| `processors` | array | Per-processor status |
| `tkg_progress` | object | TKG build progress (null if not started) |
| `agent_progress` | object | Identity Agent progress (null if not started) |
#### TKG Progress Fields
| Field | Type | Description |
|-------|------|-------------|
| `phase` | string | Current phase name (see TKG Phases below) |
| `phase_index` | integer | Current phase index (0-17) |
| `total_phases` | integer | Total phases: 18 |
| `phase_progress` | float | Progress within current phase (0.0-1.0) |
| `overall_progress` | float | Overall TKG progress (0.0-1.0) |
| `stats` | object | Counts for all node and edge types |
| `message` | string | Human-readable status message |
#### TKG Phases (18 total)
| Index | Phase | Description |
|-------|-------|-------------|
| 0 | `face_tracing` | Populate trace_id from face.json |
| 1 | `face_track_nodes` | Build face_track nodes |
| 2 | `gaze_track_nodes` | Build gaze_track nodes |
| 3 | `lip_track_nodes` | Build lip_track nodes |
| 4 | `text_region_nodes` | Build text_region nodes |
| 5 | `appearance_nodes` | Build appearance_trace nodes |
| 6 | `accessory_nodes` | Build accessory nodes |
| 7 | `object_nodes` | Build yolo_object nodes |
| 8 | `hand_nodes` | Build hand nodes |
| 9 | `speaker_nodes` | Build speaker nodes |
| 10 | `co_occurrence_edges` | Build co_occurrence edges |
| 11 | `speaker_face_edges` | Build speaker_face edges |
| 12 | `face_face_edges` | Build face_face edges |
| 13 | `mutual_gaze_edges` | Build mutual_gaze edges |
| 14 | `lip_sync_edges` | Build lip_sync edges |
| 15 | `has_appearance_edges` | Build has_appearance edges |
| 16 | `wears_edges` | Build wears edges |
| 17 | `hand_object_edges` | Build hand_object edges |
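A client that wants to label a `phase_index` can mirror the table as a lookup list. The sketch below copies the phase names from the table; the helper `phase_kind` and its return labels are hypothetical names introduced for illustration.

```python
# Phase names in index order, copied from the table above.
TKG_PHASES = [
    "face_tracing",
    "face_track_nodes", "gaze_track_nodes", "lip_track_nodes",
    "text_region_nodes", "appearance_nodes", "accessory_nodes",
    "object_nodes", "hand_nodes", "speaker_nodes",
    "co_occurrence_edges", "speaker_face_edges", "face_face_edges",
    "mutual_gaze_edges", "lip_sync_edges", "has_appearance_edges",
    "wears_edges", "hand_object_edges",
]


def phase_kind(index: int) -> str:
    """Classify a phase index as tracing (0), node (1-9), or edge (10-17)."""
    if not 0 <= index < len(TKG_PHASES):
        raise ValueError(f"phase_index out of range: {index}")
    if index == 0:
        return "tracing"
    return "node" if index <= 9 else "edge"


print(TKG_PHASES[13], phase_kind(13))  # mutual_gaze_edges edge
```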
#### TKG Stats Fields
| Field | Type | Description |
|-------|------|-------------|
| `total_faces` | integer | Total face detections |
| `traced_faces` | integer | Faces with trace_id assigned |
| `total_traces` | integer | Unique trace count |
| `face_track_nodes` | integer | Face track nodes created |
| `gaze_track_nodes` | integer | Gaze track nodes created |
| `lip_track_nodes` | integer | Lip track nodes created |
| `text_region_nodes` | integer | Text region nodes created |
| `appearance_nodes` | integer | Appearance trace nodes created |
| `accessory_nodes` | integer | Accessory nodes created |
| `object_nodes` | integer | YOLO object nodes created |
| `hand_nodes` | integer | Hand nodes created |
| `speaker_nodes` | integer | Speaker nodes created |
| `co_occurrence_edges` | integer | Co-occurrence edges created |
| `speaker_face_edges` | integer | Speaker-face edges created |
| `face_face_edges` | integer | Face-face edges created |
| `mutual_gaze_edges` | integer | Mutual gaze edges created |
| `lip_sync_edges` | integer | Lip sync edges created |
| `has_appearance_edges` | integer | Has-appearance edges created |
| `wears_edges` | integer | Wears edges created |
| `hand_object_edges` | integer | Hand-object edges created |
| `total_nodes` | integer | Total nodes (sum of all node types) |
| `total_edges` | integer | Total edges (sum of all edge types) |
---
## `GET /api/v1/stats/ingestion-status/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get detailed ingestion status showing completion of all 24 processing steps.
### Example
```bash
curl -s "$API/api/v1/stats/ingestion-status/$FILE_UUID" \
-H "X-API-Key: $KEY" | jq '.steps[] | {name, status, detail}'
```
### Response (200)
```json
{
"file_uuid": "3a6c1865...",
"steps": [
{"name": "rule1_sentence", "status": "done", "detail": "156 sentence chunks"},
{"name": "auto_vectorize", "status": "done", "detail": "156 embedded"},
{"name": "face_track", "status": "done", "detail": "45 traces / 1250 detections"},
{"name": "trace_chunks", "status": "done", "detail": "45 trace chunks"},
{"name": "tkg_face_track", "status": "done", "detail": "45 nodes"},
{"name": "tkg_gaze_track", "status": "done", "detail": "45 nodes"},
{"name": "tkg_lip_track", "status": "done", "detail": "12 nodes"},
{"name": "tkg_text_region", "status": "done", "detail": "8 nodes"},
{"name": "tkg_appearance", "status": "done", "detail": "38 nodes"},
{"name": "tkg_accessory", "status": "done", "detail": "5 nodes"},
{"name": "tkg_object", "status": "done", "detail": "156 nodes"},
{"name": "tkg_hand", "status": "done", "detail": "22 nodes"},
{"name": "tkg_speaker", "status": "done", "detail": "14 nodes"},
{"name": "tkg_co_occurrence", "status": "done", "detail": "890 edges"},
{"name": "tkg_speaker_face", "status": "done", "detail": "120 edges"},
{"name": "tkg_face_face", "status": "done", "detail": "234 edges"},
{"name": "tkg_mutual_gaze", "status": "done", "detail": "67 edges"},
{"name": "tkg_lip_sync", "status": "done", "detail": "12 edges"},
{"name": "tkg_has_appearance", "status": "done", "detail": "38 edges"},
{"name": "tkg_wears", "status": "done", "detail": "22 edges"},
{"name": "tkg_hand_object", "status": "done", "detail": "18 edges"},
{"name": "rule2_relationship", "status": "done", "detail": "1331 relationship chunks"},
{"name": "identity_match", "status": "done", "detail": "18 identities matched"},
{"name": "scene_metadata", "status": "done", "detail": null}
],
"related_identities": [
{"uuid": "a9a901056d6b46ff92da0c3c1a57dff4", "name": "John Smith"}
],
"strangers": 3
}
```
### Step Descriptions
| Step | Status When Done |
|------|-----------------|
| `rule1_sentence` | sentence_count > 0 |
| `auto_vectorize` | sentence_embedded > 0 |
| `face_track` | trace_count > 0 |
| `trace_chunks` | trace_chunks > 0 |
| `tkg_face_track` to `tkg_speaker` | Node count > 0 (9 steps) |
| `tkg_co_occurrence` to `tkg_hand_object` | Edge count > 0 (8 steps) |
| `rule2_relationship` | relationship_chunks > 0 |
| `identity_match` | identity_count > 0 |
| `scene_metadata` | scene_meta.json exists |
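To see which steps are still outstanding, a client can filter the `steps` array. This is a minimal sketch: the document only shows the status value `"done"`, so the `"pending"` value in the sample is an assumption, and `unfinished_steps` is a hypothetical helper name.

```python
def unfinished_steps(status: dict) -> list[str]:
    """Return names of steps whose status is anything other than "done"."""
    return [
        step["name"]
        for step in status.get("steps", [])
        if step.get("status") != "done"
    ]


# Shortened sample; "pending" is an assumed status value for illustration.
sample = {
    "steps": [
        {"name": "rule1_sentence", "status": "done", "detail": "156 sentence chunks"},
        {"name": "tkg_wears", "status": "pending", "detail": None},
    ]
}
print(unfinished_steps(sample))  # ['tkg_wears']
```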
---
## `POST /api/v1/file/:file_uuid/tkg/rebuild`
**Auth**: Required
**Scope**: file-level
Manually trigger TKG rebuild. Automatically triggers Rule 2 ingestion after TKG completes.
### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/tkg/rebuild" \
-H "X-API-Key: $KEY" \
-H "Content-Type: application/json" -d '{}'
```
### Response (200)
```json
{
"success": true,
"message": "TKG rebuild started",
"nodes": 345,
"edges": 1311
}
```
---
## `POST /api/v1/file/:file_uuid/rule2`
**Auth**: Required
**Scope**: file-level
Manually trigger Rule 2 ingestion (TKG edges → relationship chunks).
### Example
```bash
curl -s -X POST "$API/api/v1/file/$FILE_UUID/rule2" \
-H "X-API-Key: $KEY" \
-H "Content-Type: application/json" -d '{}'
```
### Response (200)
```json
{
"success": true,
"message": "Rule 2 ingestion: 1331 relationship chunks created",
"rule2_count": 1331
}
```
---
## Processing Pipeline Flow
```
1. Processors (concurrent)
   ├── cut, asr, ocr, face, pose, appearance → complete
   └── asrx → after cut+asr
2. Post-Processor Triggers (automatic)
   ├── Rule 1 Ingestion (ASR+OCR → sentence chunks)
   ├── Face Trace + DB Store (face_traced.json → Qdrant trace_id)
   ├── TMDb Face Matching (if enabled)
   ├── Heuristic Scene Metadata
   ├── Identity Agent (face + ASRX)
   └── TKG Build (automatic after processors complete)
       └── Rule 2 Ingestion (automatic after TKG)
           └── Relationship chunks vectorized
3. Completion
   └── Job marked completed when all ingestion steps done
```
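The ordering constraints in the flow above can be expressed as a dependency graph. The sketch below is one reading of the diagram, not the scheduler the service uses: the node names (including the `processors_done` barrier) are hypothetical, and only the edges stated in the diagram are modelled.

```python
from graphlib import TopologicalSorter

PROCESSORS = ["cut", "asr", "asrx", "ocr", "face", "pose", "appearance"]

# Map each stage to the stages it waits for (assumed from the diagram).
deps = {
    "asrx": {"cut", "asr"},                 # asrx runs after cut + asr
    "processors_done": set(PROCESSORS),     # barrier: all 7 processors
    "rule1_ingestion": {"processors_done"},
    "face_trace": {"processors_done"},
    "identity_agent": {"processors_done"},
    "tkg_build": {"processors_done"},
    "rule2_ingestion": {"tkg_build"},       # Rule 2 follows the TKG build
    "completion": {
        "rule1_ingestion", "face_trace", "identity_agent", "rule2_ingestion",
    },
}

# static_order() yields one valid execution order for the graph.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Any order produced this way places `asrx` after `cut` and `asr`, and `rule2_ingestion` after `tkg_build`; stages without a mutual dependency may appear in either order.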
## Error Codes
| Code | HTTP | When |
|------|------|------|
| E001 | 400 | Invalid file_uuid format |
| E002 | 404 | File not found |
| E003 | 404 | No TKG data available |
| E010 | 500 | Qdrant connection failed |
| E011 | 500 | Database connection failed |
---
## `GET /api/v1/stats/pipeline/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get segmented pipeline progress with a weighted stage breakdown. Overall progress is the weighted sum of all pipeline stages.
### Pipeline Stages and Weights
| Stage | Weight | Description |
|-------|--------|-------------|
| `processors` | 30% | 7 concurrent processors (cut, asr, asrx, ocr, face, pose, appearance) |
| `rule1_ingestion` | 5% | ASR+OCR → sentence chunks |
| `face_tracing` | 5% | Face trace_id assignment |
| `identity_agent` | 10% | Identity creation, TMDb matching, speaker binding |
| `tkg_nodes` | 20% | TKG node building (9 node types) |
| `tkg_edges` | 15% | TKG edge building (8 edge types) |
| `rule2_ingestion` | 15% | TKG edges → relationship chunks |
### Example
```bash
curl -s "$API/api/v1/stats/pipeline/$FILE_UUID" \
-H "X-API-Key: $KEY" | jq '.'
```
### Response (200)
```json
{
"file_uuid": "3a6c1865...",
"overall_progress": 0.775,
"stages": [
{"name": "processors", "weight": 0.30, "progress": 1.0, "status": "completed", "detail": "7/7 complete"},
{"name": "rule1_ingestion", "weight": 0.05, "progress": 1.0, "status": "completed", "detail": "156 chunks"},
{"name": "face_tracing", "weight": 0.05, "progress": 1.0, "status": "completed", "detail": "45 traces"},
{"name": "identity_agent", "weight": 0.10, "progress": 1.0, "status": "completed", "detail": "18 identities"},
{"name": "tkg_nodes", "weight": 0.20, "progress": 1.0, "status": "completed", "detail": "345 nodes"},
{"name": "tkg_edges", "weight": 0.15, "progress": 0.5, "status": "running", "detail": "mutual_gaze_edges: 67/8 expected"},
{"name": "rule2_ingestion", "weight": 0.15, "progress": 0.0, "status": "pending", "detail": null}
],
"updated_at": "2026-07-02T10:30:00Z"
}
```
### Field Descriptions
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
| `overall_progress` | float | Weighted sum of all stage progress (0.0-1.0) |
| `stages` | array | Per-stage progress breakdown |
| `stages[].name` | string | Stage name |
| `stages[].weight` | float | Stage weight in overall progress |
| `stages[].progress` | float | Stage completion (0.0-1.0) |
| `stages[].status` | string | `"pending"`, `"running"`, `"completed"`, `"failed"` |
| `stages[].detail` | string | Human-readable detail (optional) |
| `updated_at` | string | ISO 8601 timestamp |
### Overall Progress Calculation
```
overall_progress = Σ(stage.weight × stage.progress) for all stages
```
Example calculation:
- processors: 0.30 × 1.0 = 0.30
- rule1_ingestion: 0.05 × 1.0 = 0.05
- face_tracing: 0.05 × 1.0 = 0.05
- identity_agent: 0.10 × 1.0 = 0.10
- tkg_nodes: 0.20 × 1.0 = 0.20
- tkg_edges: 0.15 × 0.5 = 0.075
- rule2_ingestion: 0.15 × 0.0 = 0.0
- **Total: 0.775 (77.5%)**
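The same calculation as a short script, using the weights and progress values from the example. This is a sketch of the formula only; the tuple layout and variable names are chosen for illustration.

```python
# (stage name, weight, progress) taken from the example calculation above.
stages = [
    ("processors", 0.30, 1.0),
    ("rule1_ingestion", 0.05, 1.0),
    ("face_tracing", 0.05, 1.0),
    ("identity_agent", 0.10, 1.0),
    ("tkg_nodes", 0.20, 1.0),
    ("tkg_edges", 0.15, 0.5),
    ("rule2_ingestion", 0.15, 0.0),
]

# Weights are expected to add up to 1.0 (100%).
total_weight = sum(weight for _, weight, _ in stages)

# overall_progress = sum of weight * progress over all stages.
overall = sum(weight * progress for _, weight, progress in stages)

print(f"{overall:.3f}")  # 0.775
```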
---
## `GET /api/v1/stats/file/:file_uuid`
**Auth**: Required
**Scope**: file-level
Get comprehensive file statistics from all data sources: JSON processing status, PostgreSQL counts, Qdrant collections, TKG nodes/edges, and Identity Agent stats.
### Example
```bash
curl -s "$API/api/v1/stats/file/$FILE_UUID" \
-H "X-API-Key: $KEY" | jq '.'
```
### Response (200)
```json
{
"file_uuid": "3a6c1865...",
"file_name": "video.mp4",
"status": "processing",
"processors": [
{"name": "asr", "status": "complete", "progress": 100, "message": "done"},
{"name": "face", "status": "complete", "progress": 100, "message": "done"}
],
"postgres": {
"sentence_chunks": 156,
"trace_chunks": 45,
"relationship_chunks": 1331,
"identities": 18,
"file_identities": 18
},
"qdrant": {
"faces": 1250,
"face_traces": 45,
"face_identities": 18,
"text_chunks": 4562,
"speakers": 434
},
"tkg": {
"total_nodes": 345,
"total_edges": 1311,
"face_track_nodes": 45,
"gaze_track_nodes": 45,
"lip_track_nodes": 12,
"text_region_nodes": 8,
"appearance_nodes": 38,
"accessory_nodes": 5,
"object_nodes": 156,
"hand_nodes": 22,
"speaker_nodes": 14,
"co_occurrence_edges": 890,
"speaker_face_edges": 120,
"face_face_edges": 234,
"mutual_gaze_edges": 67,
"lip_sync_edges": 12,
"has_appearance_edges": 38,
"wears_edges": 22,
"hand_object_edges": 18
},
"identity_agent": {
"clusters": 18,
"identities_created": 18,
"tmdb_matches": 5,
"speaker_bindings": 12,
"confirmations": 18
}
}
```
### Field Descriptions
#### Top Level
| Field | Type | Description |
|-------|------|-------------|
| `file_uuid` | string | 32-char hex UUID |
| `file_name` | string | Original filename |
| `status` | string | File status: `registered`, `processing`, `completed`, `failed` |
| `processors` | array | Per-processor status from processing_status JSONB |
| `postgres` | object | PostgreSQL table counts |
| `qdrant` | object | Qdrant collection point counts |
| `tkg` | object | TKG node and edge counts by type |
| `identity_agent` | object | Identity Agent statistics |
#### PostgreSQL Stats
| Field | Type | Description |
|-------|------|-------------|
| `sentence_chunks` | integer | Rule 1 sentence chunks count |
| `trace_chunks` | integer | Face trace chunks count |
| `relationship_chunks` | integer | Rule 2 relationship chunks count |
| `identities` | integer | Unique identities bound to this file |
| `file_identities` | integer | File-identity mapping records |
#### Qdrant Stats
| Field | Type | Description |
|-------|------|-------------|
| `faces` | integer | Total face points in `_faces` collection |
| `face_traces` | integer | Unique trace IDs in `_faces` |
| `face_identities` | integer | Unique identity IDs bound in `_faces` |
| `text_chunks` | integer | Text chunk vectors in `momentry_*_rule1_v2` |
| `speakers` | integer | Speaker segments in `momentry_*_speaker` |
#### TKG Stats
| Field | Type | Description |
|-------|------|-------------|
| `total_nodes` | integer | Sum of all node types |
| `total_edges` | integer | Sum of all edge types |
| `face_track_nodes` | integer | Face track nodes |
| `gaze_track_nodes` | integer | Gaze track nodes |
| `lip_track_nodes` | integer | Lip track nodes |
| `text_region_nodes` | integer | Text region nodes |
| `appearance_nodes` | integer | Appearance trace nodes |
| `accessory_nodes` | integer | Accessory nodes |
| `object_nodes` | integer | YOLO object nodes |
| `hand_nodes` | integer | Hand nodes |
| `speaker_nodes` | integer | Speaker nodes |
| `co_occurrence_edges` | integer | Co-occurrence edges |
| `speaker_face_edges` | integer | Speaker-face edges |
| `face_face_edges` | integer | Face-face edges |
| `mutual_gaze_edges` | integer | Mutual gaze edges |
| `lip_sync_edges` | integer | Lip sync edges |
| `has_appearance_edges` | integer | Has-appearance edges |
| `wears_edges` | integer | Wears edges |
| `hand_object_edges` | integer | Hand-object edges |
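Because `total_nodes` and `total_edges` are described as sums of the per-type counts, a client can sanity-check a response. A minimal sketch, assuming every per-type key ends in `_nodes` or `_edges`; `check_tkg_totals` is a hypothetical helper and the sample values are made up for the example.

```python
def check_tkg_totals(tkg: dict) -> dict:
    """Compare reported totals with the sum of per-type node/edge counts."""
    node_sum = sum(
        v for k, v in tkg.items()
        if k.endswith("_nodes") and k != "total_nodes"
    )
    edge_sum = sum(
        v for k, v in tkg.items()
        if k.endswith("_edges") and k != "total_edges"
    )
    return {
        "node_sum": node_sum,
        "edge_sum": edge_sum,
        "nodes_ok": node_sum == tkg.get("total_nodes"),
        "edges_ok": edge_sum == tkg.get("total_edges"),
    }


# Reduced, made-up sample with two node types and one edge type.
sample_tkg = {
    "total_nodes": 57,
    "total_edges": 12,
    "face_track_nodes": 45,
    "lip_track_nodes": 12,
    "lip_sync_edges": 12,
}
print(check_tkg_totals(sample_tkg))
```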
#### Identity Agent Stats
| Field | Type | Description |
|-------|------|-------------|
| `clusters` | integer | Face clusters from face_clustered.json |
| `identities_created` | integer | Identities created from clusters |
| `tmdb_matches` | integer | TMDb identity matches |
| `speaker_bindings` | integer | Speaker-to-identity bindings |
| `confirmations` | integer | Confirmed identity bindings |
@@ -0,0 +1,81 @@
---
title: Charade Identity Processing Fix Report
date: 2026-06-29
author: OpenCode
status: completed
---
## Summary
**Problem**: Identity processing for the Charade file (UUID: c36f35685177c981aa139b66bbbccc5b) failed because of data corruption and missing TKG nodes.
**Root Cause**: A broken dependency chain, in which each failure blocked the next stage:
- face_detections had 3x duplicate records (12726 instead of 4242)
- All trace_id = NULL (UPDATE failed)
- TKG Phase 2.5 couldn't create face_track nodes (needs trace_id)
- Identity Agent couldn't mark suggestions (needs TKG nodes)
## Fix Steps
### Step 1: Clean Duplicate Data ✅
- Deleted 8484 duplicate records
- 12726 → 4242 unique face_detections
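The report does not include the cleanup statement that was run. As a rough sketch of the keep-the-lowest-id deduplication pattern, the example below uses an in-memory SQLite table; the column set and the choice of `(file_uuid, frame_number, bbox)` as the duplicate key are assumptions, not the real `face_detections` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema for illustration only.
conn.execute(
    "CREATE TABLE face_detections ("
    "id INTEGER PRIMARY KEY, file_uuid TEXT, frame_number INTEGER, bbox TEXT)"
)

# Three distinct detections, each inserted three times (3x duplicates).
rows = [("f1", n, "0,0,10,10") for n in range(3)] * 3
conn.executemany(
    "INSERT INTO face_detections (file_uuid, frame_number, bbox) VALUES (?, ?, ?)",
    rows,
)

# Keep the lowest id in each duplicate group and delete the rest.
conn.execute(
    """
    DELETE FROM face_detections
    WHERE id NOT IN (
        SELECT MIN(id) FROM face_detections
        GROUP BY file_uuid, frame_number, bbox
    )
    """
)

remaining = conn.execute("SELECT COUNT(*) FROM face_detections").fetchone()[0]
print(remaining)  # 3
```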
### Step 2: Write trace_id ✅
- store_traced_faces.py successfully updated DB
- 4242 faces with trace_id (100% populated)
- 426 unique traces
### Step 3: Create TKG Nodes ✅
- Created 426 face_track nodes via SQL
- Fixed external_id format: "face_track_*" (matches Rust code)
### Step 4: Run Identity Agent ✅
- Identity matching: 2 traces matched to Audrey Hepburn
- TKG marking: 2/2 nodes marked as "suggested"
## Final Results
| Metric | Before | After |
|--------|--------|-------|
| face_detections | 12726 (3x duplicates) | 4242 (unique) |
| trace_id populated | 0 | 4242 (100%) |
| TKG face_track nodes | 0 | 426 |
| Identity suggestions | 0 | 2 (Audrey Hepburn) |
**Identity Matches**:
- Trace 202: Audrey Hepburn (score=0.6002)
- Trace 311: Audrey Hepburn (score=0.6724)
## Technical Details
### Data Sources
- face.json: 3176 frames, 4242 faces
- face_traced.json: 426 traces (IoU tracking)
- Qdrant _faces: 374 traces with embeddings
- Qdrant _seeds: 2 TMDb seeds
### Tools Used
- PostgreSQL: face_detections, tkg_nodes tables
- Python: store_traced_faces.py, identity_matcher.py
- Qdrant: _faces, _seeds collections
## Next Steps
1. User confirmation: Check suggested identities via Portal UI
2. Manual confirmation: Confirm Audrey Hepburn matches
3. Propagation: Run Round 2 matching (propagate confirmed identities)
4. Stranger clustering: Cluster unmatched traces (TH=0.40)
## Files Modified
- PostgreSQL: public.face_detections (deleted 8484 duplicates)
- PostgreSQL: public.tkg_nodes (created 426 face_track nodes)
- Qdrant: _faces collection (updated 3176 trace_ids)
## Related Documents
- docs/PROCESSING_PIPELINE.md
- src/core/processor/tkg.rs:550-683 (build_face_track_nodes)
- scripts/store_traced_faces.py (trace_id storage)
- scripts/identity_matcher.py (TMDb matching)
@@ -0,0 +1,116 @@
---
title: Cut Scene Detection Escape Fix
date: 2026-06-30
author: OpenCode
status: completed
---
## Summary
**Problem**: Cut scene detection returned only 1 scene (fallback) instead of 833 scenes for Charade video.
**Root Cause**: Python script `cut_processor.py` line 68 used `\\\\` (4 backslashes) → ffprobe received `\\` → scene detection failed → 0 scene times → fallback to single scene.
## Fix
### Code Changes
1. **scripts/cut_processor.py** line 68:
- Before: `f"movie={video_path},select='gt(scene\\\\,0.3)',showinfo"`
- After: `f"movie={video_path},select='gt(scene\\,0.3)',showinfo"`
2. **src/core/processor/cut.rs** line 127:
- Already correct: `&format!("movie={},select='gt(scene\\,0.3)',showinfo", video_path)`
- No changes needed
### Escape Analysis
| Backslashes in source | Python literal | ffprobe receives | Result |
|-----------------------|----------------|------------------|--------|
| 4 (`\\\\`) | `"scene\\\\,0.3"` | `scene\\,0.3` | ❌ 0 scenes |
| 2 (`\\`) | `"scene\\,0.3"` | `scene\,0.3` | ✅ 832 scenes |
| 1 (`\`, raw string) | `r"scene\,0.3"` | `scene\,0.3` | ✅ 832 scenes |
### Testing
```bash
# Before fix
python3 scripts/cut_processor.py video.mp4 output.json
# Result: 1 scene (fallback)
# After fix
python3 scripts/cut_processor.py video.mp4 output.json
# Result: 833 scenes
```
## Verification
### File: 3dfc20618fb522e795240b5f0e5ff6f0 (Charade)
| Metric | Before | After |
|--------|--------|-------|
| cut.json scenes | 1 | 833 |
| workspace.sqlite pre_chunks (cut) | 12 | 833 |
| Scene 1 end_frame | 162695 (whole video) | 932 |
### Workspace.sqlite Status
```bash
sqlite3 output/3dfc20618fb522e795240b5f0e5ff6f0.workspace.sqlite \
"SELECT processor_type, COUNT(*) FROM pre_chunks GROUP BY processor_type;"
cut|833
ocr|942
```
## Technical Details
### ffprobe Command
Correct format:
```bash
ffprobe -v quiet -show_entries frame=pts_time -of default=nk=0 \
-f lavfi "movie=/path/to/video.mp4,select='gt(scene\\,0.3)',showinfo" \
-show_frames
```
- `scene\\,0.3` in shell → ffprobe receives `scene\,0.3`
- The `\` escapes the comma in ffmpeg filter syntax
### Python subprocess Behavior
- Without `shell=True`: arguments passed directly to executable
- Python literal `"\\\\"` → subprocess receives `\\` (two backslashes)
- Python literal `"\\"` → subprocess receives `\` (one backslash)
- Raw string `r"\,"` → subprocess receives `\,` (one backslash followed by the comma)
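The escape behaviour listed above can be checked without running ffprobe by counting the backslashes each literal produces. The variable names below are illustrative; the filter fragment is the one from `cut_processor.py`.

```python
# Four backslashes in source: the string holds two, which breaks the filter.
four = "select='gt(scene\\\\,0.3)'"

# Two backslashes in source: the string holds one, which escapes the comma.
two = "select='gt(scene\\,0.3)'"

# Raw string: the single backslash is kept as written.
raw = r"select='gt(scene\,0.3)'"

# Count the actual backslash characters in each string value.
print(four.count("\\"), two.count("\\"), raw.count("\\"))  # 2 1 1

# The fixed literal and the raw-string form are the same string.
print(two == raw)  # True
```

Without `shell=True`, `subprocess` passes each argument to the executable unchanged, so the string value shown here is exactly what ffprobe parses.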
## Impact
### Affected Videos
- Charade (UUID: 3dfc20618fb522e795240b5f0e5ff6f0)
- Other videos registered before this fix may have incorrect scene counts
### Remediation
1. Re-run cut detection for affected videos
2. Update workspace.sqlite pre_chunks
3. If in PostgreSQL: update public.pre_chunks table
## Next Steps
1. Verify fix in production by registering new video
2. Check if other videos need remediation
3. Consider adding unit test for cut escape handling
## Related Files
- scripts/cut_processor.py
- src/core/processor/cut.rs
- src/api/files.rs (register API uses Python script)
## Version History
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2026-06-30 | OpenCode | Initial report |
@@ -0,0 +1,117 @@
# Face Detections Table Cleanup Plan
## Problem
All code that uses the `face_detections` table is incorrect and must be changed to use Qdrant workspace traces.
## Correct Architecture
### PostgreSQL
```
identities (global person master table)
├── id
├── uuid
├── name
├── status
└── metadata
```
### Qdrant Payload
```
{prefix}_workspace_traces (512d vectors)
├── file_uuid
├── trace_id
├── frame_number
├── identity_id ← the binding is stored here
├── bbox
├── confidence
└── embedding
```
## Incorrect Code Locations (197 occurrences)
### 1. Processor Layer (incorrect writes)
- `src/core/processor/processor.rs` - line 744, 1311
- `src/core/processor/job_worker.rs` - line 647
- `src/core/db/workspace_sqlite.rs` - line 257-263 (function definition)
- `src/core/db/postgres_db.rs` - line 2712 (function definition)
### 2. TKG Processor (heavy usage)
- `src/core/processor/tkg.rs` - ~50 uses of `face_detections`
### 3. Chunk Ingest
- `src/core/chunk/trace_ingest.rs` - line 10
- `src/core/chunk/rule2_ingest.rs` - line 26
### 4. API Layer (incorrect queries/updates)
- `src/api/identity_api.rs` - 22 occurrences
- `src/api/identity_binding.rs` - 12 occurrences
- `src/api/identities.rs` - 2 occurrences
- `src/api/identity_agent_api.rs` - 7 occurrences
- `src/api/files.rs` - 4 occurrences
- `src/api/media_api.rs` - 3 occurrences
### 5. Identity Layer
- `src/core/identity/storage.rs` - 3 occurrences
## Modification Plan
### Phase 1: Analyze Existing Code
1. Understand how the face_detections table is currently used
2. Understand the structure of Qdrant workspace traces
3. Determine the list of functions that need to change
### Phase 2: Create Qdrant Query Helpers
1. Create `QdrantWorkspace` query methods
2. Create trace-to-identity binding queries
3. Create face matching queries
### Phase 3: Modify the Processor Layer
1. Modify `processor.rs` - remove face_detections writes
2. Modify `job_worker.rs` - remove face_detections queries
3. Modify `workspace_sqlite.rs` - remove face_detections-related functions
4. Modify `postgres_db.rs` - remove face_detections-related functions
### Phase 4: Modify the TKG Processor
1. Refactor `tkg.rs` - use Qdrant workspace traces instead of face_detections
2. Remove the `populate_face_detections_from_face_json` function
3. Modify the face matching logic
### Phase 5: Modify the API Layer
1. Modify `identity_api.rs` - use Qdrant queries
2. Modify `identity_binding.rs` - use Qdrant bindings
3. Modify `identities.rs` - use Qdrant queries
4. Modify `identity_agent_api.rs` - use Qdrant matching
5. Modify `files.rs` - remove face_detections queries
6. Modify `media_api.rs` - remove face_detections queries
### Phase 6: Modify Chunk Ingest
1. Modify `trace_ingest.rs` - use Qdrant traces
2. Modify `rule2_ingest.rs` - use Qdrant traces
### Phase 7: Testing
1. Test face tracking
2. Test identity binding
3. Test TKG build
4. Test API endpoints
### Phase 8: Cleanup
1. Drop the face_detections table (optional)
2. Update documentation
3. Update tests
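## Risk Assessment
- **High risk**: The TKG processor uses face_detections heavily
- **Medium risk**: The API layer needs its query logic refactored
- **Low risk**: Processor layer changes are relatively simple
## Estimated Time
- Phase 1-2: 2-3 hours
- Phase 3-4: 4-6 hours
- Phase 5-6: 3-4 hours
- Phase 7-8: 2-3 hours
- **Total**: 11-16 hours
## Dependencies
- Qdrant workspace traces must be populated correctly
- face.json must be in the correct format
- SwiftFacePose must be working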