Bug: When the regFiles API fails, every file gets 'unregistered' status even
if it is actually registered. Also, the computed property used a reference
to files.value instead of a copy, which could cause mutation issues.
Fix:
- Fetch scan results FIRST (source of truth for files on disk)
- Use scan API's is_registered field as fallback status
- Only override with regFiles data if file exists in scan results
- Computed property now uses [...files.value] to create a copy
- Skip files from regFiles that don't exist on disk (deleted)
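A minimal sketch of the merge order described above, written in Python for illustration only (the real code lives in the frontend). The field names `path` and `status`, and the `registered_fallback` value used when only `is_registered` is known, are assumptions and not taken from the actual API.

```python
def merge_file_status(scan_files, reg_files, registered_fallback="pending"):
    """Overlay regFiles statuses onto scan results.

    scan_files: list of dicts with 'path' and 'is_registered' (scan API).
    reg_files:  list of dicts with 'path' and 'status', or None if the
                regFiles call failed.
    registered_fallback: hypothetical status used when only the scan API's
                is_registered flag is available.
    """
    merged = {}
    for f in scan_files:
        # Scan results are the source of truth for what exists on disk.
        fallback = registered_fallback if f.get("is_registered") else "unregistered"
        merged[f["path"]] = {**f, "status": fallback}

    for r in reg_files or []:
        # Skip registry rows whose file is no longer on disk (deleted).
        if r["path"] in merged:
            merged[r["path"]]["status"] = r["status"]

    # Return a new list so callers never mutate the inputs.
    return list(merged.values())
```

With this shape, a failed regFiles call (`reg_files=None`) still leaves registered files with a non-'unregistered' status.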
Bug: Scan files were getting status 'registered_scan' which doesn't match
any filter value (unregistered/pending/processing/completed/indexed/unindexed).
When toggling filters on/off, files would disappear because their status
didn't match any valid filter.
Fix:
- Removed 'registered_scan' status entirely
- Fetch regFiles FIRST to get real statuses
- Scan files default to 'unregistered' status
- regFiles overlay with actual status (pending/processing/completed)
- Increased regFiles page_size to 200 for larger libraries
Frontend:
- Add media type filter (All/Video/Photo)
- Add indexed status filter (Not indexed/Indexed)
- Show media type column with icons
- Fix status filter to handle indexed/unindexed correctly
- Determine media type from file extension
Backend:
- Add total_chunks field to FileItem API response
- Query chunk counts efficiently in batch with IN clause
- Frontend uses total_chunks to determine is_indexed status
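The two derived fields above can be sketched as follows. This is a Python illustration of the logic, not the frontend code. The extension sets and the rule "indexed means total_chunks > 0" are assumptions.

```python
from pathlib import Path

# Hypothetical extension lists; the real frontend may recognise others.
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}
PHOTO_EXTS = {".jpg", ".jpeg", ".png", ".heic", ".webp"}


def media_type(filename):
    """Classify a file as 'video', 'photo', or 'unknown' by its extension."""
    ext = Path(filename).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in PHOTO_EXTS:
        return "photo"
    return "unknown"


def is_indexed(file_item):
    """Treat a file as indexed when it has at least one chunk (assumption)."""
    return (file_item.get("total_chunks") or 0) > 0
```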
Bug: all_completed only checked existing results, not missing processors.
If a processor (like pose) never created a result row, all_completed would
still return true and mark the job as completed.
Fix: all_completed now checks that every processor in job_processors has
a corresponding completed result. Added logging for missing processors.
Also fixed:
- any_pending now checks all expected processors, not just existing results
- Added missing_processors detection and logging
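A rough sketch of the corrected check, in Python for illustration. Here `results` is assumed to be a mapping from processor name to status string, and a missing processor is treated as still pending. Both are assumptions about the data shape, not the actual implementation.

```python
import logging

logger = logging.getLogger(__name__)


def missing_processors(job_processors, results):
    """Processors that were expected but never produced a result row."""
    return [p for p in job_processors if p not in results]


def all_completed(job_processors, results):
    """True only if every expected processor has a 'completed' result."""
    missing = missing_processors(job_processors, results)
    if missing:
        logger.warning("missing processor results: %s", missing)
        return False
    return all(results[p] == "completed" for p in job_processors)


def any_pending(job_processors, results):
    """A processor with no result row counts as pending (assumption)."""
    return any(results.get(p, "pending") == "pending" for p in job_processors)
```

The old behaviour corresponds to iterating over `results` alone, which is vacuously satisfied when a processor such as pose never wrote a row.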
- Skip chunks where both ASRX text and OCR text are empty
- Use count-based chunk_id instead of index to avoid gaps
- This ensures PostgreSQL and Qdrant chunk counts match
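To illustrate why a count-based chunk_id avoids gaps, here is a small Python sketch. The segment field names `asrx_text` and `ocr_text` are hypothetical.

```python
def build_chunks(segments):
    """Build chunks, skipping segments with no ASRX and no OCR text.

    chunk_id is the number of chunks emitted so far, not the segment
    index, so skipped segments do not leave holes in the id sequence.
    """
    chunks = []
    for seg in segments:
        asrx = (seg.get("asrx_text") or "").strip()
        ocr = (seg.get("ocr_text") or "").strip()
        if not asrx and not ocr:
            continue  # nothing to index for this segment
        chunks.append({"chunk_id": len(chunks), "asrx_text": asrx, "ocr_text": ocr})
    return chunks
```

Using the loop index instead would yield ids 0 and 2 when the middle segment is empty, which is one way the two stores could end up with different counts.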
- Added text_content field to SearchResult and SemanticSearchResult
- Added get_chunk_by_id_no_embedding for keyword results without embedding requirement
- Fixed search_bm25 to use position-based ranking for CJK (including Korean) content
- Fixed sqlx column mapping with explicit alias
- Skip text_match filter for keyword-only results
- Use text_content as fallback when summary is empty
- Frontend calls /api/v1/face-thumbnail?uuid=...&frame=...
- Backend only had /api/v1/file/:file_uuid/thumbnail
- Added compat route and uuid field to ThumbQuery
- ingestion_complete query used file_uuid column which is always NULL
- Changed to JOIN processor_results with monitor_jobs on job_id
- All stuck jobs now complete successfully
- Removed trace_chunks field from PostgresStats struct
- Removed trace_chunks query from get_file_stats and get_ingestion_status
- Fixed OCR fetch_ocr_texts to compute frames from start_time*FPS
- Updated scan.rs to use separate count_nodes/count_edges functions
- get_pipeline_progress_handler now queries actual DB counts
- Fixed processor_results query (requires JOIN with monitor_jobs)
- Card progress bar and right-click content now consistent
- tkg_nodes has no edge_type column, query was failing silently
- Split into count_nodes(node_type) and count_edges(edge_type)
- Fixed text_region → text_trace node type name
- Also: OCR frame fix in rule1 (end_frame computed from end_time*FPS)
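The OCR frame fix amounts to converting timestamps to frame indices. A minimal Python sketch, assuming frame index = time multiplied by FPS (as in the fetch_ocr_texts fix) and truncation toward zero; the rounding mode is an assumption:

```python
def time_range_to_frames(start_time, end_time, fps):
    """Convert a [start_time, end_time] range in seconds to frame indices."""
    start_frame = int(start_time * fps)
    end_frame = int(end_time * fps)
    return start_frame, end_frame
```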
- get_file_identities: UNION face_detections + file_identities
- list_identities: add file_bindings from file_identities table
- Add back /api/v1/traces/unassigned route
- Total count query now includes file_identities
Frontend can now:
- Filter pending identities by file_uuid
- Filter pending faces (unassigned traces) by file_uuid
- skin_tone is a person attribute (like height), not trace attribute
- Remove build_skin_tone_trace_nodes function
- Remove skin_tone_trace_nodes from TkgResult and API response
- Remove skin_tone_trace from documentation tables
- Fix trace_id type mismatch (INT4 vs i64) with explicit ::bigint cast
- Change build_face_track_nodes to use from_pg version
- Add skin_tone_trace_nodes to API response
- Add #[derive(Serialize)] to TkgResult
- Fix Unicode panic in text label truncation
- Add push_existing_embeddings.py script
- Add public health routes at /api/v1/health, /api/v1/health/detailed, /api/v1/health/consistency
- Make health functions and response types public
- Public routes bypass auth middleware (unlike protected /api/v1/* routes)
Fix qdrant_request() to properly handle empty dict {} as body.
Python's 'if body' evaluates to False for an empty dict, so no request body
was sent, causing an EOF error.
Changed:
- data = json.dumps(body).encode() if body is not None else None
Also cleaned up count_seeds() to use consistent body passing.
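The difference between the two conditions can be shown in isolation. `encode_body` is a hypothetical helper name used only for this sketch; the expression inside it is the one from the fix.

```python
import json


def encode_body(body):
    """Serialize a request body; only None means 'send no body'."""
    return json.dumps(body).encode() if body is not None else None


def encode_body_buggy(body):
    """Old behaviour: an empty dict is falsy, so no body is sent."""
    return json.dumps(body).encode() if body else None
```

`encode_body({})` returns `b"{}"`, while `encode_body_buggy({})` returns `None`.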
TKG Helper (scripts/utils/tkg_helper.py):
- mark_face_track_suggested(): Mark node as 'suggested' with pending identity info
- mark_face_track_confirmed(): Mark node as 'confirmed' with identity_ref
- mark_face_track_stranger(): Mark node as 'stranger' with stranger_ref
- batch_mark_suggestions(): Batch mark multiple traces
- batch_mark_strangers(): Batch mark stranger clusters
- get_face_track_nodes(): Get all face_track nodes for a file
- get_pending_face_tracks(): Get nodes with status='pending'
- get_suggested_face_tracks(): Get nodes with status='suggested'
Identity Matcher updates:
- Add --mark-tkg flag to update TKG nodes after matching
- Integrates with tkg_helper for batch operations
Node properties schema:
- status: pending | suggested | confirmed | stranger
- pending_identity_name/uuid/id: suggested identity info
- suggested_by: tmdb | propagation | manual
- confidence: matching score
- identity_ref: confirmed identity reference
- Add ensure_seeds_collection(): create _seeds collection (512D, Cosine)
- Add push_seed_embedding(): push identity seed with payload {identity_id, uuid, name, source, file_uuid, trace_id, tmdb_id}
- Add get_seeds(): get all seeds (optional source filter)
- Add search_seeds(): cosine search against seeds
- Add delete_seed(): delete seed by identity_id
- Add count_seeds(): count seeds (optional source filter)
- Add get_trace_representatives(): get 3 representatives per trace for multi-angle matching
- Add get_trace_centroid(): get centroid embedding for a trace
- Add update_identity_in_faces(): update identity_id/uuid for all face points with trace_id
Point ID strategy: identity_id directly as point_id for _seeds collection
All functions tested successfully
- Add Queued variant to VideoStatus enum
- Trigger sets videos.status='queued' instead of staying 'pending'
- Worker sets videos.status='processing' on pickup
- list_monitor_jobs_by_status ORDER BY created_at ASC (FIFO)
- queue_position counts both 'pending' and 'queued' jobs
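A sketch of the FIFO queue position described above, in Python for illustration. Jobs are assumed to be dicts with `id`, `status`, and `created_at`, and the position is assumed to be 1-based; neither is confirmed by the source.

```python
def queue_position(jobs, job_id):
    """Return the 1-based position of job_id among waiting jobs, or None.

    Both 'pending' and 'queued' jobs count as waiting, ordered by
    created_at ascending (oldest first).
    """
    waiting = sorted(
        (j for j in jobs if j["status"] in ("pending", "queued")),
        key=lambda j: j["created_at"],
    )
    for pos, job in enumerate(waiting, start=1):
        if job["id"] == job_id:
            return pos
    return None
```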
- Identity agent: per-face max matching, multi-round with derived
seeds from high-confidence faces, angle diversity filter (cosine sim < 0.90)
- Pending person API: POST /file/:file_uuid/pending-person
+ GET /file/:file_uuid/pending-persons with status=pending, source=manual
- Update API docs (07_identity.md)
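The angle diversity filter for derived seeds could look roughly like this. It is a greedy selection in plain Python; the greedy order and the function names are assumptions, and only the 0.90 cosine threshold comes from the notes above.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_diverse(embeddings, max_sim=0.90):
    """Keep an embedding only if it is < max_sim similar to all kept ones."""
    kept = []
    for emb in embeddings:
        if all(cosine(emb, k) < max_sim for k in kept):
            kept.append(emb)
    return kept
```

Feeding candidates in descending confidence order would make the filter prefer the most confident face for each angle.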
- Remove Rule 3 (Scene Chunking) from worker auto-trigger
- Remove rule3_ingest.rs and related imports
- Remove Story/Caption from playground module parsing
- Clean up scan.rs Rule 3 display
- Fix ASRX field name conversion (start_time -> start)
Reason: Story/5W1H/Scene accuracy too poor - will redesign later
- Problem: compact=p=0:nk=1 outputs pipe-delimited format without pts_time=
- Fix: default=nk=0 outputs pts_time=XXX format that parser can match
- Result: Charade scene detection from 1 scene -> 833 scenes (correct)
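To show why the output format matters, here is a hedged sketch of a parser that only recognises `pts_time=` lines. The regex and the sample strings are illustrative and not copied from the project's parser.

```python
import re

# Matches lines such as "pts_time=12.345000" in key=value output.
PTS_RE = re.compile(r"^pts_time=([0-9]+(?:\.[0-9]+)?)[ \t]*$", re.MULTILINE)


def parse_pts_times(output):
    """Extract scene timestamps (seconds) from key=value formatted output."""
    return [float(value) for value in PTS_RE.findall(output)]


# Illustrative samples of the two output shapes.
KEYED_SAMPLE = "[FRAME]\npts_time=1.250000\n[/FRAME]\n[FRAME]\npts_time=3.5\n[/FRAME]\n"
PIPE_SAMPLE = "1.250000|0.412\n3.500000|0.388\n"
```

`parse_pts_times(KEYED_SAMPLE)` finds both timestamps, while the pipe-delimited sample yields an empty list because no line carries the `pts_time=` key.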
- pre_chunks: add chunk_type, text_content columns; drop NOT NULL on
coordinate_type/coordinate_index (INSERT statements reference these
columns but CREATE TABLE was missing them)
- run_migrations: add ALTER TABLE for existing databases
- extract_movie_name: filter noise words (youtube, fps, 24fps, 1080p,
pure digits) so 'Charade_YouTube_24fps' → 'Charade'
- run-server-3002.sh: add companion worker startup (matching 3003 script)
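The extract_movie_name noise filter listed above can be sketched like this in Python. The separator set and the exact noise list are assumptions based on the words named in the notes, and joining the remaining tokens with spaces is also an assumption.

```python
import re

# Noise words named in the notes; "24fps"-style tokens are matched by pattern.
NOISE_WORDS = {"youtube", "fps", "1080p"}


def is_noise(token):
    """True for listed noise words, pure digits, or '<digits>fps' tokens."""
    t = token.lower()
    return t in NOISE_WORDS or t.isdigit() or re.fullmatch(r"\d+fps", t) is not None


def extract_movie_name(stem):
    """Drop noise tokens from a filename stem and join the rest with spaces."""
    tokens = [t for t in re.split(r"[_\-.\s]+", stem) if t]
    return " ".join(t for t in tokens if not is_noise(t))
```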