M4 handover: coordinate fixes, detector registry, deploy v2, YOLOv8s, identity lifecycle

- Fix swift_pose/swift_ocr Y-flip bugs (BUG-003~006)
- Add heuristic_scene module + post-processing trigger (replaces Places365)
- YOLOv5nu → YOLOv8s CoreML (+33% detections, +390% scene indicators)
- Per-table SQL export (split 4.7GB single file → 478MB max per table)
- Version/build check in deploy.sh (compare /health vs file_info.json)
- Add file_uuid column to identities table + backfill
- Identity pre-clean step in deploy (avoids UNIQUE conflicts on re-deploy)
- Stranger_xxx naming fix with UUID context
- Add DETECTOR_REGISTRY.md (25 detectors), DETECTOR_SELECTION_SOP.md
- Update SPATIAL_COORDINATE_REGISTRY.md (P layer, 6-layer architecture)
- New IDENTITY_LIFECYCLE.md
- M4 response docs for deploy_script_fix and 111614 test report
# Identity Lifecycle: Pre-Transfer → Package → Post-Transfer

**Date**: 2026-05-13

**Ref**: `dev.identities` table, `file_uuid` column

---

## Three-Stage Architecture

```
Pre-transfer (Source DB)      Package (.tar.gz)                     Post-transfer (Target DB)
────────────────────────      ────────────────────────────────      ─────────────────────────
dev.identities                sql/dev_identities.sql                dev.identities
├── PERSON_UUID_cluster     → WHERE file_uuid = '{u}'             → INSERT/COPY
├── Stranger_UUID_counter   → (same as above)                     → (same as above)
├── tmdb (global)           → WHERE file_uuid IS NULL             → UPDATE (merge)
│                             AND source IN ('tmdb',..)
├── merged (global)         → (same as tmdb)                      → (same as tmdb)
├── user_defined (global)   → (same as tmdb)                      → (same as tmdb)
├── auto inactive           → ❌ not exported                     → (absent)
└── Stranger_original       → ❌ already renamed                  → (old name absent)
```

---

## Stage 1: Pre-Transfer (Source Database)

### Data Categories

| Category | Rows | file_uuid | Created by | Purpose |
|----------|:---:|:---------:|------------|---------|
| `PERSON_{UUID8}_{cluster}` | ~428 per file | set | identity_bind.py | Auto-clustered identities, named per file |
| `Stranger_{UUID8}_{counter}` | ~25 per file | set | experiment runner | Temporary identity for a single trace |
| `tmdb` | ~15 (global) | NULL | tmdb_identity_integration | Global TMDB actor identities |
| `auto` inactive | ~3051 (global) | NULL | identity_bind.py (superseded) | Old auto identities superseded by TMDB; not exported |
| `merged` | ~11 | NULL | match_identities_to_tmdb.py | Auto identities already merged with TMDB |
| `user_defined` | n/a | NULL | Created manually by a user | Retained |

### Conflict Prevention

```
Naming rules:
  PERSON_{file_uuid[:8]}_{cluster_id}
  Stranger_{file_uuid[:8]}_{counter}

→ Per-file identities from different files do not collide on name
  (as long as their 8-character file_uuid prefixes differ)
→ The UNIQUE (name) constraint is safe for per-file identities
```
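
The naming rules can be sketched in a few lines of Python, the language of identity_bind.py. The helpers below are hypothetical, not the project's actual functions; the 3-digit zero padding of the Stranger counter is inferred from the example name `Stranger_417a7e93_001` and is an assumption. `colliding_prefixes` is an illustrative check for the one case the rule does not cover: two different files whose UUIDs share the first 8 characters.

```python
def person_name(file_uuid: str, cluster_id: int) -> str:
    """PERSON_{file_uuid[:8]}_{cluster_id}"""
    return f"PERSON_{file_uuid[:8]}_{cluster_id}"


def stranger_name(file_uuid: str, counter: int) -> str:
    """Stranger_{file_uuid[:8]}_{counter}; 3-digit zero padding is assumed."""
    return f"Stranger_{file_uuid[:8]}_{counter:03d}"


def colliding_prefixes(file_uuids: list[str]) -> set[str]:
    """8-character prefixes shared by two or more distinct file UUIDs.

    Any prefix returned here would let identity names from different files collide.
    """
    owners: dict[str, set[str]] = {}
    for u in file_uuids:
        owners.setdefault(u[:8], set()).add(u)
    return {prefix for prefix, uuids in owners.items() if len(uuids) > 1}


if __name__ == "__main__":
    a = "11111111-2222-3333-4444-555555555555"  # hypothetical file UUIDs
    b = "11111111-9999-3333-4444-555555555555"
    c = "66666666-2222-3333-4444-555555555555"
    print(person_name(a, 11))             # PERSON_11111111_11
    print(stranger_name(a, 1))            # Stranger_11111111_001
    print(colliding_prefixes([a, b, c]))  # {'11111111'}
```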

---

## Stage 2: Inside the Package

### Export Query

```sql
COPY (
    SELECT * FROM dev.identities
    WHERE file_uuid = '{uuid}'                      -- identities of this file
       OR (file_uuid IS NULL AND source IN
           ('tmdb', 'merged', 'user_defined'))      -- global identities
) TO STDOUT WITH CSV HEADER
```
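
To see which rows the export query selects, this sketch runs the same `WHERE` clause against an in-memory SQLite database from Python's standard library. SQLite stands in for PostgreSQL purely for illustration (the clause is plain standard SQL), the table is trimmed to three columns, and the rows and file UUIDs are hypothetical. It also shows why global identities need their own `IS NULL` branch: `file_uuid = :uuid` is never true when `file_uuid` is NULL.

```python
import sqlite3

EXPORT_FILTER = """
    SELECT name FROM identities
    WHERE file_uuid = :uuid                                   -- this file's identities
       OR (file_uuid IS NULL
           AND source IN ('tmdb', 'merged', 'user_defined'))  -- global identities
    ORDER BY name
"""


def exported_names(conn: sqlite3.Connection, file_uuid: str) -> list[str]:
    """Return the identity names the export query would put in the package."""
    return [row[0] for row in conn.execute(EXPORT_FILTER, {"uuid": file_uuid})]


def demo_db() -> sqlite3.Connection:
    """Hypothetical source DB covering each category from Stage 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE identities (name TEXT UNIQUE, source TEXT, file_uuid TEXT)")
    conn.executemany(
        "INSERT INTO identities VALUES (?, ?, ?)",
        [
            ("PERSON_fileaaaa_11", "auto", "fileaaaa-uuid"),  # this file
            ("PERSON_filebbbb_3", "auto", "filebbbb-uuid"),   # another file
            ("Cary Grant", "tmdb", None),                     # global
            ("Paul Bonifas (merged)", "merged", None),        # global
            ("old_auto_1", "auto", None),                     # inactive auto, not exported
        ],
    )
    return conn


if __name__ == "__main__":
    print(exported_names(demo_db(), "fileaaaa-uuid"))
```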

### Example: Identities in a Package

| name | source | file_uuid | Belongs to |
|------|--------|-----------|------------|
| PERSON_aeed7134_11 | auto | aeed7134... | ✅ This file |
| PERSON_aeed7134_18 | auto | aeed7134... | ✅ This file |
| Cary Grant | tmdb | NULL | 🌐 Global |
| Audrey Hepburn | tmdb | NULL | 🌐 Global |
| Paul Bonifas (merged) | merged | NULL | 🌐 Global |
| Stranger_417a7e93_001 | auto_temp | 417a7e93... | ✅ This file |
| (PERSON_417a7e93_xxx) | auto | 417a7e93... | ❌ Not exported (different file) |

---

## Stage 3: Post-Transfer (Target Database)

### Import Flow

```
Receive package → bash deploy.sh
  │
  ├─ cd "$DIR" && psql -f data.sql
  │    ├─ \i sql/dev_videos.sql             (single row, INSERT)
  │    ├─ \i sql/dev_chunk.sql              (batch, COPY)
  │    ├─ \i sql/dev_face_detections.sql    (batch, COPY)
  │    ├─ \i sql/dev_identities.sql         ← identities are loaded here
  │    ├─ \i sql/dev_identity_bindings.sql
  │    ├─ \i sql/dev_tkg_nodes.sql
  │    └─ \i sql/dev_tkg_edges.sql
```
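
One detail worth checking in this flow: by default psql keeps going after an error in a `-f` script and still exits with status 0, so a failed identity COPY may not stop deploy.sh. Setting `ON_ERROR_STOP` (either `psql -v ON_ERROR_STOP=1 -f data.sql` or `\set ON_ERROR_STOP on` inside the script) makes psql stop at the first error and exit with status 3. The sketch below generates a `data.sql` in the order shown above with that setting at the top; `build_data_sql` is hypothetical, and whether the real `data.sql` is generated or hand-written is not stated in this document.

```python
from collections.abc import Sequence

# Load order taken from the import flow above.
TABLES = (
    "dev_videos",             # single row, INSERT
    "dev_chunk",              # batch, COPY
    "dev_face_detections",    # batch, COPY
    "dev_identities",         # identities are loaded here
    "dev_identity_bindings",
    "dev_tkg_nodes",
    "dev_tkg_edges",
)


def build_data_sql(tables: Sequence[str] = TABLES) -> str:
    """Return a psql driver script that includes each per-table file in order."""
    lines = [
        "-- Stop at the first failing statement; psql then exits with status 3",
        "\\set ON_ERROR_STOP on",
    ]
    lines += [f"\\i sql/{table}.sql" for table in tables]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_data_sql(), end="")
```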

### The Problem with COPY

`COPY` has no `ON CONFLICT` clause. If the target DB already contains an identity with the same name, the `UNIQUE(name)` constraint makes the whole COPY statement fail, so none of the package's identity rows are loaded.

| Scenario | Risk | Outcome |
|----------|:--:|---------|
| Target has never received this file (first deploy) | ✅ OK | COPY succeeds |
| Target already has this file (re-deploy) | ⚠️ `PERSON_xxx` already exists | COPY fails |
| Target has had other files deployed | ⚠️ TMDB identity (e.g., Cary Grant) already exists | COPY fails |
| Two packages contain the same TMDB actor | ⚠️ Same-name global identity | COPY fails |
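
The all-or-nothing behavior can be reproduced with Python's built-in `sqlite3` module. SQLite is only a stand-in for PostgreSQL here: it has no COPY, so the package rows are inserted inside one explicit transaction to mimic a single COPY statement, which PostgreSQL aborts as a whole on a unique violation. `make_target`, `load_all_or_nothing` and the sample rows are hypothetical.

```python
import sqlite3


def make_target() -> sqlite3.Connection:
    """Target DB that already holds a global TMDB identity from an earlier deploy."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
    conn.execute("CREATE TABLE identities (name TEXT UNIQUE, source TEXT, file_uuid TEXT)")
    conn.execute("INSERT INTO identities VALUES ('Cary Grant', 'tmdb', NULL)")
    return conn


def load_all_or_nothing(conn: sqlite3.Connection, rows) -> bool:
    """Insert all rows in one transaction; roll everything back on a UNIQUE violation."""
    conn.execute("BEGIN")
    try:
        conn.executemany("INSERT INTO identities VALUES (?, ?, ?)", rows)
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


if __name__ == "__main__":
    target = make_target()
    package = [
        ("PERSON_fileaaaa_11", "auto", "fileaaaa-uuid"),
        ("Cary Grant", "tmdb", None),  # same name as the existing global identity
    ]
    ok = load_all_or_nothing(target, package)
    count = target.execute("SELECT COUNT(*) FROM identities").fetchone()[0]
    print(ok, count)  # False 1: the PERSON_ row was rolled back as well
```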
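
### Solutions

The identity import in `deploy.sh` needs conflict handling instead of a bare COPY: either PostgreSQL's `INSERT ... ON CONFLICT`, or deleting the conflicting rows first.

**Plan A: DELETE this file's identities (by `file_uuid`) before COPY**

```sql
DELETE FROM dev.identities WHERE file_uuid = '{uuid}';
COPY dev.identities FROM STDIN WITH CSV HEADER;
```

This plan does not touch global identities: their `file_uuid` is NULL, and `file_uuid = '{uuid}'` is never true for NULL. It therefore fixes re-deploying this file's per-file identities, but COPY still fails whenever a global identity with the same name (e.g., a TMDB actor such as Cary Grant) already exists in the target.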
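
With the same SQLite stand-in, here is Plan A on a re-deploy: the DELETE removes this file's old rows, but the global `Cary Grant` row survives (comparing NULL with a value is never true), so a package that also carries that global identity still fails. The Stage 2 export query includes such rows whenever the source DB has global identities. Helpers, UUIDs and rows are hypothetical.

```python
import sqlite3


def make_target() -> sqlite3.Connection:
    """Target DB after a first deploy of file 'fileaaaa-uuid'."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE identities (name TEXT UNIQUE, source TEXT, file_uuid TEXT)")
    conn.executemany(
        "INSERT INTO identities VALUES (?, ?, ?)",
        [
            ("PERSON_fileaaaa_11", "auto", "fileaaaa-uuid"),  # per-file identity
            ("Cary Grant", "tmdb", None),                     # global identity
        ],
    )
    return conn


def plan_a(conn: sqlite3.Connection, file_uuid: str, rows) -> bool:
    """DELETE this file's identities, then bulk-load the package rows in one transaction."""
    conn.execute("BEGIN")
    try:
        # NULL = 'x' evaluates to NULL, so global rows are never deleted here
        conn.execute("DELETE FROM identities WHERE file_uuid = ?", (file_uuid,))
        conn.executemany("INSERT INTO identities VALUES (?, ?, ?)", rows)
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


if __name__ == "__main__":
    per_file_only = [("PERSON_fileaaaa_11", "auto", "fileaaaa-uuid")]
    with_global = per_file_only + [("Cary Grant", "tmdb", None)]
    print(plan_a(make_target(), "fileaaaa-uuid", per_file_only))  # True
    print(plan_a(make_target(), "fileaaaa-uuid", with_global))    # False
```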

**Plan B: `\copy` + temporary table**

```sql
CREATE TEMP TABLE tmp_identities (LIKE dev.identities);
-- Assumes the package ships the identity rows as CSV (identities.csv);
-- the current package ships sql/dev_identities.sql instead.
\copy tmp_identities FROM 'identities.csv' WITH CSV HEADER

INSERT INTO dev.identities AS t
SELECT * FROM tmp_identities
ON CONFLICT (name) DO UPDATE
SET file_uuid      = COALESCE(t.file_uuid, EXCLUDED.file_uuid),  -- keep an existing file_uuid
    source         = EXCLUDED.source,                            -- incoming source wins
    face_embedding = COALESCE(EXCLUDED.face_embedding, t.face_embedding),
    tmdb_id        = COALESCE(EXCLUDED.tmdb_id, t.tmdb_id);
```
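
The merge rules of Plan B can be exercised locally with SQLite (3.24 or later), whose upsert syntax (`ON CONFLICT (...) DO UPDATE` with the `excluded.` qualifier) closely follows PostgreSQL's. This is a sketch under assumptions: the table is trimmed to four columns (no `face_embedding`), `upsert_identities` and the sample values (including the `tmdb_id`) are hypothetical, and SQLite needs a `WHERE` clause on `INSERT ... SELECT` when an upsert clause follows, hence `WHERE 1`. Calling it twice with the same package shows the re-deploy raises no UNIQUE error and creates no duplicates.

```python
import sqlite3

SCHEMA = "CREATE TABLE identities (name TEXT UNIQUE, source TEXT, file_uuid TEXT, tmdb_id INTEGER)"

# Unqualified columns in DO UPDATE refer to the existing row; excluded.* is the incoming row.
MERGE = """
    INSERT INTO identities (name, source, file_uuid, tmdb_id)
    SELECT name, source, file_uuid, tmdb_id FROM tmp_identities WHERE 1
    ON CONFLICT (name) DO UPDATE SET
        file_uuid = COALESCE(file_uuid, excluded.file_uuid),  -- keep existing file_uuid
        source    = excluded.source,                          -- incoming source wins
        tmdb_id   = COALESCE(excluded.tmdb_id, tmdb_id)       -- prefer incoming tmdb_id
"""


def upsert_identities(conn: sqlite3.Connection, rows) -> None:
    """Stage package rows in a temp table, then merge them into identities by name."""
    conn.execute("DROP TABLE IF EXISTS tmp_identities")
    # Same columns as identities, no constraints (roughly like LIKE in PostgreSQL)
    conn.execute("CREATE TEMP TABLE tmp_identities AS SELECT * FROM identities WHERE 0")
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO tmp_identities VALUES (?, ?, ?, ?)", rows)
        conn.execute(MERGE)
        conn.execute("DROP TABLE tmp_identities")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO identities VALUES ('Cary Grant', 'tmdb', NULL, NULL)")
    conn.commit()
    package = [
        ("Cary Grant", "tmdb", None, 101),  # hypothetical tmdb_id
        ("PERSON_fileaaaa_11", "auto", "fileaaaa-uuid", None),
    ]
    upsert_identities(conn, package)
    upsert_identities(conn, package)  # re-deploy of the same package
    print(conn.execute("SELECT name, tmdb_id FROM identities ORDER BY name").fetchall())
```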

**Plan C: pre-clean and reload in one transaction inside deploy.sh**

```bash
# identity_bindings and other small tables: plain COPY, as before
#   (assumes they do not hit unique conflicts; COPY itself has no ON CONFLICT)
# identities: delete this file's rows, then reload them, in one transaction
for f in "$DIR"/sql/dev_identities.sql; do
  # ON_ERROR_STOP=1 makes psql stop at the first error and exit non-zero,
  # so a failed load is not silently reported as success
  "$PG_BIN/psql" -v ON_ERROR_STOP=1 -U "$DB_USER" -d "$DB_NAME" <<EOSQL
BEGIN;
-- Pre-clean: only rows of this file; global rows (file_uuid IS NULL) stay
DELETE FROM dev.identities WHERE file_uuid = '$UUID';
\i '$f'
COMMIT;
EOSQL
done
```
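
Plan C splices `$UUID` straight into the SQL text. If that value is ever empty or malformed, the DELETE silently matches nothing (and the reload then collides with the old rows) or the statement breaks. A Python sketch of building the same psql input with a guard; `preclean_script` and the canonical 8-4-4-4-12 UUID format it enforces are assumptions, not taken from deploy.sh, and the same check could equally be written in bash before the heredoc.

```python
import re

# Assumed format: canonical 8-4-4-4-12 hex UUID, as stored in dev.identities.file_uuid.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def preclean_script(file_uuid: str, sql_path: str) -> str:
    """Return Plan C's psql input, refusing values that could break the SQL."""
    if not UUID_RE.fullmatch(file_uuid):
        raise ValueError(f"not a canonical UUID: {file_uuid!r}")
    if "'" in sql_path:
        raise ValueError(f"single quote in path: {sql_path!r}")
    return "\n".join([
        "BEGIN;",
        "-- Pre-clean: only rows of this file; global rows (file_uuid IS NULL) stay",
        f"DELETE FROM dev.identities WHERE file_uuid = '{file_uuid}';",
        f"\\i '{sql_path}'",
        "COMMIT;",
    ]) + "\n"


if __name__ == "__main__":
    uuid_ = "11111111-2222-3333-4444-555555555555"  # hypothetical
    print(preclean_script(uuid_, "sql/dev_identities.sql"), end="")
```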
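
---

## Conclusion

| Aspect | Current state | Risk | Recommendation |
|--------|---------------|:--:|----------------|
| Name collisions | `PERSON_UUID_cluster` prevents collisions | ✅ Safe | None needed |
| Repeated TMDB import | COPY has no ON CONFLICT | ❌ Fails | Plan B (upsert by name); Plan C's DELETE WHERE file_uuid='{UUID}' leaves global rows (file_uuid IS NULL) in place |
| Cross-file merging | Global identities are identified by `file_uuid IS NULL` | ⚠️ Needs confirmation | TMDB identities currently have no file_uuid |
| Overwriting an existing environment | Re-deploy hits name collisions | ❌ Fails | Plan C: DELETE WHERE file_uuid='{UUID}', then COPY (global rows in the package still need Plan B) |
| Stranger naming fix | `{UUID8}` prefix added | ✅ Safe | None needed |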