feat: add cache toggle, unregister API and chunk training docs
- Add POST /api/v1/config/cache for cache toggle
- Add POST /api/v1/unregister for video deletion
- Add CHUNK_DATA_STRUCTURE.md for marcom training
- Fix processor_results query in delete_video
281
docs/CHUNK_DATA_STRUCTURE.md
Normal file
@@ -0,0 +1,281 @@
# Momentry Chunk Data Structure Guide

> **Audience**: marcom team
> **Version**: V1.0 | **Date**: 2026-03-25

---

## 1. What Is a Chunk?

A chunk is the smallest unit of a processed video. After a video is uploaded, the system automatically performs these steps:

1. **Analyze** - detect scenes, faces, and poses
2. **Transcribe** - convert speech to text (ASR)
3. **Segment** - split the content into searchable pieces
4. **Vectorize** - generate searchable feature vectors

Each chunk is an **independently searchable unit of content**.

---

## 2. Chunk Data Structure

### 2.1 Main Fields

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `uuid` | String (32) | Unique video identifier | `952f5854b9febad1` |
| `chunk_id` | String (64) | Unique chunk identifier | `asr_00001` |
| `chunk_index` | Integer | Chunk sequence number | `1` |
| `chunk_type` | String (32) | Chunk type | `sentence` |
| `start_time` | Float | Start time (seconds) | `12.5` |
| `end_time` | Float | End time (seconds) | `18.3` |
| `content` | JSONB | Detailed content | See below |
| `vector_id` | String (64) | Vector ID | `vec_12345` |
| `text_content` | Text | Plain-text content | `這是一段話` ("This is a sentence") |
| `fps` | Float | Video frame rate | `24.0` |
| `start_frame` | Integer | Start frame number | `300` |
| `end_frame` | Integer | End frame number | `439` |
| `frame_count` | Integer | Total number of frames | `139` |

### 2.2 Chunk Types

| Type | ID | Description | Source processor |
|------|-----|-------------|------------------|
| `sentence` | `sentence` | Speech-to-text segment | ASR processing |
| `time` | `time_based` | Fixed-duration segment | Automatic system split |
| `cut` | `cut` | Scene-change segment | CUT processing |
| `trace` | `trace` | Object-tracking segment | YOLO tracking |
| `story` | `story` | Storyline segment (parent/child chunks) | Story analysis |

**Parent/child relationships**:
- A `story` chunk is the **parent** and can contain multiple `sentence`, `cut`, and `trace` child chunks
- The hierarchy is built through `parent_chunk_id` and `child_chunk_ids`
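
The parent/child relationship described above can be sketched as follows. This is a minimal illustration only: `ChunkNode` and `children_of` are hypothetical names that do not appear in the codebase, and the struct keeps just the linking fields rather than the full chunk schema.

```rust
/// Hypothetical, trimmed-down chunk that holds only the hierarchy fields.
#[derive(Debug, Clone, PartialEq)]
struct ChunkNode {
    chunk_id: String,
    chunk_type: String,
    /// Set on child chunks; `None` for a top-level `story` chunk.
    parent_chunk_id: Option<String>,
    /// Set on `story` chunks; empty for leaf chunks.
    child_chunk_ids: Vec<String>,
}

/// Returns the children of `story` in the order listed in `child_chunk_ids`,
/// skipping any ID that is not present in `all`.
fn children_of<'a>(story: &ChunkNode, all: &'a [ChunkNode]) -> Vec<&'a ChunkNode> {
    story
        .child_chunk_ids
        .iter()
        .filter_map(|id| all.iter().find(|c| &c.chunk_id == id))
        .collect()
}
```

Resolving children by ID keeps the `story` chunk small: it stores references only, and each child remains independently searchable.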

---

## 3. Content JSON Structure

The `content` field of each chunk holds the detailed processing results:

### 3.1 ASR Chunk (Speech-to-Text)

```json
{
  "text": "今天天氣非常好,我們去郊外踏青吧",
  "words": [
    {
      "word": "今天",
      "start": 12.5,
      "end": 12.8,
      "confidence": 0.95
    },
    {
      "word": "天氣",
      "start": 12.8,
      "end": 13.1,
      "confidence": 0.92
    }
  ],
  "language": "zh-TW",
  "speaker": null
}
```
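
The word-level timestamps in the ASR content make it possible to find which word is being spoken at a given playback position. The sketch below assumes a hypothetical `Word` struct mirroring one entry of the `words` array and treats each word as a half-open interval, so that a timestamp shared by two adjacent words resolves to the later one; neither name comes from the codebase.

```rust
/// Hypothetical mirror of one entry in the `words` array of an ASR chunk.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    word: String,
    start: f64,
    end: f64,
    confidence: f64,
}

/// Returns the word spoken at `t` seconds, treating each word as the
/// half-open interval [start, end). Returns `None` if no word covers `t`.
fn word_at(words: &[Word], t: f64) -> Option<&Word> {
    words.iter().find(|w| w.start <= t && t < w.end)
}
```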

### 3.2 Cut Chunk (Scene Detection)

```json
{
  "scenes": [
    {
      "scene_id": "cut_001",
      "start_time": 12.5,
      "end_time": 45.2,
      "transition": "cut",
      "confidence": 0.98
    }
  ]
}
```

### 3.3 Trace Chunk (Object Tracking)

```json
{
  "track_id": "track_001",
  "object_class": "person",
  "frames": [
    {
      "frame": 300,
      "bbox": [120, 80, 200, 300],
      "confidence": 0.95
    },
    {
      "frame": 301,
      "bbox": [122, 82, 202, 302],
      "confidence": 0.94
    }
  ],
  "total_frames": 180
}
```

### 3.4 Story Chunk (Storyline)

```json
{
  "story_id": "story_001",
  "title": "開場介紹",
  "summary": "主持人介紹節目主題",
  "child_chunk_ids": ["sentence_00001", "sentence_00002", "cut_00001"],
  "tags": ["intro", "host"]
}
```

### 3.5 Metadata (Other Detection Results)

Detection results such as faces, text recognition (OCR), and poses are attached in the `metadata` field:

```json
{
  "metadata": {
    "faces": [
      {
        "bbox": [120, 80, 200, 180],
        "confidence": 0.87,
        "emotion": "neutral"
      }
    ],
    "ocr": {
      "text": "MOMENTRY",
      "confidence": 0.96
    },
    "pose": {
      "keypoints": [
        {"name": "nose", "x": 192, "y": 85, "confidence": 0.95}
      ]
    }
  }
}
```

---

## 4. Time Formats

### 4.1 Seconds Format (Most Common)

```
Format:  seconds.frames
Example: 1234.60 = second 1234 + frame 60
```

### 4.2 Timecode Format

```
Format:  HH:MM:SS.FF
Example: 00:20:34.12 = 20 min 34 s, frame 12
```
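
As an illustration of the `HH:MM:SS.FF` layout, the sketch below converts an absolute frame index into a timecode string. It assumes an integer frame rate and that `FF` counts frames within the current second; `frame_to_timecode` is a hypothetical helper, not a function from the codebase.

```rust
/// Formats an absolute frame index as `HH:MM:SS.FF`.
/// Assumes an integer frame rate; returns `None` when `fps` is zero.
fn frame_to_timecode(frame: u64, fps: u64) -> Option<String> {
    if fps == 0 {
        return None;
    }
    let total_seconds = frame / fps;
    let ff = frame % fps; // frames elapsed within the current second
    let hh = total_seconds / 3600;
    let mm = (total_seconds % 3600) / 60;
    let ss = total_seconds % 60;
    Some(format!("{:02}:{:02}:{:02}.{:02}", hh, mm, ss, ff))
}
```

At 24 fps, frame 29628 is 1234 whole seconds plus 12 frames, which yields `00:20:34.12`, the same value as the example above.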

### 4.3 Frame Calculation

```
frame = seconds × fps
Example: 12.5 s × 24 fps = frame 300
```
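
The frame formula can be written as a small helper. The sketch below assumes that partial frames are truncated, which is inferred from the sample data in this document (18.3 s × 24 fps = 439.2, stored as frame `439`); the function names are hypothetical.

```rust
/// Converts a timestamp in seconds to a frame index, dropping any partial frame.
/// Truncation is an assumption inferred from the sample data, not a confirmed rule.
fn seconds_to_frame(seconds: f64, fps: f64) -> u64 {
    (seconds * fps).floor() as u64
}

/// Number of frames between the start and end timestamps of a chunk.
fn frame_count(start_time: f64, end_time: f64, fps: f64) -> u64 {
    seconds_to_frame(end_time, fps).saturating_sub(seconds_to_frame(start_time, fps))
}
```

With the field examples from section 2.1 (`start_time` 12.5, `end_time` 18.3, `fps` 24.0) this gives `start_frame` 300, `end_frame` 439, and `frame_count` 139.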

---

## 5. Real Data Examples

Suppose a video produces the following processing results:

### 5.1 Speech Segment

```json
{
  "uuid": "952f5854b9febad1",
  "chunk_id": "asr_00001",
  "chunk_type": "sentence",
  "start_time": 12.5,
  "end_time": 18.3,
  "content": {
    "text": "今天天氣非常好,我們去郊外踏青吧",
    "language": "zh-TW"
  },
  "text_content": "今天天氣非常好,我們去郊外踏青吧",
  "start_frame": 300,
  "end_frame": 439,
  "fps": 24.0
}
```

### 5.2 Scene Segment

```json
{
  "uuid": "952f5854b9febad1",
  "chunk_id": "cut_00001",
  "chunk_type": "cut",
  "start_time": 45.0,
  "end_time": 120.5,
  "content": {
    "scenes": [{
      "scene_id": "cut_001",
      "transition": "cut",
      "confidence": 0.98
    }]
  },
  "start_frame": 1080,
  "end_frame": 2892,
  "fps": 24.0
}
```

---

## 6. How to Use Chunks

### 6.1 Searching for Relevant Segments

When a user searches for「天氣」(weather), the system:

1. Converts「天氣」into a vector
2. Searches the vector database for similar vectors
3. Finds the matching chunks
4. Returns their time ranges and content
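
Steps 2 to 4 can be sketched with a brute-force cosine-similarity ranking. This is an illustration under stated assumptions, not the system's actual search path: the query is assumed to be already embedded (step 1), `IndexedChunk` is a hypothetical in-memory stand-in for the vector database, and a production vector store would normally use an approximate index instead of a linear scan.

```rust
/// Cosine similarity of two vectors; returns 0.0 for mismatched lengths
/// or when either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical index entry: chunk ID, time range, and embedding vector.
struct IndexedChunk {
    chunk_id: String,
    start_time: f64,
    end_time: f64,
    vector: Vec<f32>,
}

/// Ranks every chunk against the query vector and returns the top `k`
/// as (chunk_id, start_time, end_time, score), best match first.
fn search<'a>(
    query: &[f32],
    index: &'a [IndexedChunk],
    k: usize,
) -> Vec<(&'a str, f64, f64, f32)> {
    let mut scored: Vec<(&'a str, f64, f64, f32)> = index
        .iter()
        .map(|c| {
            let score = cosine_similarity(query, &c.vector);
            (c.chunk_id.as_str(), c.start_time, c.end_time, score)
        })
        .collect();
    scored.sort_by(|a, b| b.3.total_cmp(&a.3)); // descending by score
    scored.truncate(k);
    scored
}
```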

### 6.2 Playing a Specific Segment

Once you have a chunk, you can play it:

```
Start time: 12.5 s
End time:   18.3 s
```
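
One possible way to hand a chunk's time range to a web player is a W3C Media Fragments temporal range (`#t=start,end`, in seconds) appended to the video URL. The sketch below assumes the player honors media fragments; `playback_url` is a hypothetical helper, and the URL in the usage note is a placeholder rather than a real Momentry endpoint.

```rust
/// Appends a Media Fragments temporal range (`#t=start,end`, in seconds)
/// to a video URL. Assumes `video_url` has no fragment of its own.
fn playback_url(video_url: &str, start_time: f64, end_time: f64) -> String {
    format!("{}#t={},{}", video_url, start_time, end_time)
}
```

For example, `playback_url("https://example.com/video.mp4", 12.5, 18.3)` produces `https://example.com/video.mp4#t=12.5,18.3`.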

### 6.3 Combining Multiple Chunks

Multiple related chunks can be combined into a chapter or storyline.
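
A simple way to combine chunks into a chapter is to merge their time ranges. The sketch below is one possible approach, assuming each chunk contributes a `(start_time, end_time)` pair in seconds and that ranges which overlap or touch belong to the same chapter; `merge_ranges` is a hypothetical helper.

```rust
/// Merges (start, end) ranges in seconds that overlap or touch,
/// returning sorted, non-overlapping ranges.
fn merge_ranges(mut ranges: Vec<(f64, f64)>) -> Vec<(f64, f64)> {
    ranges.sort_by(|a, b| a.0.total_cmp(&b.0));
    let mut merged: Vec<(f64, f64)> = Vec::new();
    for (start, end) in ranges {
        if let Some(last) = merged.last_mut() {
            if start <= last.1 {
                // Extend the current range instead of starting a new one.
                last.1 = last.1.max(end);
                continue;
            }
        }
        merged.push((start, end));
    }
    merged
}
```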

---

## 7. Quick Reference

| Item | Description |
|------|-------------|
| UUID | Unique video identifier (16 hex digits) |
| Chunk ID | Chunk identifier (e.g. `sentence_00001`) |
| chunk_type | Chunk type (sentence/time/cut/trace/story) |
| start_time | Start time (seconds) |
| end_time | End time (seconds) |
| text_content | Plain-text content |
| content | Detailed JSON structure |
| metadata | Face, OCR, pose, and other detection results |
| parent_chunk_id | Parent chunk ID (used with story chunks) |
| child_chunk_ids | List of child chunk IDs (story chunks only) |

---

**Document version**: V1.0
**Last updated**: 2026-03-25
@@ -148,6 +148,30 @@ struct SearchRequest {
    uuid: Option<String>,
}

#[derive(Debug, Deserialize)]
struct CacheToggleRequest {
    enabled: bool,
}

#[derive(Debug, Serialize)]
struct CacheToggleResponse {
    success: bool,
    cache_enabled: bool,
    message: String,
}

#[derive(Debug, Deserialize)]
struct UnregisterRequest {
    uuid: String,
}

#[derive(Debug, Serialize)]
struct UnregisterResponse {
    success: bool,
    uuid: String,
    message: String,
}

#[derive(Debug, Serialize, Deserialize)]
struct SearchResult {
    uuid: String,
@@ -1300,6 +1324,56 @@ async fn get_job(
    Ok(Json(response))
}

async fn cache_toggle(
    State(_state): State<AppState>,
    Json(req): Json<CacheToggleRequest>,
) -> Result<Json<CacheToggleResponse>, StatusCode> {
    tracing::info!("[cache_toggle] Setting cache enabled to: {}", req.enabled);

    crate::core::config::set_cache_enabled(req.enabled);

    let response = CacheToggleResponse {
        success: true,
        cache_enabled: req.enabled,
        message: if req.enabled {
            "Cache enabled".to_string()
        } else {
            "Cache disabled".to_string()
        },
    };

    tracing::info!("[cache_toggle] SUCCESS");
    Ok(Json(response))
}

async fn unregister(
    State(state): State<AppState>,
    Json(req): Json<UnregisterRequest>,
) -> Result<Json<UnregisterResponse>, StatusCode> {
    tracing::info!("[unregister] Unregistering video: {}", req.uuid);

    let db = &state.api_state.db;

    match db.delete_video(&req.uuid).await {
        Ok(_) => {
            tracing::info!("[unregister] SUCCESS - deleted: {}", req.uuid);
            Ok(Json(UnregisterResponse {
                success: true,
                uuid: req.uuid,
                message: "Video unregistered successfully".to_string(),
            }))
        }
        Err(e) => {
            tracing::error!("[unregister] ERROR - {}", e);
            Ok(Json(UnregisterResponse {
                success: false,
                uuid: req.uuid,
                message: format!("Failed to unregister: {}", e),
            }))
        }
    }
}

pub async fn start_server(host: &str, port: u16) -> anyhow::Result<()> {
    let _ = SERVER_START.set(Instant::now());

@@ -1319,6 +1393,7 @@ pub async fn start_server(host: &str, port: u16) -> anyhow::Result<()> {

    let protected_routes = Router::new()
        .route("/api/v1/register", post(register))
        .route("/api/v1/unregister", post(unregister))
        .route("/api/v1/probe", post(probe))
        .route("/api/v1/search", post(search))
        .route("/api/v1/n8n/search", post(n8n_search))
@@ -1328,6 +1403,7 @@ pub async fn start_server(host: &str, port: u16) -> anyhow::Result<()> {
        .route("/api/v1/progress/:uuid", get(get_progress))
        .route("/api/v1/jobs", get(list_jobs))
        .route("/api/v1/jobs/:uuid", get(get_job))
        .route("/api/v1/config/cache", post(cache_toggle))
        .layer(axum::middleware::from_fn_with_state(
            state.api_state.clone(),
            api_key_validation,

@@ -1,5 +1,23 @@
use once_cell::sync::Lazy;
use std::env;
use std::sync::RwLock;

pub static RUNTIME_CACHE_ENABLED: Lazy<RwLock<bool>> = Lazy::new(|| {
    let initial = env::var("MONGODB_CACHE_ENABLED")
        .unwrap_or_else(|_| "true".to_string())
        .parse()
        .unwrap_or(true);
    RwLock::new(initial)
});

pub fn get_cache_enabled() -> bool {
    *RUNTIME_CACHE_ENABLED.read().unwrap()
}

pub fn set_cache_enabled(enabled: bool) {
    *RUNTIME_CACHE_ENABLED.write().unwrap() = enabled;
    tracing::info!("Cache enabled set to: {}", enabled);
}

pub static DATABASE_URL: Lazy<String> = Lazy::new(|| {
    env::var("DATABASE_URL")

@@ -584,6 +584,40 @@ impl PostgresDb {
        Ok(())
    }

    pub async fn delete_video(&self, uuid: &str) -> Result<()> {
        tracing::info!("[PostgresDb] Deleting video: {}", uuid);

        let mut tx = self.pool.begin().await?; // all deletes below run on this transaction, not on the pool

        sqlx::query("DELETE FROM chunk_vectors WHERE uuid = $1")
            .bind(uuid)
            .execute(&mut *tx)
            .await?;

        sqlx::query("DELETE FROM chunks WHERE uuid = $1")
            .bind(uuid)
            .execute(&mut *tx)
            .await?;

        sqlx::query("DELETE FROM processor_results WHERE video_id IN (SELECT id FROM videos WHERE uuid = $1)")
            .bind(uuid)
            .execute(&mut *tx)
            .await?;

        sqlx::query("DELETE FROM videos WHERE uuid = $1")
            .bind(uuid)
            .execute(&mut *tx)
            .await?;

        tx.commit().await?;

        let mut cache = self.cache.write().await;
        cache.videos.remove(uuid);

        tracing::info!("[PostgresDb] Video deleted: {}", uuid);
        Ok(())
    }

    pub async fn get_storage_status(&self, uuid: &str) -> Result<Option<StorageStatus>> {
        if let Some(video) = self.get_video_by_uuid(uuid).await? {
            Ok(Some(video.storage))
