Distributed storage research: Ceph (shelved) + MinIO guide + DedupS3 design
328
docs/CEPH_INTEGRATION_ANALYSIS.md
Normal file
@@ -0,0 +1,328 @@
|
||||
# Ceph RADOS Integration Analysis for MarkBase
|
||||
|
||||
**Date**: 2026-06-25
|
||||
**Status**: Shelved (does not fit MarkBase's macOS/cross-platform positioning)
|
||||
**Library**: ceph-async (4.0.5)
|
||||
**Constraint**: Linux-only (requires librados.so symlink)
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
### Goal
|
||||
Add Ceph RADOS as a VfsBackend option for distributed, highly scalable storage.
|
||||
|
||||
### Key Findings
|
||||
| Aspect | Finding |
|--------|---------|
| **Platform** | ❌ Linux-only (librados.so FFI; macOS needs Docker or a VM) |
| **Deployment** | ⚠️ Requires a full cluster (Monitor + OSD + MGR) |
| **Complexity** | ⚠️⚠️⚠️⚠️⚠️ High (beyond MarkBase's lightweight positioning) |
| **Positioning** | ❌ Does not fit MarkBase's macOS/cross-platform positioning |
|
||||
|
||||
### Recommendation
|
||||
**Shelved for now.** Prefer instead:
1. **MinIO**: S3-compatible, already supported by S3Vfs, cross-platform
2. **Built-in distribution**: a lightweight DedupFs + S3Vfs combination
|
||||
|
||||
---
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ MarkBase Application Layer │
|
||||
│ ├── SMB Server (Port 4445) │
|
||||
│ ├── SFTP Server (Port 2024) │
|
||||
│ ├── WebDAV Server (Port 11438) │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘
|
||||
│ ↓ │
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ VFS Abstraction Layer (VfsBackend trait) │
|
||||
│ ├── LocalFs — POSIX local filesystem │
|
||||
│ ├── S3Vfs — S3-compatible storage (HTTP API) │
|
||||
│ ├── SmbVfs — SMB client backend │
|
||||
│ ├── CephVfs — Ceph RADOS backend (shelved) │
|
||||
│ ├── EncryptedFs — Encryption layer │
|
||||
│ ├── Compression — ZSTD/LZ4 compression layer │
|
||||
│ ├── DedupFs — Block deduplication layer │
|
||||
│ ├── RaidFs — RAID-Z emulation layer │
|
||||
│ └─────────────────────────────────────────────────────────────────────┘
|
||||
│ ↓ │
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ Ceph Storage Cluster (RADOS) │
|
||||
│ ├── Monitor (MON) — Cluster map, authentication │
|
||||
│ ├── OSD Daemons — Object storage (data replication) │
|
||||
│ ├── Manager (MGR) — Dashboard, telemetry │
|
||||
│ ├── MDS (optional) — CephFS metadata server │
|
||||
│ ├── RGW (optional) — S3/Swift gateway │
|
||||
│ └─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Library Analysis
|
||||
|
||||
### Rust Ceph Crates
|
||||
|
||||
| Crate | Version | Description | Platform |
|
||||
|-------|---------|-------------|----------|
|
||||
| `ceph` | 3.2.5 | Official librados FFI (sync) | Linux-only |
|
||||
| `ceph-async` | 4.0.5 | Async librados FFI (futures 0.3) | Linux-only |
|
||||
| `ceph-rbd` | 0.3.2 | RADOS Block Device bindings | Linux-only |
|
||||
|
||||
### ceph-async Module Structure
|
||||
|
||||
```
|
||||
ceph_async::
|
||||
├── CephClient — Admin operations (OSD/Pool/Mon commands)
|
||||
├── rados:: — Low-level FFI bindings (100+ functions)
|
||||
│ ├── rados_read/write/stat/remove — Object I/O
|
||||
│ ├── rados_pool_create/delete/lookup — Pool management
|
||||
│ ├── rados_ioctx_* — I/O context (pool handle)
|
||||
│ ├── rados_snap_* — Snapshot management
|
||||
│ ├── rados_lock_* — Distributed locking
|
||||
│ ├── rados_aio_* — Async I/O
|
||||
│ ├── rados_omap_* — Key-value store per object
|
||||
│ └── rados_write_op_* / rados_read_op_* — Compound operations
|
||||
├── completion:: — Async completion handling
|
||||
├── read_stream:: — Async read stream
|
||||
├── write_sink:: — Async write sink
|
||||
└── list_stream:: — Async object listing
|
||||
```
|
||||
|
||||
### CephClient API
|
||||
|
||||
```rust
|
||||
let client = CephClient::new("admin", "/etc/ceph/ceph.conf")?;
|
||||
|
||||
// OSD operations
|
||||
client.osd_tree()?; // Get OSD tree (CRUSH map)
|
||||
client.osd_out(osd_id)?; // Mark OSD out
|
||||
client.osd_crush_remove(osd_id)?; // Remove from CRUSH map
|
||||
|
||||
// Pool operations
|
||||
client.osd_pool_get(pool, option)?; // Get pool config
|
||||
client.osd_pool_set(pool, key, val)?; // Set pool config
|
||||
client.osd_pool_quota_get(pool)?; // Get pool quota
|
||||
|
||||
// Cluster status
|
||||
client.status()?; // Cluster health
|
||||
client.mon_dump()?; // Monitor list
|
||||
client.version()?; // Ceph version
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
| Phase | Task | Code Lines | Priority | Risk | Dependencies |
|
||||
|-------|------|------------|----------|------|--------------|
|
||||
| **Phase 1** | CephVfs struct + basic I/O | ~400 | P0 | Medium ⚠️⚠️⚠️ | ceph-async crate |
|
||||
| **Phase 2** | Pool management CLI | ~150 | P1 | Low ⚠️ | Phase 1 |
|
||||
| **Phase 3** | Snapshot support | ~200 | P2 | Medium ⚠️⚠️⚠️ | librados snap API |
|
||||
| **Phase 4** | Distributed locking | ~100 | P2 | Medium ⚠️⚠️⚠️ | librados lock API |
|
||||
| **Phase 5** | OMAP key-value | ~150 | P3 | Low ⚠️ | librados omap API |
|
||||
| **Phase 6** | Async integration | ~300 | P1 | High ⚠️⚠️⚠️⚠️ | async-vfs feature |
|
||||
| **Phase 7** | Docker test environment | ~50 | P0 | Low ⚠️ | Docker compose |
|
||||
| **Phase 8** | Performance benchmark | ~100 | P2 | Low ⚠️ | Benchmark scripts |
|
||||
| **Total** | | **~1350** | | | |
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: CephVfs Core Implementation
|
||||
|
||||
### Key Design Decisions
|
||||
|
||||
**1. Object vs File mapping**:
|
||||
- RADOS is object storage (no directories)
|
||||
- Path `/foo/bar.txt` → Object `foo/bar.txt` in pool
|
||||
- Directories simulated via zero-byte objects with `/` suffix (like S3)
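The path-to-object mapping above can be sketched as a small pure function. This is only an illustration: `object_key` and its parameters are hypothetical names, not part of the CephVfs draft, and the normalization rules (dropping empty segments, adding a trailing `/` for directory markers) are assumptions.

```rust
/// Map a VFS path to a RADOS object name (hypothetical helper, not from the draft).
/// Directories get a trailing `/` so they can be stored as zero-byte marker objects.
fn object_key(root_prefix: &str, path: &str, is_dir: bool) -> String {
    // Drop leading/trailing slashes and empty segments: "/foo//bar.txt" -> "foo/bar.txt"
    let joined = path
        .split('/')
        .filter(|segment| !segment.is_empty())
        .collect::<Vec<_>>()
        .join("/");
    let mut key = format!("{}{}", root_prefix, joined);
    if is_dir && !key.is_empty() && !key.ends_with('/') {
        key.push('/');
    }
    key
}

fn main() {
    println!("{}", object_key("", "/foo/bar.txt", false));
    println!("{}", object_key("share1/", "/foo", true));
}
```

Keeping the mapping in one place means `stat()`, `create_dir()` and `read_dir()` would all agree on how a directory marker is named.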
|
||||
|
||||
**2. Pool-per-share vs single pool**:
|
||||
- Option A: Single pool + path prefix (simpler, less isolation)
|
||||
- Option B: Pool-per-share (better isolation, quota per pool)
|
||||
- **Recommend**: Option B (pool-per-share) for enterprise use
|
||||
|
||||
**3. I/O context caching**:
|
||||
- Each pool requires separate `rados_ioctx_t`
|
||||
- Cache ioctx per share to avoid recreation overhead
|
||||
|
||||
### CephVfs Struct (Draft)
|
||||
|
||||
```rust
|
||||
pub struct CephVfs {
|
||||
cluster: rados_t, // RADOS cluster handle
|
||||
pool_name: String, // Pool name for this share
|
||||
ioctx: rados_ioctx_t, // I/O context (cached)
|
||||
root_prefix: String, // Path prefix within pool
|
||||
}
|
||||
|
||||
pub struct CephVfsFile {
|
||||
ioctx: rados_ioctx_t,
|
||||
object_id: String, // Object name in pool
|
||||
position: u64,
|
||||
write_buffer: Vec<u8>, // Buffer for writes (flush on close)
|
||||
size: u64,
|
||||
}
|
||||
```
|
||||
|
||||
### VfsBackend Method Mapping
|
||||
|
||||
| Method | RADOS equivalent | Complexity |
|
||||
|--------|-----------------|------------|
|
||||
| `read_dir()` | `rados_nobjects_list_*` | High (pagination) |
|
||||
| `open_file()` | Custom (object ops) | Medium |
|
||||
| `stat()` | `rados_stat()` | Low |
|
||||
| `create_dir()` | `rados_write_full(0-byte)` | Low |
|
||||
| `remove_dir()` | `rados_remove()` | Low |
|
||||
| `remove_file()` | `rados_remove()` | Low |
|
||||
| `rename()` | Custom (copy + delete) | Medium |
|
||||
| `exists()` | `rados_stat()` | Low |
|
||||
| `copy()` | `rados_clone_range()` | Low |
|
||||
| `hard_link()` | `rados_clone_range()` | Low |
|
||||
| `read_link()` | Unsupported | N/A |
|
||||
| `create_symlink()` | Unsupported | N/A |
|
||||
|
||||
---
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
| Risk | Level | Mitigation |
|
||||
|------|-------|------------|
|
||||
| **Linux-only** | ⚠️⚠️⚠️⚠️⚠️ Critical | Docker/VM for macOS; does not fit the cross-platform positioning |
|
||||
| **librados.so symlink** | ⚠️⚠️⚠️ Medium | Document setup; CI check |
|
||||
| **Pool-level snapshots** | ⚠️⚠️ Low | Document limitation; consider RGW |
|
||||
| **Async overhead** | ⚠️⚠️⚠️ Medium | Benchmark; spawn_blocking wrapper |
|
||||
| **Cluster complexity** | ⚠️⚠️⚠️⚠️⚠️ Critical | Beyond the lightweight positioning; Docker compose |
|
||||
| **SMB Oplocks integration** | ⚠️⚠️⚠️ Medium | RADOS locking API; careful design |
|
||||
|
||||
---
|
||||
|
||||
## Alternatives (Recommended Options)

### Comparison

| Option | Cross-platform | Deployment complexity | Fit with positioning | Status |
|--------|----------------|-----------------------|----------------------|--------|
| **Ceph RADOS** | ❌ Linux-only | ⚠️⚠️⚠️⚠️⚠️ Very high | ❌ No match | Shelved |
| **Ceph RGW (S3)** | ✅ HTTP API | ⚠️⚠️⚠️⚠️ High | ⭐⭐⭐ Moderate | S3Vfs already available |
| **MinIO** | ✅ All platforms | ⚠️⚠️ Low | ⭐⭐⭐⭐⭐ Full match | S3Vfs already available |
| **GlusterFS** | ✅ POSIX | ⚠️⚠️⚠️ Medium | ⭐⭐⭐⭐ High | To be investigated |
| **Built-in distribution** | ✅ All platforms | ⚠️⚠️ Low | ⭐⭐⭐⭐⭐ Full match | Foundation in place |
|
||||
|
||||
### Option 1: MinIO (Recommended)

**Advantages**:
- ✅ S3-compatible API (S3Vfs already exists, no new code needed)
- ✅ Single-node deployment (lightweight)
- ✅ Cross-platform (macOS/Linux/Windows)
- ✅ High performance, with erasure coding
- ✅ Open source, plus an enterprise edition

**Deployment**:
```bash
# Single node on macOS
minio server /data --console-address ":9001"

# MarkBase configuration
MB_S3_ENDPOINT=http://localhost:9000
MB_S3_BUCKET=markbase
```

**Integration**: no code changes required; S3Vfs already supports it.
|
||||
|
||||
---
|
||||
|
||||
### Option 2: Built-in Distributed Storage

**Existing foundation**:

| Feature | File | Distributed potential |
|---------|------|-----------------------|
| DedupFs | dedup.rs | ✅ SHA-256 block store can be shared across nodes |
| RaidFs | raid.rs | ⚠️ Single-node RAID-Z |
| Send-Receive | send_receive.rs | ⚠️ Similar to ZFS send/receive |
| Checksum | checksum.rs | ✅ Data integrity verification |
| Compression | compression.rs | ✅ ZSTD compression |

**Directions for extension**:
1. DedupFs + S3Vfs: store dedup blocks in MinIO/S3 (shared across nodes)
2. Checksum + Replication: add cross-node replication
3. Send-Receive + Remote: add remote replication
|
||||
|
||||
---
|
||||
|
||||
## Technical Details
|
||||
|
||||
### librados API Functions
|
||||
|
||||
**Object I/O**:
|
||||
- `rados_read(ioctx, oid, buf, len, offset)` — Read at offset
|
||||
- `rados_write(ioctx, oid, buf, len, offset)` — Write at offset
|
||||
- `rados_write_full(ioctx, oid, buf, len)` — Write entire object
|
||||
- `rados_append(ioctx, oid, buf, len)` — Append to object
|
||||
- `rados_stat(ioctx, oid, psize, pmtime)` — Get object size/mtime
|
||||
- `rados_remove(ioctx, oid)` — Delete object
|
||||
|
||||
**Pool Operations**:
|
||||
- `rados_pool_create(cluster, pool_name)` — Create pool
|
||||
- `rados_pool_delete(cluster, pool_name)` — Delete pool
|
||||
- `rados_pool_lookup(cluster, pool_name)` — Find pool ID
|
||||
- `rados_ioctx_create(cluster, pool_name, ioctx)` — Create I/O context
|
||||
|
||||
**Snapshots**:
|
||||
- `rados_ioctx_snap_create(ioctx, snap_name)` — Create pool snapshot
|
||||
- `rados_ioctx_snap_list(ioctx, snaps)` — List snapshots
|
||||
- `rados_ioctx_snap_remove(ioctx, snap_name)` — Delete snapshot
- `rados_ioctx_snap_rollback(ioctx, oid, snap_name)` — Rollback object
|
||||
|
||||
**Locking**:
|
||||
- `rados_lock_exclusive(ioctx, oid, name, cookie, desc, duration, flags)` — Exclusive lock
|
||||
- `rados_lock_shared(ioctx, oid, name, cookie, tag, desc, duration, flags)` — Shared lock
|
||||
- `rados_unlock(ioctx, oid, name, cookie)` — Release lock
|
||||
- `rados_list_lockers(ioctx, oid, name, ...)` — List lock holders
|
||||
|
||||
**OMAP (Key-Value)**:
|
||||
- `rados_omap_set(ioctx, oid, map)` — Set key-value pairs
|
||||
- `rados_omap_get(ioctx, oid, ...)` — Get values by keys
|
||||
- `rados_omap_get_keys(ioctx, oid, ...)` — List keys
|
||||
- `rados_omap_rm_keys(ioctx, oid, keys)` — Delete keys
|
||||
|
||||
**Async I/O**:
|
||||
- `rados_aio_read(ioctx, oid, completion, buf, len, offset)` — Async read
|
||||
- `rados_aio_write(ioctx, oid, completion, buf, len, offset)` — Async write
|
||||
- `rados_aio_flush(ioctx)` — Flush pending async ops
|
||||
- `rados_aio_wait_for_complete(completion)` — Wait for completion
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Deployment target**: Linux-only production vs macOS development?
|
||||
2. **Backend choice**: RADOS (librados) vs RGW (S3 API)?
|
||||
3. **Pool strategy**: Pool-per-share vs single pool + path prefix?
|
||||
4. **SMB Oplocks**: Should CephVfs support SMB Oplocks via RADOS locking?
|
||||
5. **Priority**: Start with basic I/O or full async integration first?
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
**Ceph RADOS integration is shelved for now**, because:
1. ❌ The Linux-only constraint does not fit the macOS/cross-platform positioning
2. ⚠️ Deployment complexity exceeds the lightweight positioning
3. ⚠️ A full Ceph cluster is required (Monitor + OSD + MGR)

**Recommended alternatives**:
1. ⭐⭐⭐⭐⭐ **MinIO**: S3-compatible, S3Vfs already exists, lightweight
2. ⭐⭐⭐⭐⭐ **Built-in distribution**: DedupFs + S3Vfs combination

**Follow-up actions**:
- MinIO integration guide (0 lines of code)
- DedupFs + S3Vfs combination study (~100 lines)
- Built-in replication feature (~400 lines)
|
||||
|
||||
---
|
||||
|
||||
**Document created**: 2026-06-25

**Last updated**: 2026-06-25
|
||||
563
docs/DEDUP_S3_COMBINATION.md
Normal file
@@ -0,0 +1,563 @@
|
||||
# DedupFs + S3Vfs Combination Design
|
||||
|
||||
**Date**: 2026-06-25
|
||||
**Status**: Design proposal
|
||||
**Goal**: Distributed deduplication storage via MinIO/S3 backend
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
### Current State
|
||||
|
||||
**DedupStore** (`dedup.rs`, 224 lines):
- Dedup storage backed by the **local filesystem**
- SHA-256 block hashes + reference counts
- Blocks are stored in a local directory (`store_path/.dedup/`)
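As a rough picture of the hash-plus-reference-count scheme just described, here is a minimal in-memory sketch. It is not the code in `dedup.rs`: `MemDedupStore` and its methods are hypothetical, and std's `DefaultHasher` stands in for SHA-256 only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// In-memory stand-in for the on-disk store: hash -> (block bytes, ref count).
#[derive(Default)]
struct MemDedupStore {
    blocks: HashMap<String, (Vec<u8>, u64)>,
}

impl MemDedupStore {
    /// Stand-in for SHA-256. DefaultHasher is NOT collision-resistant and
    /// must not be used for real dedup; it only keeps the sketch std-only.
    fn hash_block(data: &[u8]) -> String {
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        format!("{:016x}", hasher.finish())
    }

    /// Store a block, or bump the ref count if an identical block exists.
    fn store_block(&mut self, data: &[u8]) -> String {
        let hash = Self::hash_block(data);
        self.blocks
            .entry(hash.clone())
            .and_modify(|(_, refs)| *refs += 1)
            .or_insert_with(|| (data.to_vec(), 1));
        hash
    }

    /// Drop one reference; the block is removed when the count reaches zero.
    fn release_block(&mut self, hash: &str) {
        let now_unreferenced = match self.blocks.get_mut(hash) {
            Some((_, refs)) => {
                *refs -= 1;
                *refs == 0
            }
            None => false,
        };
        if now_unreferenced {
            self.blocks.remove(hash);
        }
    }

    fn ref_count(&self, hash: &str) -> u64 {
        self.blocks.get(hash).map_or(0, |(_, refs)| *refs)
    }
}

fn main() {
    let mut store = MemDedupStore::default();
    let a = store.store_block(b"same bytes");
    let b = store.store_block(b"same bytes");
    println!("same hash: {}, refs: {}", a == b, store.ref_count(&a));
}
```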
|
||||
|
||||
**Problems**:
- ❌ Dedup blocks cannot be shared across nodes
- ❌ No distributed fault tolerance
- ❌ Capacity is limited to a single node
|
||||
|
||||
### Proposed Solution
|
||||
|
||||
**DedupS3Store**:
- Blocks are stored as **MinIO/S3** objects (shared across nodes)
- Reference counts are stored in S3 object metadata
- Manifests are stored as S3 objects (JSON)

**Advantages**:
- ✅ Dedup blocks shared across nodes (distributed MinIO)
- ✅ Automatic fault tolerance (MinIO erasure coding)
- ✅ No single-node capacity limit (MinIO scales out)
- ✅ Integrates with the existing S3Vfs (no new HTTP API needed)
|
||||
|
||||
---
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ MarkBase Node A │
|
||||
│ ├── DedupS3Store │
|
||||
│ │ ├── store_block() → S3 PUT <hash> │
|
||||
│ │ ├── get_block() → S3 GET <hash> │
|
||||
│ │ └── dedup_file() → chunk + S3 PUT + manifest │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘
|
||||
│ ↓ │
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ MinIO Cluster (S3-compatible) │
|
||||
│ ├── Bucket: markbase-dedup │
|
||||
│ │ ├── Objects: <sha256-hash> (dedup blocks) │
|
||||
│ │ ├── Metadata: x-amz-meta-ref-count (reference count) │
|
||||
│ │ └── Manifests: manifests/<file-id>.json │
|
||||
│ │ │
|
||||
│ ├── Erasure Coding: EC:2 (automatic fault tolerance) │
|
||||
│ ├── Replication: Node A → Node B (DR) │
|
||||
│ └─────────────────────────────────────────────────────────────────────┘
|
||||
│ ↓ │
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ MarkBase Node B │
|
||||
│ ├── DedupS3Store │
|
||||
│ │ ├── get_block() → S3 GET <hash> (shares Node A's blocks) │
|
||||
│ │ └── restore_file() → S3 GET manifest + S3 GET blocks │
|
||||
│ └─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Design
|
||||
|
||||
### DedupS3Store Struct
|
||||
|
||||
```rust
|
||||
pub struct DedupS3Store {
|
||||
s3vfs: S3Vfs, // S3 backend
|
||||
bucket: String, // Bucket name (markbase-dedup)
|
||||
block_prefix: String, // Object key prefix (blocks/)
|
||||
manifest_prefix: String, // Manifest prefix (manifests/)
|
||||
config: VfsDedupConfig, // block_size, min_file_size
|
||||
}
|
||||
|
||||
pub struct DedupManifest {
|
||||
original_size: usize,
|
||||
block_hashes: Vec<String>,
|
||||
dedup_ratio: f64,
|
||||
file_id: String, // UUID for manifest storage
|
||||
}
|
||||
```
|
||||
|
||||
### Core Methods
|
||||
|
||||
| Method | Current (LocalFs) | Proposed (S3Vfs) |
|
||||
|--------|------------------|------------------|
|
||||
| `store_block(data)` | `std::fs::write(store_path/hash, data)` | `S3Vfs.put_object(blocks/hash, data)` |
|
||||
| `get_block(hash)` | `std::fs::read(store_path/hash)` | `S3Vfs.get_object(blocks/hash)` |
|
||||
| `increment_ref(hash)` | `std::fs::write(hash.ref, count)` | `S3Vfs.put_object(blocks/hash, data) + metadata update` |
|
||||
| `decrement_ref(hash)` | `std::fs::write/remove` | `S3Vfs.delete_object + metadata check` |
|
||||
| `dedup_file(source)` | Local file read + block store | Local file read + S3 PUT blocks |
|
||||
| `restore_file(manifest)` | Local file write + block read | Local file write + S3 GET blocks |
|
||||
| `get_ref_count(hash)` | `std::fs::read(hash.ref)` | `S3Vfs.head_object(blocks/hash) → metadata` |
|
||||
|
||||
---
|
||||
|
||||
## S3 Object Layout
|
||||
|
||||
```
|
||||
Bucket: markbase-dedup
|
||||
├── blocks/
|
||||
│ ├── <sha256-hash-1> # Dedup block (4KB)
|
||||
│ │ └── Metadata: x-amz-meta-ref-count: 5
|
||||
│ ├── <sha256-hash-2>
|
||||
│ │ └── Metadata: x-amz-meta-ref-count: 2
|
||||
│ └── ...
|
||||
│
|
||||
├── manifests/
|
||||
│ ├── <file-id-1>.json # Manifest JSON
|
||||
│ │ └── Content: {"original_size": 1024, "block_hashes": [...], ...}
|
||||
│ ├── <file-id-2>.json
|
||||
│ └── ...
|
||||
│
|
||||
└── stats.json # DedupStats (optional)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Reference Count Management
|
||||
|
||||
### Challenge
|
||||
|
||||
S3 objects do not support atomic increment/decrement operations.
|
||||
|
||||
### Solution 1: Metadata Update (Recommended ⭐⭐⭐⭐⭐)

**Flow**:
|
||||
```rust
|
||||
fn increment_ref(&self, hash: &str) -> Result<(), VfsError> {
|
||||
// 1. GET current metadata
|
||||
let head = self.s3vfs.head_object(&format!("blocks/{}", hash))?;
|
||||
let current_ref = head.metadata.get("x-amz-meta-ref-count")
|
||||
.and_then(|v| v.parse::<u64>().ok())
|
||||
.unwrap_or(0);
|
||||
|
||||
// 2. PUT with updated metadata
|
||||
let block_data = self.s3vfs.get_object(&format!("blocks/{}", hash))?;
|
||||
self.s3vfs.put_object_with_metadata(
|
||||
&format!("blocks/{}", hash),
|
||||
&block_data,
|
||||
[("x-amz-meta-ref-count", (current_ref + 1).to_string())]
|
||||
)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
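**Advantages**:
- ✅ Simple to implement
- ✅ Compatible with the S3 standard

**Disadvantages**:
- ⚠️ Needs several requests per update (HEAD + GET + PUT)
- ⚠️ Not atomic (concurrent updates can be lost)
- ⚠️ Must read the block data, because a PUT needs a body (a server-side CopyObject with the `REPLACE` metadata directive may avoid the download, if S3Vfs exposes it)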
|
||||
|
||||
---
|
||||
|
||||
### Solution 2: Separate Ref Count Object
|
||||
|
||||
**Flow**:
```rust
fn increment_ref(&self, hash: &str) -> Result<(), VfsError> {
    // 1. GET the ref-count object (a small object holding a decimal number)
    let ref_key = format!("refs/{}/count", hash);
    let current = self.s3vfs.get_object(&ref_key)
        .ok()
        .and_then(|data| String::from_utf8(data).ok())
        .and_then(|text| text.trim().parse::<u64>().ok())
        .unwrap_or(0); // a missing or unparsable object is treated as 0

    // 2. PUT the updated ref count
    self.s3vfs.put_object(&ref_key, (current + 1).to_string().as_bytes())?;
    Ok(())
}
```
|
||||
|
||||
**Advantages**:
- ✅ No need to read the block data
- ✅ Smaller object (just a number)

**Disadvantages**:
- ⚠️ Requires an extra object per block
- ⚠️ Not atomic (concurrent updates can be lost)
|
||||
|
||||
---
|
||||
|
||||
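### Solution 3: MinIO Extended API (Enterprise Edition)

The MinIO enterprise edition is said to provide `mc admin bucket policy` and an object locking API. This has not been verified; note that S3 Object Lock is a retention (WORM) feature rather than a mutual-exclusion lock.

**Advantages**:
- ✅ May provide an atomic operation

**Disadvantages**:
- ⚠️ MinIO enterprise edition only
- ⚠️ The specific API still needs research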
|
||||
|
||||
---
|
||||
|
||||
## Concurrency Problem
|
||||
|
||||
### Scenario
|
||||
|
||||
Node A and Node B dedup the same file at the same time:
1. Node A: `increment_ref(hash-abc)` → GET count=2 → PUT count=3
2. Node B: `increment_ref(hash-abc)` → GET count=2 → PUT count=3
3. Result: count=3 (wrong; it should be count=4)
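The lost update in this scenario can be reproduced deterministically with a toy in-memory store. `ToyStore` and `racy_double_increment` are hypothetical names used only for illustration; the reads and writes stand in for the HEAD/PUT round trips against S3.

```rust
use std::collections::HashMap;

/// Toy object store: key -> ref count. Stands in for HEAD/PUT against S3.
struct ToyStore {
    counts: HashMap<String, u64>,
}

impl ToyStore {
    fn read(&self, key: &str) -> u64 {
        *self.counts.get(key).unwrap_or(&0)
    }
    fn write(&mut self, key: &str, value: u64) {
        self.counts.insert(key.to_string(), value);
    }
}

/// Replay the interleaving from the scenario: both nodes read before either writes.
fn racy_double_increment(store: &mut ToyStore, key: &str) -> u64 {
    let seen_by_a = store.read(key); // Node A reads the current count
    let seen_by_b = store.read(key); // Node B reads the same value (A has not written yet)
    store.write(key, seen_by_a + 1); // Node A writes count + 1
    store.write(key, seen_by_b + 1); // Node B overwrites with the same count + 1
    store.read(key)
}

fn main() {
    let mut store = ToyStore { counts: HashMap::new() };
    store.write("hash-abc", 2);
    let after = racy_double_increment(&mut store, "hash-abc");
    println!("count after two increments: {} (should have been 4)", after);
}
```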
|
||||
|
||||
### Solution 1: Optimistic Locking
|
||||
|
||||
Use S3 versioning to detect conflicts:
```rust
fn increment_ref(&self, hash: &str) -> Result<(), VfsError> {
    const MAX_RETRIES: usize = 5; // bound the loop so persistent failures surface
    let key = format!("blocks/{}", hash);

    for _ in 0..MAX_RETRIES {
        // 1. GET current version + metadata, plus the block body for the PUT
        let (version_id, current_ref) = self.get_ref_with_version(hash)?;
        let block_data = self.s3vfs.get_object(&key)?;

        // 2. PUT with version check
        let result = self.s3vfs.put_object_if_version(
            &key,
            &block_data,
            current_ref + 1,
            version_id, // Only succeed if version unchanged
        );

        if result.is_ok() {
            return Ok(());
        }
        // Retry if version mismatch
    }
    Err(VfsError::Io(format!("ref count update for {} kept conflicting", hash)))
}
```

**Requirement**: MinIO versioning enabled, plus a conditional write that rejects a PUT when the version has changed (versioning by itself only records versions).
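The retry loop relies on a compare-and-swap style write. A minimal in-memory sketch of that idea follows; `VersionedStore`, `put_if_version` and the `interfere` flag are assumptions made for the illustration and say nothing about what MinIO or S3Vfs actually expose.

```rust
use std::collections::HashMap;

/// Toy versioned store: key -> (version, ref count). The version changes on every write.
#[derive(Default)]
struct VersionedStore {
    entries: HashMap<String, (u64, u64)>,
}

impl VersionedStore {
    fn get(&self, key: &str) -> (u64, u64) {
        *self.entries.get(key).unwrap_or(&(0, 0))
    }

    /// Write only if the caller saw the latest version; otherwise report a conflict.
    fn put_if_version(&mut self, key: &str, expected: u64, value: u64) -> Result<(), ()> {
        let (current_version, _) = self.get(key);
        if current_version != expected {
            return Err(());
        }
        self.entries.insert(key.to_string(), (current_version + 1, value));
        Ok(())
    }
}

/// Increment with bounded retries. `interfere` simulates another node writing
/// between our read and our write on the first attempt.
fn increment_with_retry(store: &mut VersionedStore, key: &str, interfere: bool) -> Result<u32, ()> {
    for attempt in 1..=5u32 {
        let (version, count) = store.get(key);
        if interfere && attempt == 1 {
            // A competing node sneaks in its own increment.
            store.put_if_version(key, version, count + 1)?;
        }
        if store.put_if_version(key, version, count + 1).is_ok() {
            return Ok(attempt);
        }
    }
    Err(())
}

fn main() {
    let mut store = VersionedStore::default();
    store.put_if_version("hash-abc", 0, 2).unwrap(); // seed: count = 2
    let attempts = increment_with_retry(&mut store, "hash-abc", true).unwrap();
    println!("succeeded on attempt {}, count = {}", attempts, store.get("hash-abc").1);
}
```

With the conflict detected and retried, both increments land and the count ends at 4 instead of 3.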
|
||||
|
||||
---
|
||||
|
||||
### Solution 2: Distributed Lock Service
|
||||
|
||||
Use an external distributed lock (such as Redis or ZooKeeper):
```rust
fn increment_ref(&self, hash: &str) -> Result<(), VfsError> {
    // 1. Acquire distributed lock
    let lock = self.lock_service.acquire(&format!("lock:{}", hash))?;

    // 2. Increment ref count (keep the result so the lock is released on error too)
    let result = self.update_ref_count(hash);

    // 3. Release lock
    lock.release();
    result
}
```

**Disadvantage**: requires an additional service (Redis).
|
||||
|
||||
---
|
||||
|
||||
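### Solution 3: Accept Non-Atomic (Simplified Approach)

Given MarkBase's lightweight positioning:
- ⚠️ Accept the risk of non-atomic updates
- ⚠️ Ref counts are occasionally inaccurate. An overcount only wastes space; an undercount could let a still-referenced block be deleted, so deletion should wait for the scrub job
- ⚠️ Repair periodically (scrub job)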
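A scrub job of this kind could treat the manifests as the source of truth and recompute every count from them. The sketch below works on in-memory data only; `recount_refs` and `scrub` are hypothetical helpers, and it assumes one reference per occurrence of a hash in a manifest, which matches how `store_block()` is called once per block.

```rust
use std::collections::HashMap;

/// Recompute ref counts from the manifests: a block's count is the number
/// of times its hash appears across all manifests.
fn recount_refs(manifests: &[Vec<String>]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for block_hashes in manifests {
        for hash in block_hashes {
            *counts.entry(hash.clone()).or_insert(0u64) += 1;
        }
    }
    counts
}

/// Compare stored counts with recomputed ones and return the corrections as
/// (hash, stored, actual). Unreferenced blocks show up with actual == 0.
fn scrub(stored: &HashMap<String, u64>, manifests: &[Vec<String>]) -> Vec<(String, u64, u64)> {
    let actual = recount_refs(manifests);
    let mut fixes: Vec<(String, u64, u64)> = stored
        .iter()
        .filter_map(|(hash, &stored_count)| {
            let actual_count = actual.get(hash).copied().unwrap_or(0);
            (stored_count != actual_count).then(|| (hash.clone(), stored_count, actual_count))
        })
        .collect();
    fixes.sort(); // deterministic order for logs and tests
    fixes
}

fn main() {
    let stored: HashMap<String, u64> =
        [("a".to_string(), 3), ("b".to_string(), 1), ("c".to_string(), 2)].into_iter().collect();
    let manifests = vec![vec!["a".to_string(), "b".to_string()], vec!["a".to_string()]];
    for (hash, was, now) in scrub(&stored, &manifests) {
        println!("{}: stored {} -> actual {}", hash, was, now);
    }
}
```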
|
||||
|
||||
**Recommendation**: use Solution 1 (Metadata Update) in Phase 1; investigate MinIO versioning in Phase 2.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
| Phase | Task | Code Lines | Priority | Risk |
|
||||
|-------|------|------------|----------|------|
|
||||
| **Phase 1** | DedupS3Store struct + basic I/O | ~300 | P0 | Medium |
|
||||
| **Phase 2** | Reference count metadata | ~100 | P0 | Medium |
|
||||
| **Phase 3** | Manifest storage to S3 | ~50 | P1 | Low |
|
||||
| **Phase 4** | CLI integration | ~100 | P1 | Low |
|
||||
| **Phase 5** | Async version (DedupAsyncS3Store) | ~200 | P2 | High |
|
||||
| **Phase 6** | Concurrency fix (versioning) | ~150 | P2 | High |
|
||||
| **Phase 7** | Performance benchmark | ~100 | P2 | Low |
|
||||
| **Total** | | **~1000** | | |
|
||||
|
||||
---
|
||||
|
||||
## DedupS3Store Implementation (Phase 1 Draft)
|
||||
|
||||
```rust
|
||||
use super::s3_fs::S3Vfs;
|
||||
use super::{VfsDedupConfig, VfsError};
|
||||
use sha2::{Sha256, Digest};
|
||||
use std::io::{Read, Write}; // needed for file.read() and file.write_all()
use std::path::Path;
|
||||
|
||||
pub struct DedupS3Store {
|
||||
s3vfs: S3Vfs,
|
||||
bucket: String,
|
||||
block_prefix: String,
|
||||
manifest_prefix: String,
|
||||
config: VfsDedupConfig,
|
||||
}
|
||||
|
||||
impl DedupS3Store {
|
||||
pub fn new(
|
||||
endpoint: &str,
|
||||
region: &str,
|
||||
bucket: &str,
|
||||
access_key: &str,
|
||||
secret_key: &str,
|
||||
config: VfsDedupConfig,
|
||||
) -> Result<Self, VfsError> {
|
||||
let s3vfs = S3Vfs::new(endpoint, region, bucket, access_key, secret_key)?;
|
||||
Ok(Self {
|
||||
s3vfs,
|
||||
bucket: bucket.to_string(),
|
||||
block_prefix: "blocks/".to_string(),
|
||||
manifest_prefix: "manifests/".to_string(),
|
||||
config,
|
||||
})
|
||||
}
|
||||
|
||||
pub fn store_block(&self, data: &[u8]) -> Result<String, VfsError> {
|
||||
if data.len() > self.config.block_size {
|
||||
return Err(VfsError::Io("Block size exceeds limit".to_string()));
|
||||
}
|
||||
|
||||
let hash = Self::hash_block(data);
|
||||
let key = format!("{}{}", self.block_prefix, hash);
|
||||
|
||||
// Check if block exists
|
||||
if !self.s3vfs.object_exists(&key)? {
|
||||
// PUT with initial ref count = 1
|
||||
self.s3vfs.put_object_with_metadata(
|
||||
&key,
|
||||
data,
|
||||
[("x-amz-meta-ref-count", "1")]
|
||||
)?;
|
||||
} else {
|
||||
// Increment ref count
|
||||
self.increment_ref(&hash)?;
|
||||
}
|
||||
|
||||
Ok(hash)
|
||||
}
|
||||
|
||||
pub fn get_block(&self, hash: &str) -> Result<Vec<u8>, VfsError> {
|
||||
let key = format!("{}{}", self.block_prefix, hash);
|
||||
self.s3vfs.get_object(&key)
|
||||
}
|
||||
|
||||
pub fn increment_ref(&self, hash: &str) -> Result<(), VfsError> {
|
||||
let key = format!("{}{}", self.block_prefix, hash);
|
||||
let head = self.s3vfs.head_object(&key)?;
|
||||
|
||||
let current_ref = head.metadata
|
||||
.get("x-amz-meta-ref-count")
|
||||
.and_then(|v| v.parse::<u64>().ok())
|
||||
.unwrap_or(1);
|
||||
|
||||
// Need to GET block data + PUT with new metadata
|
||||
let block_data = self.get_block(hash)?;
|
||||
self.s3vfs.put_object_with_metadata(
|
||||
&key,
|
||||
&block_data,
|
||||
[("x-amz-meta-ref-count", (current_ref + 1).to_string())]
|
||||
)?;
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub fn dedup_file(&self, source: &Path) -> Result<DedupManifest, VfsError> {
|
||||
let mut file = std::fs::File::open(source)?;
|
||||
let mut manifest = DedupManifest::new();
|
||||
let mut buffer = vec![0u8; self.config.block_size];
|
||||
|
||||
loop {
|
||||
let n = file.read(&mut buffer)?;
|
||||
if n == 0 { break; }
|
||||
|
||||
manifest.original_size += n;
|
||||
let hash = self.store_block(&buffer[..n])?;
|
||||
manifest.block_hashes.push(hash);
|
||||
}
|
||||
|
||||
// Store manifest to S3
|
||||
let file_id = uuid::Uuid::new_v4().to_string();
|
||||
manifest.file_id = file_id.clone(); // clone: file_id is used again for the manifest key
|
||||
let manifest_key = format!("{}{}.json", self.manifest_prefix, file_id);
|
||||
let manifest_json = serde_json::to_string(&manifest)?;
|
||||
self.s3vfs.put_object(&manifest_key, manifest_json.as_bytes())?;
|
||||
|
||||
Ok(manifest)
|
||||
}
|
||||
|
||||
pub fn restore_file(&self, manifest_id: &str, target: &Path) -> Result<(), VfsError> {
|
||||
let manifest_key = format!("{}{}.json", self.manifest_prefix, manifest_id);
|
||||
let manifest_json = self.s3vfs.get_object(&manifest_key)?;
|
||||
let manifest: DedupManifest = serde_json::from_slice(&manifest_json)?;
|
||||
|
||||
let mut file = std::fs::File::create(target)?;
|
||||
for hash in &manifest.block_hashes {
|
||||
let block = self.get_block(hash)?;
|
||||
file.write_all(&block)?;
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn hash_block(data: &[u8]) -> String {
|
||||
let mut hasher = Sha256::new();
|
||||
hasher.update(data);
|
||||
hex::encode(hasher.finalize())
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with MarkBase VFS
|
||||
|
||||
### Option 1: Standalone DedupS3Store
|
||||
|
||||
The user creates a DedupS3Store manually:
|
||||
```bash
|
||||
# CLI tool
|
||||
markbase dedup-upload --s3 --s3-endpoint http://localhost:9000 --file /data/large.iso
|
||||
markbase dedup-download --s3 --manifest-id <uuid> --output /data/restored.iso
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Option 2: DedupVfsBackend (VfsBackend trait)
|
||||
|
||||
Create a VfsBackend wrapper that dedups automatically:
|
||||
```rust
|
||||
pub struct DedupS3Backend {
|
||||
dedup_store: DedupS3Store,
|
||||
manifest_dir: PathBuf, // Local cache for manifests
|
||||
}
|
||||
|
||||
impl VfsBackend for DedupS3Backend {
|
||||
fn open_file(&self, path: &Path, flags: &OpenFlags) -> Result<Box<dyn VfsFile>, VfsError> {
|
||||
// 1. Read manifest from S3
|
||||
let manifest = self.load_manifest(path)?;
|
||||
|
||||
// 2. DedupS3File (read blocks from S3)
|
||||
Ok(Box::new(DedupS3File::new(self.dedup_store.clone(), manifest)))
|
||||
}
|
||||
|
||||
fn stat(&self, path: &Path) -> Result<VfsStat, VfsError> {
|
||||
// Read from manifest metadata
|
||||
let manifest = self.load_manifest(path)?;
|
||||
Ok(VfsStat {
|
||||
size: manifest.original_size,
|
||||
mtime: manifest.mtime,
|
||||
...
|
||||
})
|
||||
}
|
||||
|
||||
fn read_dir(&self, path: &Path) -> Result<Vec<VfsDirEntry>, VfsError> {
|
||||
// List manifests from S3
|
||||
self.dedup_store.s3vfs.list_objects(&self.dedup_store.manifest_prefix)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Advantages**:
- ✅ Transparent dedup (invisible to users)
- ✅ Seamless integration with SMB/WebDAV/SFTP
|
||||
|
||||
---
|
||||
|
||||
### Option 3: Hybrid (LocalFs + DedupS3Store)
|
||||
|
||||
```rust
|
||||
pub struct HybridDedupBackend {
|
||||
local: LocalFs, // Small files (<1MB) stay local
dedup_s3: DedupS3Store, // Large files (>1MB) dedup to S3
|
||||
}
|
||||
|
||||
impl VfsBackend for HybridDedupBackend {
|
||||
fn open_file(&self, path: &Path, flags: &OpenFlags) -> Result<Box<dyn VfsFile>, VfsError> {
|
||||
// Check file size
|
||||
let stat = self.local.stat(path)?;
|
||||
|
||||
if stat.size < self.dedup_s3.config.min_file_size {
|
||||
// Small file: direct LocalFs
|
||||
self.local.open_file(path, flags)
|
||||
} else {
|
||||
// Large file: dedup to S3
|
||||
self.dedup_s3.dedup_file(path)?;
|
||||
self.dedup_s3.open_file_from_manifest(path)
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Recommendation**: Option 1 (Phase 1), then Option 3 (Phase 2).
|
||||
|
||||
---
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Network Latency
|
||||
|
||||
| Operation | LocalFs | S3Vfs | Overhead |
|
||||
|-----------|---------|-------|----------|
|
||||
| store_block (4KB) | ~0.1ms | ~5-10ms (HTTP) | ~50-100x |
|
||||
| get_block (4KB) | ~0.1ms | ~5-10ms (HTTP) | ~50-100x |
|
||||
| dedup_file (100MB) | ~2s (50MB/s) | ~10s (10MB/s) | ~5x |
|
||||
|
||||
**Mitigations**:
- ✅ Async concurrent upload (4-8 in flight)
- ✅ ReadCache (64MB cache)
- ✅ Local cache for hot blocks
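To see why concurrency matters, a back-of-the-envelope model helps. The function below is a simplification, not a measurement: it assumes every PUT costs a fixed latency, ignores bandwidth and dedup hits, and `estimated_upload_secs` is a hypothetical helper.

```rust
/// Estimated seconds to upload `file_bytes` as fixed-size blocks, assuming each PUT
/// costs `latency_ms` and `concurrency` requests are in flight at once.
fn estimated_upload_secs(file_bytes: u64, block_bytes: u64, latency_ms: f64, concurrency: u64) -> f64 {
    let blocks = (file_bytes + block_bytes - 1) / block_bytes; // ceiling division
    let rounds = (blocks + concurrency - 1) / concurrency; // batches of parallel PUTs
    rounds as f64 * latency_ms / 1000.0
}

fn main() {
    let file = 100 * 1024 * 1024; // 100 MB
    for (block, conc) in [(4 * 1024, 1), (4 * 1024, 8), (1024 * 1024, 8)] {
        let secs = estimated_upload_secs(file, block, 5.0, conc);
        println!("block={} B, concurrency={}: ~{:.1} s", block, conc, secs);
    }
}
```

Under these assumptions, 100 MB in 4KB blocks at 5 ms per PUT takes about 128 s sequentially and about 16 s with 8 requests in flight, so the ~10 s figure in the table is only reachable with concurrent uploads, larger blocks, or a high dedup hit rate.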
|
||||
|
||||
---
|
||||
|
||||
### Dedup Ratio Impact
|
||||
|
||||
| File Type | Dedup Ratio | Network Traffic Saved |
|
||||
|-----------|-------------|----------------------|
|
||||
| VM images (similar OS) | ~80% | -80% upload bandwidth |
|
||||
| Log files (daily) | ~60% | -60% upload bandwidth |
|
||||
| Unique files (photos) | ~5% | -5% upload bandwidth |
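One way to read the ratios in this table is as the fraction of blocks that never need uploading because an identical block is already stored. The sketch below uses that interpretation, which is an assumption: the document does not define how `dedup_ratio` is computed, and the function name is hypothetical.

```rust
use std::collections::HashSet;

/// Fraction of blocks that did not need to be uploaded because an identical
/// block was already stored: 1 - unique / total.
fn dedup_ratio(block_hashes: &[&str]) -> f64 {
    if block_hashes.is_empty() {
        return 0.0;
    }
    let unique: HashSet<&str> = block_hashes.iter().copied().collect();
    1.0 - unique.len() as f64 / block_hashes.len() as f64
}

fn main() {
    // Five blocks, two distinct hashes: three of the five uploads are skipped.
    let hashes = ["a", "a", "a", "a", "b"];
    println!("dedup ratio: {:.0}%", dedup_ratio(&hashes) * 100.0);
}
```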
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Phase 1 Implementation** (~300 lines)
|
||||
- `DedupS3Store` struct
|
||||
- `store_block()` / `get_block()` via S3Vfs
|
||||
- `increment_ref()` with metadata update
|
||||
|
||||
2. **Phase 2 CLI Integration** (~100 lines)
|
||||
- `markbase dedup-upload --s3`
|
||||
- `markbase dedup-download --manifest-id`
|
||||
|
||||
3. **Phase 3 Performance Test**
|
||||
- Benchmark dedup_file (100MB)
|
||||
- Compare LocalFs vs S3Vfs
|
||||
|
||||
---
|
||||
|
||||
## Open Questions
|
||||
|
||||
1. **Concurrency**: Accept non-atomic ref count vs implement versioning?
|
||||
2. **Backend choice**: Standalone CLI vs VfsBackend integration?
|
||||
3. **MinIO versioning**: Should we require MinIO versioning enabled?
|
||||
4. **Ref count object**: Metadata vs separate object?
|
||||
5. **Block cache**: Should we cache blocks locally?
|
||||
|
||||
---
|
||||
|
||||
**Document created**: 2026-06-25

**Last updated**: 2026-06-25
|
||||
382
docs/MINIO_INTEGRATION.md
Normal file
@@ -0,0 +1,382 @@
|
||||
# MinIO Integration Guide for MarkBase
|
||||
|
||||
**Date**: 2026-06-25
|
||||
**Status**: Ready for deployment
|
||||
**Backend**: S3Vfs (already implemented, no code changes needed)
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
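MinIO is a high-performance, S3-compatible object storage service that fits MarkBase's positioning well:
- ✅ Cross-platform (macOS/Linux/Windows)
- ✅ Lightweight deployment (a single node is enough)
- ✅ Already supported by S3Vfs (no code changes needed)
- ✅ High performance (erasure coding + distributed scale-out)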
|
||||
|
||||
---
|
||||
|
||||
## MinIO vs Ceph RADOS Comparison
|
||||
|
||||
| Aspect | MinIO | Ceph RADOS |
|--------|-------|------------|
| **Platform** | ✅ All platforms | ❌ Linux-only |
| **Deployment** | ⚠️⚠️ A single node is enough | ⚠️⚠️⚠️⚠️⚠️ Needs a full cluster |
| **API** | ✅ S3-compatible HTTP | ❌ librados FFI |
| **Code change** | ✅ 0 lines (S3Vfs already exists) | ❌ ~1350 lines |
| **Positioning** | ⭐⭐⭐⭐⭐ Full match | ❌ Does not fit the lightweight positioning |
|
||||
|
||||
---
|
||||
|
||||
## MinIO Deployment
|
||||
|
||||
### Single-Node Deployment on macOS
|
||||
|
||||
```bash
|
||||
# Install MinIO
|
||||
brew install minio/stable/minio
|
||||
|
||||
# Start the MinIO server
|
||||
minio server /path/to/data --console-address ":9001"
|
||||
|
||||
# Output:
|
||||
# Endpoint: http://192.168.1.100:9000 http://127.0.0.1:9000
|
||||
# Console: http://192.168.1.100:9001 http://127.0.0.1:9001
|
||||
# AccessKey: minioadmin
|
||||
# SecretKey: minioadmin
|
||||
```
|
||||
|
||||
### Production Deployment on Linux
|
||||
|
||||
```bash
|
||||
# Docker, single node
|
||||
docker run -d \
|
||||
--name minio \
|
||||
-p 9000:9000 \
|
||||
-p 9001:9001 \
|
||||
-v /data/minio:/data \
|
||||
minio/minio server /data --console-address ":9001"
|
||||
|
||||
# Distributed cluster (4 nodes)
|
||||
docker run -d \
|
||||
--name minio \
|
||||
-p 9000:9000 \
|
||||
-p 9001:9001 \
|
||||
-v /data1:/data1 \
|
||||
-v /data2:/data2 \
|
||||
minio/minio server http://node1/data1 http://node2/data2 http://node3/data1 http://node4/data2 --console-address ":9001"
|
||||
```
|
||||
|
||||
### Kubernetes deployment (recommended for production)

```yaml
# minio-statefulset.yaml
# Distributed MinIO needs stable pod names and persistent volumes, so this uses a
# StatefulSet. It assumes a headless Service named "minio" in the "default"
# namespace; the 10Gi volume size is a placeholder.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
spec:
  serviceName: minio
  replicas: 4
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio:latest
        args:
        - server
        - http://minio-{0...3}.minio.default.svc.cluster.local/data
        - --console-address
        - ":9001"
        ports:
        - containerPort: 9000
        - containerPort: 9001
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

---

## MarkBase S3Vfs Integration

### Configuration

**Environment variables**:
```bash
export MB_S3_ENDPOINT=http://localhost:9000
export MB_S3_REGION=us-east-1
export MB_S3_BUCKET=markbase
export MB_S3_ACCESS_KEY=minioadmin
export MB_S3_SECRET_KEY=minioadmin
```

**Configuration file** (`config/s3.toml`):
```toml
[s3]
enabled = true
endpoint = "http://localhost:9000"
region = "us-east-1"
bucket = "markbase"
access_key = "minioadmin"
secret_key = "minioadmin"

[s3.webdav]
# WebDAV uses the S3 backend
enabled = true
user = "demo"
root_prefix = "webdav/"
```

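
The two configuration sources above have to be merged somewhere. The following is a minimal Python sketch of one way to do it, assuming environment variables override file values. That precedence, the `resolve_s3_config` helper and the dict layout are assumptions for illustration, not a description of how S3Vfs actually loads its settings.

```python
import os

# MB_S3_* variable names from this guide, mapped to the keys used in s3.toml.
ENV_TO_KEY = {
    "MB_S3_ENDPOINT": "endpoint",
    "MB_S3_REGION": "region",
    "MB_S3_BUCKET": "bucket",
    "MB_S3_ACCESS_KEY": "access_key",
    "MB_S3_SECRET_KEY": "secret_key",
}


def resolve_s3_config(file_values, env=None):
    """Overlay MB_S3_* environment variables on values read from the file."""
    env = os.environ if env is None else env
    config = dict(file_values)
    for var, key in ENV_TO_KEY.items():
        if env.get(var):  # unset or empty variables do not override the file
            config[key] = env[var]
    missing = [key for key in ENV_TO_KEY.values() if not config.get(key)]
    if missing:
        raise ValueError("missing S3 settings: " + ", ".join(missing))
    return config
```

Failing early on a missing setting keeps a misconfigured deployment from surfacing later as an opaque connection error.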
### S3Vfs usage examples

**WebDAV + MinIO**:
```bash
# Start the WebDAV server with MinIO as the backend
cargo run -- webdav-start \
  --user demo \
  --port 8002 \
  --s3 \
  --s3-endpoint http://localhost:9000 \
  --s3-bucket markbase \
  --s3-access-key minioadmin \
  --s3-secret-key minioadmin \
  --s3-region us-east-1 \
  --root webdav/
```

**SMB + MinIO** (through the VFS backend):
```bash
# Start the SMB server with MinIO as the backend
cargo run --features smb-server -- smb-start \
  --port 4445 \
  --share-name files \
  --s3 \
  --s3-endpoint http://localhost:9000 \
  --s3-bucket markbase \
  --s3-access-key minioadmin \
  --s3-secret-key minioadmin \
  --s3-region us-east-1 \
  --root smb/
```

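
Both commands pass a `--root` prefix (`webdav/`, `smb/`), so each protocol sees only its own part of the bucket. The Python sketch below models how a client path could map to an object key under such a prefix. It is an illustration only: `to_object_key` and its normalization rules are assumptions, not MarkBase's Rust implementation.

```python
import posixpath


def to_object_key(root_prefix, vfs_path):
    """Map a client-visible path to an object key under root_prefix."""
    # Normalize "." and ".." against "/" so a path cannot climb above the root.
    clean = posixpath.normpath("/" + vfs_path).lstrip("/")
    prefix = root_prefix.strip("/")
    if not clean:
        # The share root itself maps to the bare prefix.
        return prefix + "/" if prefix else ""
    return f"{prefix}/{clean}" if prefix else clean
```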
---

## MinIO Bucket Management

### Creating a bucket

```bash
# Using the MinIO client (mc)
mc alias set myminio http://localhost:9000 minioadmin minioadmin
mc mb myminio/markbase

# Using the AWS CLI
aws --endpoint-url http://localhost:9000 s3 mb s3://markbase
```

### Setting a bucket policy

```bash
# Public-read policy (for public shares)
mc anonymous set download myminio/markbase/public

# Private policy (default)
mc anonymous set none myminio/markbase/private
```

### Setting a bucket quota

```bash
# Set a quota (MinIO Enterprise feature).
# Older mc releases used: mc admin bucket quota myminio/markbase --hard 10GB
mc quota set myminio/markbase --size 10GB
```

---

## MinIO Features Relevant to MarkBase

| Feature | Description | MarkBase Use Case |
|---------|-------------|-------------------|
| **Erasure Coding** | Data redundancy (default parity EC:2 on small erasure sets; larger sets default to higher parity) | Automatic fault tolerance, similar to RAID |
| **Versioning** | Object version control | Can stand in for the Snapshot feature |
| **Bucket Policy** | Access control management | User permission control |
| **Lifecycle Rules** | Automatic expiry | Cleaning up old backups |
| **Object Lock** | WORM mode | Compliance-grade backup protection |
| **Replication** | Cross-site replication | Disaster recovery |

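
To make the erasure coding row concrete, here is a small Python sketch of the capacity arithmetic for one erasure set. It assumes the usual model of `drives - parity` data shards per object; the function names are hypothetical and the numbers are illustrative, not measurements.

```python
def erasure_summary(drives, parity):
    """Usable fraction and read tolerance for a single erasure set."""
    if not 0 <= parity <= drives // 2:
        raise ValueError("parity must be between 0 and half the drive count")
    data = drives - parity
    return {
        "data_shards": data,
        "parity_shards": parity,
        "usable_fraction": data / drives,
        # Objects stay readable with up to `parity` drives offline.
        "read_tolerance": parity,
    }


def usable_bytes(raw_bytes_per_drive, drives, parity):
    """Capacity left for object data after parity overhead."""
    return raw_bytes_per_drive * (drives - parity)
```

For example, four 1 TB drives with EC:2 give about 2 TB of usable space and remain readable with two drives offline.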
### Versioning (snapshot replacement)

```bash
# Enable versioning
mc version enable myminio/markbase

# List object versions
mc ls --versions myminio/markbase/file.txt

# Restore an old version by copying it over the current object
# (VERSION_ID is a placeholder for an ID shown by the listing above)
mc cp --version-id VERSION_ID myminio/markbase/file.txt myminio/markbase/file.txt
```

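
The restore step is easy to misread as a rollback that discards newer data. The toy Python model below shows the intended semantics under versioning: restoring copies the old content into a new latest version, and the history stays intact. `VersionedBucket` is an in-memory illustration with made-up version IDs, not a MinIO client.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioned bucket."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data), oldest first
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions[key]
        if version_id is None:
            return versions[-1][1]  # latest version
        for vid, data in versions:
            if vid == version_id:
                return data
        raise KeyError(version_id)

    def restore(self, key, version_id):
        """Write the old content as a new latest version; nothing is removed."""
        return self.put(key, self.get(key, version_id))
```

Because a restore is just another write, it can itself be undone by restoring the version that preceded it.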
### Lifecycle rules (backup cleanup)

```bash
# Delete objects automatically after 30 days
mc ilm rule add --expire-days 30 myminio/markbase
```

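
As a rough model of what a 30-day expiry rule selects, the Python sketch below applies a plain age check to a set of objects. It is a simplification with hypothetical helper names: a real lifecycle scanner runs periodically and may evaluate expiry at day granularity, so actual deletion can lag the cut-off.

```python
from datetime import datetime, timedelta


def is_expired(last_modified: datetime, now: datetime, expire_days: int = 30) -> bool:
    """True once the object is at least expire_days old."""
    return now - last_modified >= timedelta(days=expire_days)


def select_expired(objects, now, expire_days=30):
    """objects maps key -> last-modified datetime (all in the same timezone)."""
    return sorted(
        key for key, modified in objects.items()
        if is_expired(modified, now, expire_days)
    )
```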
---

## Performance Optimization

### MinIO performance settings

```bash
# High-performance setup: spread I/O across several drives.
# `minio server` has no --parallel or --cache flags; the drive paths are placeholders.
minio server "/data{1...4}" \
  --console-address ":9001"
```

### S3Vfs performance tuning

**Concurrent uploads** (already implemented in S3Vfs):
- Multipart upload (objects larger than 5 MB are split into parts automatically)
- Parts are uploaded concurrently (4 at a time by default)

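
The two bullets above can be sketched as follows in Python. The 5 MB threshold and the concurrency of 4 come from this guide; treating 5 MB as 5 MiB, using the threshold as the part size, and the `plan_parts` / `upload_parts` helpers are assumptions for illustration, not the S3Vfs code.

```python
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 * 1024
PART_SIZE = 5 * MIB   # assumed: part size equals the 5 MB threshold
MAX_CONCURRENCY = 4   # default concurrency from this guide


def plan_parts(total_size, part_size=PART_SIZE):
    """Return (part_number, offset, length) tuples; part numbers start at 1."""
    if total_size <= part_size:
        return [(1, 0, total_size)]  # small object: one plain PUT
    return [
        (number, offset, min(part_size, total_size - offset))
        for number, offset in enumerate(range(0, total_size, part_size), start=1)
    ]


def upload_parts(data, upload_one, concurrency=MAX_CONCURRENCY):
    """Upload parts concurrently; upload_one(part_number, chunk) is caller-supplied."""
    parts = plan_parts(len(data))
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(upload_one, number, data[offset:offset + length])
            for number, offset, length in parts
        ]
        # Collect results in part order, which the final "complete" call needs.
        return [future.result() for future in futures]
```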
**Caching**:
- ReadCache: 64 MB, 64 KB blocks, 5 min TTL (implemented in cache.rs)
- WriteCache: 32 MB (implemented in cache.rs)

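
The read cache figures imply a fixed geometry: 64 MB divided into 64 KB blocks is 1024 blocks. The Python sketch below works through that arithmetic and the TTL check. The helper names are hypothetical and the logic is an assumption about how a block cache of this shape typically behaves, not a transcription of cache.rs.

```python
import time

BLOCK_SIZE = 64 * 1024               # 64 KB blocks
CAPACITY = 64 * 1024 * 1024          # 64 MB read cache
TTL_SECONDS = 5 * 60                 # 5 minute TTL
MAX_BLOCKS = CAPACITY // BLOCK_SIZE  # number of blocks the cache can hold


def blocks_for_range(offset, length):
    """Indices of the blocks touched by a read of [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return list(range(first, last + 1))


def is_fresh(cached_at, now=None):
    """A cached block is served only while it is younger than the TTL."""
    now = time.monotonic() if now is None else now
    return now - cached_at < TTL_SECONDS
```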
---

## Docker Compose Example

```yaml
version: '3'
services:
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio-data:/data
    environment:
      - MINIO_ROOT_USER=minioadmin
      - MINIO_ROOT_PASSWORD=minioadmin

  markbase-webdav:
    build: .
    command: webdav-start --user demo --port 8002 --s3 --s3-endpoint http://minio:9000 --s3-bucket markbase --s3-access-key minioadmin --s3-secret-key minioadmin
    ports:
      - "8002:8002"
    environment:
      - MB_S3_ENDPOINT=http://minio:9000
    depends_on:
      - minio

volumes:
  minio-data:
```

---

## Integration Checklist

| Task | Status | Notes |
|------|--------|-------|
| **Deploy MinIO** | ⏳ User action | macOS/Linux/Docker |
| **Create bucket** | ⏳ User action | `mc mb myminio/markbase` |
| **S3Vfs configuration** | ✅ Supported | No code changes required |
| **WebDAV + S3** | ✅ Supported | CLI flags implemented |
| **SMB + S3** | ✅ Supported | CLI flags implemented |
| **SFTP + S3** | ⏳ To be implemented | Needs an S3 backend for SFTP |
| **Backup to S3** | ✅ Supported | BackupManifest + S3Vfs |

---

## Troubleshooting

### MinIO connection problems

```bash
# Check MinIO status
mc admin info myminio

# Check that the endpoint is reachable
curl -I http://localhost:9000/minio/health/live
```

### S3Vfs errors

**Common errors**:
- `VfsError::NotFound` → the bucket or object does not exist
- `VfsError::PermissionDenied` → the access key or secret key is wrong
- `VfsError::Io("S3 PUT failed: 403")` → the bucket policy denies writes

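
Two of the errors above can share HTTP status 403, so the status alone does not tell them apart. The Python sketch below shows one way to separate them using the `<Code>` element of an S3 error response. The mapping to the three error kinds is an assumption modelled on the list above, and `classify_s3_error` is a hypothetical helper, not the S3Vfs implementation.

```python
import xml.etree.ElementTree as ET

NOT_FOUND = {"NoSuchBucket", "NoSuchKey"}
BAD_CREDENTIALS = {"InvalidAccessKeyId", "SignatureDoesNotMatch"}


def classify_s3_error(status, body):
    """Return a (kind, detail) pair named after the VfsError variants."""
    code = ""
    if body:
        try:
            code = ET.fromstring(body).findtext("Code", default="")
        except ET.ParseError:
            code = ""  # not XML; fall back to the status alone
    if status == 404 or code in NOT_FOUND:
        return ("NotFound", code)
    if code in BAD_CREDENTIALS:
        return ("PermissionDenied", code)
    # Everything else, including policy denials, is reported as an I/O error.
    return ("Io", f"S3 request failed: {status} {code}".strip())
```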
**Debugging**:
```bash
# View the MinIO logs
docker logs minio

# Test with mc
mc cp test.txt myminio/markbase/test.txt
mc ls myminio/markbase/
```

---

## MinIO vs S3Vfs Feature Mapping

| VfsBackend Method | MinIO S3 API | Status |
|-------------------|--------------|--------|
| `read_dir()` | ListObjectsV2 | ✅ |
| `open_file()` | GetObject / PutObject | ✅ |
| `stat()` | HeadObject | ✅ |
| `create_dir()` | PutObject (0-byte) | ✅ |
| `remove_dir()` | DeleteObject | ✅ |
| `remove_file()` | DeleteObject | ✅ |
| `rename()` | CopyObject + DeleteObject | ✅ |
| `exists()` | HeadObject | ✅ |
| `copy()` | CopyObject | ✅ |
| `hard_link()` | CopyObject | ✅ |
| `create_snapshot()` | Versioning | ⚠️ Requires versioning to be enabled |
| `list_snapshots()` | ListObjectVersions | ⚠️ To be implemented |
| `set_quota()` | Bucket quota | ⚠️ MinIO Enterprise |
| `set_acl()` | Bucket policy | ⚠️ To be implemented |

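
Several rows in this mapping rely on emulating directories over a flat key space. The toy Python store below illustrates three of them: a zero-byte marker for `create_dir()`, a prefix-plus-delimiter listing for `read_dir()`, and copy-then-delete for `rename()`. `ToyObjectStore` is an in-memory illustration under those assumptions, not the S3Vfs code. One consequence it makes visible is that `rename()` is not atomic, and renaming a directory would cost one copy per object beneath it.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket: a flat mapping of key -> bytes."""

    def __init__(self):
        self.objects = {}

    def create_dir(self, path):
        # A directory is a zero-byte marker object whose key ends with "/".
        self.objects[path.strip("/") + "/"] = b""

    def read_dir(self, path):
        """Listing in the style of ListObjectsV2 with a prefix and "/" delimiter."""
        prefix = path.strip("/") + "/" if path.strip("/") else ""
        names = set()
        for key in self.objects:
            if key.startswith(prefix) and key != prefix:
                head, sep, _ = key[len(prefix):].partition("/")
                names.add(head + sep)  # "sub/" for a common prefix, "file" for an object
        return sorted(names)

    def rename(self, src, dst):
        # No native rename: copy to the new key, then delete the old one.
        self.objects[dst] = self.objects[src]
        del self.objects[src]
```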
---

## Next Steps

1. **Deploy MinIO** (user action)
   - macOS: `brew install minio/stable/minio && minio server /data`
   - Docker: `docker run -p 9000:9000 -p 9001:9001 minio/minio server /data --console-address ":9001"`

2. **Create a bucket** (user action)
   - `mc alias set myminio http://localhost:9000 minioadmin minioadmin`
   - `mc mb myminio/markbase`

3. **Configure MarkBase**
   - Set the `MB_S3_*` environment variables
   - Or pass the CLI flags `--s3 --s3-endpoint ...`

4. **Test the connection**
   - WebDAV: `curl -X PROPFIND http://localhost:8002/webdav/`
   - SMB: `smbclient -p 4445 -L localhost`

---

**Document created**: 2026-06-25
**Last updated**: 2026-06-25