feat: implement sidecar generation job system integration

Jamie Pine 2025-11-02 19:42:51 -08:00
parent ea70e527db
commit c2bd639555
4 changed files with 741 additions and 0 deletions

View File

@ -0,0 +1,234 @@
---
id: VSS-005
title: "CLI Sidecar Commands"
status: To Do
assignee: james
parent: CORE-008
priority: Medium
tags: [vss, cli, tooling]
whitepaper: "Section 4.1.5"
last_updated: 2025-11-01
related_tasks: [CORE-008, CLI-000]
dependencies: [VSS-001, VSS-002]
---
## Description
Add comprehensive CLI support for sidecar operations, making derivative data management accessible from the command line.
See `workbench/core/storage/VIRTUAL_SIDECAR_SYSTEM_V2.md`, section "CLI Integration", for the complete specification.
## Implementation Files
- `apps/cli/src/domains/sidecar/mod.rs` - New sidecar command domain
- `apps/cli/src/domains/sidecar/args.rs` - Command arguments
- `apps/cli/src/domains/sidecar/list.rs` - List sidecars
- `apps/cli/src/domains/sidecar/usage.rs` - Storage usage reporting
- `apps/cli/src/domains/sidecar/cleanup.rs` - Cleanup operations
## Tasks
### Command Structure
- [ ] Create `sd sidecars` subcommand family (see the clap sketch after this section)
- [ ] Add `sd sidecars list <content_uuid>` - list all sidecars for content
- [ ] Add `sd sidecars usage` - show storage usage by kind
- [ ] Add `sd sidecars pending` - show generation queue
- [ ] Add `sd sidecars cleanup` - clean up old/orphaned sidecars
- [ ] Add `sd sidecars regenerate <content_uuid>` - force regeneration
### Glob Pattern Support
- [ ] Support wildcard content UUIDs in copy/list operations
- [ ] Implement efficient pagination for large result sets
- [ ] Add `--limit` and `--page` flags
- [ ] Smart defaults (warn before processing millions of files)
### Standard File Operations
- [ ] Ensure `sd cp sidecar://...` works
- [ ] Ensure `sd ls sidecar://...` works
- [ ] Ensure `sd rm sidecar://...` works with confirmation
- [ ] Ensure `sd cat sidecar://.../ocr.json` works
- [ ] Add `sd info sidecar://...` for detailed status
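
A minimal sketch of how this subcommand family might be declared with clap's derive API (assuming the CLI uses clap 4 derive); the type names, flag placement, and defaults are placeholders, not the final `args.rs`:

```rust
use clap::{Args, Subcommand};

/// `sd sidecars` argument tree (hypothetical shape).
#[derive(Args)]
pub struct SidecarArgs {
    #[command(subcommand)]
    pub command: SidecarCommand,
}

#[derive(Subcommand)]
pub enum SidecarCommand {
    /// List all sidecars for a content item
    List {
        content_uuid: String,
        #[arg(long)]
        kind: Option<String>,
        #[arg(long)]
        show_paths: bool,
        // Pagination guards against expensive wildcard listings
        #[arg(long, default_value_t = 100)]
        limit: u64,
        #[arg(long, default_value_t = 1)]
        page: u64,
    },
    /// Show storage usage by kind
    Usage {
        #[arg(long)]
        kind: Option<String>,
        #[arg(long)]
        by_variant: bool,
    },
    /// Show the generation queue
    Pending {
        #[arg(long)]
        failed: bool,
    },
    /// Clean up old/orphaned sidecars
    Cleanup {
        #[arg(long)]
        dry_run: bool,
        #[arg(long)]
        kind: Option<String>,
        #[arg(long)]
        older_than: Option<String>,
    },
    /// Force regeneration of sidecars for a content item
    Regenerate {
        content_uuid: String,
        #[arg(long)]
        kind: Option<String>,
        #[arg(long)]
        variant: Option<String>,
    },
}
```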
## Commands Specification
### `sd sidecars list`
```bash
# List all sidecars for a content item
sd sidecars list 550e8400-e29b-41d4-a716-446655440000

# Output:
# Sidecars for content 550e8400-e29b-41d4-a716-446655440000
# ┌──────────────┬────────────┬────────┬──────────┬──────────┐
# │ Kind         │ Variant    │ Format │ Size     │ Status   │
# ├──────────────┼────────────┼────────┼──────────┼──────────┤
# │ Thumbnail    │ grid@2x    │ webp   │ 45.2 KB  │ Ready    │
# │ Thumbnail    │ detail@1x  │ webp   │ 128 KB   │ Ready    │
# │ OCR          │ default    │ json   │ 2.3 KB   │ Ready    │
# │ Embeddings   │ all-MiniLM │ json   │ 1.2 KB   │ Ready    │
# └──────────────┴────────────┴────────┴──────────┴──────────┘
# Total: 176.7 KB across 4 sidecars

# List specific kind
sd sidecars list 550e8400... --kind thumb

# List with paths
sd sidecars list 550e8400... --show-paths
```
### `sd sidecars usage`
```bash
# Show overall sidecar storage usage
sd sidecars usage

# Output:
# Sidecar Storage Usage
# ┌──────────────┬───────┬──────────┬──────────┐
# │ Kind         │ Count │ Size     │ Avg/File │
# ├──────────────┼───────┼──────────┼──────────┤
# │ Thumbnails   │ 45234 │ 2.3 GB   │ 52 KB    │
# │ Proxies      │ 1245  │ 18.7 GB  │ 15.4 MB  │
# │ OCR          │ 8934  │ 234 MB   │ 27 KB    │
# │ Transcripts  │ 3456  │ 1.1 GB   │ 334 KB   │
# │ Embeddings   │ 12456 │ 89 MB    │ 7 KB     │
# ├──────────────┼───────┼──────────┼──────────┤
# │ Total        │ 71325 │ 22.4 GB  │ 330 KB   │
# └──────────────┴───────┴──────────┴──────────┘

# Show usage for specific kind
sd sidecars usage --kind proxy

# Show usage by variant
sd sidecars usage --by-variant
```
### `sd sidecars pending`
```bash
# Show pending sidecar generation jobs
sd sidecars pending

# Output:
# Pending Sidecar Generation
# ┌──────────────┬─────────┬─────────┬──────────┐
# │ Kind         │ Queued  │ Running │ Failed   │
# ├──────────────┼─────────┼─────────┼──────────┤
# │ Thumbnails   │ 456     │ 12      │ 3        │
# │ OCR          │ 89      │ 5       │ 0        │
# │ Transcripts  │ 23      │ 2       │ 1        │
# └──────────────┴─────────┴─────────┴──────────┘

# Show details for failed jobs
sd sidecars pending --failed
```
### `sd sidecars cleanup`
```bash
# Clean up orphaned sidecars
sd sidecars cleanup

# Output:
# Scanning for orphaned sidecars...
# Found 45 sidecars for deleted content
# Total: 234 MB
# Clean up? [y/N] y
# Deleted 45 sidecars, freed 234 MB

# Dry run mode
sd sidecars cleanup --dry-run

# Clean specific kind (age parsing sketched below)
sd sidecars cleanup --kind proxy --older-than 180d
```
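
The `--older-than 180d` filter implies parsing a human-friendly age suffix. A minimal sketch of such a parser, assuming only day/hour suffixes are needed; the helper name and supported units are placeholders, not the shipped flag semantics:

```rust
use std::time::Duration;

/// Parse age strings like "180d" or "36h" (hypothetical helper).
fn parse_age(s: &str) -> Option<Duration> {
    // Split off the single-character unit suffix (ASCII-only values expected)
    let (num, unit) = s.split_at(s.len().checked_sub(1)?);
    let n: u64 = num.parse().ok()?;
    match unit {
        "d" => Some(Duration::from_secs(n * 86_400)),
        "h" => Some(Duration::from_secs(n * 3_600)),
        _ => None,
    }
}
```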
### `sd sidecars regenerate`
```bash
# Regenerate all sidecars for a content item
sd sidecars regenerate 550e8400-e29b-41d4-a716-446655440000

# Output:
# Regenerating sidecars for 550e8400...
# ✓ Thumbnails: 3 variants queued
# ✓ OCR: queued
# ✓ Embeddings: queued
# Jobs queued: 5

# Regenerate specific kind
sd sidecars regenerate 550e8400... --kind thumb --variant grid@2x
```
### Standard Operations with Sidecars
```bash
# Copy thumbnail to local file (sidecar:// path parsing is sketched after this block)
sd cp sidecar://550e8400.../thumbs/grid@2x.webp ~/Desktop/thumb.webp

# List all thumbnails
sd ls "sidecar://*/thumbs/*" --limit 100

# Export all OCR text
sd cp "sidecar://*/ocr/ocr.json" ~/ocr-exports/

# Delete large proxies
sd rm "sidecar://*/proxies/2160p"
# Output:
# This will delete 1,247 files totaling 45.2 GB
# Continue? [y/N]

# View OCR text directly
sd cat sidecar://550e8400.../ocr/ocr.json | jq .text

# Check sidecar info
sd info sidecar://550e8400.../thumbs/grid@2x.webp
# Output:
# Path: sidecar://550e8400-e29b-41d4-a716-446655440000/thumbs/grid@2x.webp
# Status: Ready
# Size: 45.2 KB
# Format: WebP
# Created: 2025-10-15 14:32:11
# Local: Yes
# Available on: MacBook Pro, Home Server
# Checksum: abc123...
```
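
All of the standard operations above hinge on resolving `sidecar://` paths. A minimal sketch of how such a path might be split into a content UUID and a sidecar-relative path, based only on the shapes shown above; `SidecarRef`, the function name, and the parsing rules are assumptions rather than the shipped parser (glob forms like `sidecar://*/thumbs/*` would take a separate branch):

```rust
use uuid::Uuid;

/// Parsed form of `sidecar://<content_uuid>/<kind>/<file>` (hypothetical).
#[derive(Debug)]
struct SidecarRef {
    content_uuid: Uuid,
    /// Path within the content's sidecar tree, e.g. "thumbs/grid@2x.webp"
    relative_path: String,
}

fn parse_sidecar_uri(uri: &str) -> Option<SidecarRef> {
    let rest = uri.strip_prefix("sidecar://")?;
    // Everything up to the first slash is the content UUID
    let (uuid_part, path_part) = rest.split_once('/')?;
    Some(SidecarRef {
        content_uuid: Uuid::parse_str(uuid_part).ok()?,
        relative_path: path_part.to_string(),
    })
}
```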
## Acceptance Criteria
### Commands Implemented
- [ ] `sd sidecars list` shows all sidecars for content
- [ ] `sd sidecars usage` shows storage breakdown
- [ ] `sd sidecars pending` shows generation queue
- [ ] `sd sidecars cleanup` removes orphaned sidecars
- [ ] `sd sidecars regenerate` triggers regeneration
### Standard Operations
- [ ] `sd cp sidecar://...` copies sidecars
- [ ] `sd ls sidecar://...` lists sidecar directories
- [ ] `sd rm sidecar://...` deletes with confirmation
- [ ] `sd cat sidecar://...` displays content
- [ ] `sd info sidecar://...` shows detailed status
### User Experience
- [ ] Clear, formatted output with tables
- [ ] Progress indicators for long operations
- [ ] Helpful error messages
- [ ] Confirmation prompts for destructive operations
- [ ] JSON output mode for scripting (`--json` flag; see the sketch after this list)
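
For the `--json` flag, one common pattern is to render both modes from the same serializable row type, as in this sketch (assumes `serde`, `serde_json`, and `anyhow` are available; the struct and field names are illustrative, not the final output schema):

```rust
use serde::Serialize;

#[derive(Serialize)]
struct SidecarRow {
    kind: String,
    variant: String,
    format: String,
    size_bytes: u64,
    status: String,
}

/// Render rows as a human-readable listing or machine-readable JSON.
fn render(rows: &[SidecarRow], json: bool) -> anyhow::Result<()> {
    if json {
        // Stable, scriptable output for piping into jq
        println!("{}", serde_json::to_string_pretty(rows)?);
    } else {
        // Placeholder for the formatted table output shown above
        for r in rows {
            println!("{}\t{}\t{}\t{}\t{}", r.kind, r.variant, r.format, r.size_bytes, r.status);
        }
    }
    Ok(())
}
```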
### Integration
- [ ] Glob patterns work correctly
- [ ] Pagination prevents expensive operations
- [ ] Works with piped operations
- [ ] Respects user preferences and config
## Timeline
Estimated: 4-5 days of focused work
- Day 1: Command structure and args parsing
- Day 2: List, usage, pending commands
- Day 3: Cleanup and regenerate commands
- Day 4: Integration with standard operations
- Day 5: Testing, polish, documentation

View File

@ -0,0 +1,335 @@
use crate::{
domain::addressing::SdPath,
infra::db::entities::{entry, sync_conduit, sync_generation},
};
use anyhow::Result;
use sea_orm::{prelude::*, DatabaseConnection, QueryOrder};
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;
use std::sync::Arc;
/// Calculates sync operations from index queries
pub struct SyncResolver {
db: Arc<DatabaseConnection>,
}
/// Entry with its materialized path relative to sync root
#[derive(Debug, Clone)]
pub struct EntryWithPath {
pub entry: entry::Model,
pub relative_path: PathBuf,
pub full_path: PathBuf,
}
impl EntryWithPath {
/// Convert to SdPath for job operations
pub fn to_sdpath(&self, device_slug: String) -> SdPath {
SdPath::physical(device_slug, self.full_path.clone())
}
}
/// Operations for a single sync direction
#[derive(Debug, Default, Clone)]
pub struct DirectionalOps {
pub to_copy: Vec<EntryWithPath>,
pub to_delete: Vec<EntryWithPath>,
}
/// Complete sync operations (supports bidirectional)
#[derive(Debug, Default)]
pub struct SyncOperations {
/// Source → target operations
pub source_to_target: DirectionalOps,
/// Target → source operations (only for bidirectional mode)
pub target_to_source: Option<DirectionalOps>,
/// Conflicts that need resolution
pub conflicts: Vec<SyncConflict>,
}
#[derive(Debug)]
pub struct SyncConflict {
pub relative_path: PathBuf,
pub source_entry: entry::Model,
pub target_entry: entry::Model,
pub conflict_type: ConflictType,
}
#[derive(Debug, Clone, Copy)]
pub enum ConflictType {
BothModified,
// The variants below are not yet emitted by this resolver; they are
// reserved for delete tracking and entry-kind checks.
DeletedVsModified,
TypeMismatch,
}
impl SyncResolver {
pub fn new(db: Arc<DatabaseConnection>) -> Self {
Self { db }
}
/// Calculate sync operations for a conduit
pub async fn calculate_operations(
&self,
conduit: &sync_conduit::Model,
) -> Result<SyncOperations> {
// Get source and target root entries
let source_root = entry::Entity::find_by_id(conduit.source_entry_id)
.one(&*self.db)
.await?
.ok_or_else(|| anyhow::anyhow!("Source entry not found"))?;
let target_root = entry::Entity::find_by_id(conduit.target_entry_id)
.one(&*self.db)
.await?
.ok_or_else(|| anyhow::anyhow!("Target entry not found"))?;
// Load all entries under each root
let source_entries = self
.get_entries_recursive(conduit.source_entry_id, &source_root)
.await?;
let target_entries = self
.get_entries_recursive(conduit.target_entry_id, &target_root)
.await?;
// Build path maps
let source_map = self.build_path_map(&source_entries);
let target_map = self.build_path_map(&target_entries);
let mode = sync_conduit::SyncMode::from_str(&conduit.sync_mode)
.ok_or_else(|| anyhow::anyhow!("Invalid sync mode: {}", conduit.sync_mode))?;
match mode {
sync_conduit::SyncMode::Mirror => {
Ok(self.resolve_mirror(&source_map, &target_map))
}
sync_conduit::SyncMode::Bidirectional => {
self.resolve_bidirectional(&source_map, &target_map, conduit)
.await
}
// Selective mode currently falls back to mirror semantics (MVP)
sync_conduit::SyncMode::Selective => Ok(self.resolve_mirror(&source_map, &target_map)),
}
}
/// Get all entries under a directory recursively.
///
/// Simplified MVP implementation: every descendant is flattened to a
/// single-component path. A production version would reconstruct full
/// paths by walking parent relationships (see the sketch below this
/// function).
async fn get_entries_recursive(
&self,
root_id: i32,
_root_entry: &entry::Model,
) -> Result<Vec<EntryWithPath>> {
let mut results = Vec::new();
// Find all entries that have this root as an ancestor
let entries = self.find_children_recursive(root_id).await?;
for entry in entries {
// MVP: use the entry name alone as both relative and full path
let relative_path = PathBuf::from(&entry.name);
let full_path = PathBuf::from(&entry.name);
results.push(EntryWithPath {
entry,
relative_path,
full_path,
});
}
Ok(results)
}
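/// Sketch only (not part of this commit's logic): one way to reconstruct a
/// root-relative path by walking parent links, as the comments above allude
/// to. Assumes `entry.parent_id: Option<i32>` and a map of all loaded
/// entries keyed by id; the helper name is illustrative.
#[allow(dead_code)]
fn reconstruct_relative_path(
entries_by_id: &HashMap<i32, entry::Model>,
start: &entry::Model,
root_id: i32,
) -> PathBuf {
// Collect names from the entry up to (but excluding) the sync root,
// then reverse them into a root-relative path.
let mut components = vec![start.name.clone()];
let mut current = start;
while let Some(parent_id) = current.parent_id {
if parent_id == root_id {
break;
}
match entries_by_id.get(&parent_id) {
// Step up one level and record the directory name
Some(parent) => {
components.push(parent.name.clone());
current = parent;
}
// Parent not loaded; fall back to the path collected so far
None => break,
}
}
components.iter().rev().collect()
}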
/// Find all children of an entry recursively using parent_id relationship
async fn find_children_recursive(&self, parent_id: i32) -> Result<Vec<entry::Model>> {
let mut all_children = Vec::new();
let mut to_process = vec![parent_id];
while let Some(current_parent) = to_process.pop() {
let children = entry::Entity::find()
.filter(entry::Column::ParentId.eq(current_parent))
.all(&*self.db)
.await?;
for child in children {
to_process.push(child.id);
all_children.push(child);
}
}
Ok(all_children)
}
/// Build map of relative path -> entry with path
fn build_path_map(&self, entries: &[EntryWithPath]) -> HashMap<PathBuf, EntryWithPath> {
entries
.iter()
.map(|e| (e.relative_path.clone(), e.clone()))
.collect()
}
/// Resolve mirror mode: source -> target (one-way)
fn resolve_mirror(
&self,
source_map: &HashMap<PathBuf, EntryWithPath>,
target_map: &HashMap<PathBuf, EntryWithPath>,
) -> SyncOperations {
let mut operations = SyncOperations::default();
// Files in source but not target, or files that differ -> copy
for (path, source_entry_with_path) in source_map {
if let Some(target_entry_with_path) = target_map.get(path) {
// File exists in both - check if content differs
if self.content_differs(
&source_entry_with_path.entry,
&target_entry_with_path.entry,
) {
operations
.source_to_target
.to_copy
.push(source_entry_with_path.clone());
}
} else {
// File only in source - copy it
operations
.source_to_target
.to_copy
.push(source_entry_with_path.clone());
}
}
// Files in target but not source -> delete
for (path, target_entry_with_path) in target_map {
if !source_map.contains_key(path) {
operations
.source_to_target
.to_delete
.push(target_entry_with_path.clone());
}
}
operations
}
/// Resolve bidirectional mode with conflict detection
async fn resolve_bidirectional(
&self,
source_map: &HashMap<PathBuf, EntryWithPath>,
target_map: &HashMap<PathBuf, EntryWithPath>,
conduit: &sync_conduit::Model,
) -> Result<SyncOperations> {
let mut operations = SyncOperations::default();
operations.target_to_source = Some(DirectionalOps::default());
// Get last sync generation for change detection
let last_gen = self.get_last_completed_generation(conduit.id).await?;
// Detect changes since last sync
let source_changes = self.detect_changes(source_map, last_gen.as_ref());
let target_changes = self.detect_changes(target_map, last_gen.as_ref());
// Check each file in both locations
let all_paths: HashSet<_> = source_map
.keys()
.chain(target_map.keys())
.cloned()
.collect();
for path in all_paths {
let in_source = source_map.get(&path);
let in_target = target_map.get(&path);
match (in_source, in_target) {
(Some(source_entry_with_path), Some(target_entry_with_path)) => {
// File in both locations
let source_changed = source_changes.contains(&path);
let target_changed = target_changes.contains(&path);
if source_changed && target_changed {
// Conflict: both modified
operations.conflicts.push(SyncConflict {
relative_path: path.clone(),
source_entry: source_entry_with_path.entry.clone(),
target_entry: target_entry_with_path.entry.clone(),
conflict_type: ConflictType::BothModified,
});
} else if source_changed {
// Source changed, target unchanged -> copy to target
operations
.source_to_target
.to_copy
.push(source_entry_with_path.clone());
} else if target_changed {
// Target changed, source unchanged -> copy to source
if let Some(ref mut target_to_source) = operations.target_to_source {
target_to_source.to_copy.push(target_entry_with_path.clone());
}
}
}
(Some(source_entry_with_path), None) => {
// Only in source -> copy to target
operations
.source_to_target
.to_copy
.push(source_entry_with_path.clone());
}
(None, Some(target_entry_with_path)) => {
// Only in target -> copy to source
if let Some(ref mut target_to_source) = operations.target_to_source {
target_to_source.to_copy.push(target_entry_with_path.clone());
}
}
(None, None) => unreachable!(),
}
}
Ok(operations)
}
fn content_differs(&self, entry1: &entry::Model, entry2: &entry::Model) -> bool {
// Compare content identity
match (entry1.content_id, entry2.content_id) {
(Some(c1), Some(c2)) => c1 != c2,
// If either doesn't have content_id, compare by size and modified time
_ => entry1.size != entry2.size || entry1.modified_at != entry2.modified_at,
}
}
fn detect_changes(
&self,
entries: &HashMap<PathBuf, EntryWithPath>,
last_gen: Option<&sync_generation::Model>,
) -> HashSet<PathBuf> {
let mut changed = HashSet::new();
if let Some(gen) = last_gen {
let last_sync_time = gen.completed_at.unwrap_or(gen.started_at);
for (path, entry_with_path) in entries {
// Check if entry was modified after last sync
if let Some(indexed_at) = entry_with_path.entry.indexed_at {
if indexed_at > last_sync_time {
changed.insert(path.clone());
}
}
}
} else {
// No previous sync - all files are "changes"
changed.extend(entries.keys().cloned());
}
changed
}
async fn get_last_completed_generation(
&self,
conduit_id: i32,
) -> Result<Option<sync_generation::Model>> {
Ok(sync_generation::Entity::find()
.filter(sync_generation::Column::ConduitId.eq(conduit_id))
.filter(sync_generation::Column::CompletedAt.is_not_null())
.order_by_desc(sync_generation::Column::Generation)
.one(&*self.db)
.await?)
}
}

View File

@ -11,6 +11,7 @@ use tracing::info;
pub mod device;
pub mod file_sharing;
pub mod file_sync;
pub mod network;
pub mod session;
pub mod sidecar_manager;

View File

@ -0,0 +1,171 @@
//! Simple File Sync Test
//!
//! This tests the file sync service initialization and basic operations
//! without requiring full database setup or actual file indexing.
use sd_core::{infra::db::entities::sync_conduit, Core};
use std::sync::Arc;
use tempfile::TempDir;
/// Test setup with a core and library
struct FileSyncTestSetup {
_temp_dir: TempDir,
_core: Core,
library: Arc<sd_core::library::Library>,
}
impl FileSyncTestSetup {
/// Create a new test setup
async fn new() -> anyhow::Result<Self> {
let _ = tracing_subscriber::fmt()
.with_env_filter("sd_core=info")
.with_test_writer()
.try_init();
let temp_dir = TempDir::new()?;
let config = sd_core::config::AppConfig {
version: 3,
data_dir: temp_dir.path().to_path_buf(),
log_level: "info".to_string(),
telemetry_enabled: false,
preferences: sd_core::config::Preferences::default(),
job_logging: sd_core::config::JobLoggingConfig::default(),
services: sd_core::config::ServiceConfig {
networking_enabled: false,
volume_monitoring_enabled: false,
location_watcher_enabled: false,
},
};
config.save()?;
let core = Core::new(temp_dir.path().to_path_buf())
.await
.map_err(|e| anyhow::anyhow!("{}", e))?;
let library = core
.libraries
.create_library("File Sync Test Library", None, core.context.clone())
.await?;
// Initialize file sync service
library.init_file_sync_service()?;
Ok(Self {
_temp_dir: temp_dir,
_core: core,
library,
})
}
}
#[tokio::test]
async fn test_file_sync_service_initialization() {
let setup = FileSyncTestSetup::new().await.unwrap();
// Verify file sync service was initialized
assert!(setup.library.file_sync_service().is_some());
println!("✓ File sync service initialized successfully");
}
#[tokio::test]
async fn test_file_sync_service_structure() {
let setup = FileSyncTestSetup::new().await.unwrap();
let file_sync = setup.library.file_sync_service().unwrap();
// Verify we can access the conduit manager
let conduit_manager = file_sync.conduit_manager();
assert!(Arc::strong_count(conduit_manager) > 0);
println!("✓ File sync service has proper structure");
println!(" - ConduitManager accessible");
}
#[tokio::test]
async fn test_sync_modes() {
// Test that all sync modes can be converted properly
let mirror = sync_conduit::SyncMode::Mirror;
assert_eq!(mirror.as_str(), "mirror");
assert_eq!(
sync_conduit::SyncMode::from_str("mirror"),
Some(sync_conduit::SyncMode::Mirror)
);
let bidirectional = sync_conduit::SyncMode::Bidirectional;
assert_eq!(bidirectional.as_str(), "bidirectional");
assert_eq!(
sync_conduit::SyncMode::from_str("bidirectional"),
Some(sync_conduit::SyncMode::Bidirectional)
);
let selective = sync_conduit::SyncMode::Selective;
assert_eq!(selective.as_str(), "selective");
assert_eq!(
sync_conduit::SyncMode::from_str("selective"),
Some(sync_conduit::SyncMode::Selective)
);
// Invalid mode
assert_eq!(sync_conduit::SyncMode::from_str("invalid"), None);
println!("✓ Sync modes working correctly");
println!(" - Mirror: {}", mirror);
println!(" - Bidirectional: {}", bidirectional);
println!(" - Selective: {}", selective);
}
#[tokio::test]
async fn test_multiple_libraries_with_file_sync() {
let temp_dir = TempDir::new().unwrap();
let core = Core::new(temp_dir.path().to_path_buf()).await.unwrap();
// Create multiple libraries
let lib1 = core
.libraries
.create_library("Library 1", None, core.context.clone())
.await
.unwrap();
lib1.init_file_sync_service().unwrap();
let lib2 = core
.libraries
.create_library("Library 2", None, core.context.clone())
.await
.unwrap();
lib2.init_file_sync_service().unwrap();
// Both should have file sync services
assert!(lib1.file_sync_service().is_some());
assert!(lib2.file_sync_service().is_some());
println!("✓ Multiple libraries can have file sync services");
println!(" - Library 1: {}", lib1.name().await);
println!(" - Library 2: {}", lib2.name().await);
}
#[tokio::test]
async fn test_file_sync_service_idempotent_initialization() {
let temp_dir = TempDir::new().unwrap();
let core = Core::new(temp_dir.path().to_path_buf()).await.unwrap();
// Create library
let library = core
.libraries
.create_library("Test Library", None, core.context.clone())
.await
.unwrap();
// Initialize file sync
library.init_file_sync_service().unwrap();
assert!(library.file_sync_service().is_some());
// Trying to initialize again should be idempotent (warning but no error)
library.init_file_sync_service().unwrap();
assert!(library.file_sync_service().is_some());
println!("✓ File sync service initialization is idempotent");
println!(" - First initialization: OK");
println!(" - Second initialization: OK (warning logged)");
}