Gittes-archive/spacedrive

mirror of https://github.com/spacedriveapp/spacedrive.git synced 2025-12-11 20:15:30 +01:00

Jamie Pine 35ac1f214f sorting docs

2025-10-11 11:11:25 -07:00

VDFS SDK Specification (Grounded in Spacedrive Core)

Version: 3.0
Date: October 11, 2025
Status: Implementation-Ready

This specification is grounded in the actual Spacedrive architecture, mapping SDK concepts to real core systems.

1. Core Architecture (What Actually Exists)

Spacedrive Core Provides:

Entry System - Files/directories in entries table
Content Identity - BLAKE3 CAS IDs for deduplication
UserMetadata - Tags, labels, notes (in user_metadata, tags, metadata_tag tables)
Virtual Sidecar System (VSS) - Stores OCR, embeddings, transcripts in .sdlibrary/sidecars/
Job System - Durable jobs built on task-system crate
Event Bus - Emits EntryCreated, EntryModified, JobCompleted events
Indexer - 5-phase pipeline ending in "Analysis Queueing" phase
Search - Hybrid FTS5 (temporal) + VSS embeddings (semantic)

What Extensions Get:

Extensions run as WASM modules (sandboxed) and consume the outputs of Core's perception pipeline:

Pre-computed sidecars from VSS (OCR, embeddings, transcripts)
Events from the EventBus
Query access to entries, tags, and semantic search
Job dispatch to run background work

2. The SDK: Extensions as Data Consumers

Core Insight

The Core does the heavy lifting (indexing, OCR, embeddings). Extensions consume this intelligence and add domain-specific behavior.

This is the "Perception Layer" concept:

Core's Indexer Analysis Queueing Phase → Dispatches OCR, embedding, thumbnail jobs
Core's VSS → Stores results as sidecars
Extensions → Query and act on this pre-computed intelligence

3. SDK Primitives (Grounded)

`#[app]` - Maps to WASM Plugin Entry Point

#[app(
    id = "com.spacedrive.chronicle",
    name = "Chronicle",
    version = "1.0.0",
    permissions = [
        // Permission to read entries and their sidecars
        Permission::ReadEntries(glob = "**/*.pdf"),
        // Permission to read OCR sidecars from VSS
        Permission::ReadSidecars(kinds = ["ocr", "embeddings"]),
        // Permission to write tags to UserMetadata
        Permission::WriteTags,
        // Permission to dispatch jobs
        Permission::DispatchJobs,
        // Permission to use AI models
        Permission::UseAI(models = ["local"]),
    ]
)]
struct Chronicle;

Maps to:

WASM plugin loaded by PluginManager (PLUG-001)
Permissions enforced by capability-based security
Installed to user's library

`#[model]` - Custom Typed Wrapper Around Entry + Sidecars

#[model]
struct Paper {
    /// References an Entry in the `entries` table
    #[entry(filter = "*.pdf")]
    file: Entry,

    /// Reads OCR sidecar from VSS (.sdlibrary/sidecars/content/{uuid}/ocr/ocr.json)
    #[sidecar(kind = "ocr", variant = "default")]
    full_text: Option<String>,

    /// Reads embedding sidecar from VSS
    #[sidecar(kind = "embeddings", variant = "all-MiniLM-L6-v2")]
    embedding: Option<Vec<f32>>,

    /// Reads/writes tags to UserMetadata.tags via metadata_tag junction table
    #[user_metadata]
    tags: Vec<Tag>,

    /// Extension-specific metadata stored in UserMetadata.custom_fields JSON
    #[custom_field]
    research_notes: Option<String>,
}

Maps to:

Entry from entries table
Sidecars from .sdlibrary/sidecars/ (VSS)
Tags from tags table + metadata_tag junction
Custom data in UserMetadata.custom_fields JSON blob

`#[agent]` - Event Listener on EventBus

#[agent]
#[memory(persistent = true)]
impl Chronicle {
    #[on_startup]
    async fn initialize(ctx: &AgentContext<ChronicleMind>) -> AgentResult<()> {
        tracing::info!("Chronicle initialized");
        Ok(())
    }

    /// Listens to EventBus for Event::EntryCreated
    #[on_event(EntryCreated)]
    #[filter = ".extension() == 'pdf'"]
    async fn on_new_pdf(entry: Entry, ctx: &AgentContext<ChronicleMind>) -> AgentResult<()> {
        // Check if Core has already generated OCR sidecar
        let has_ocr = ctx.vdfs()
            .sidecar_exists(entry.content_uuid(), "ocr")
            .await?;

        if has_ocr {
            // Core already did OCR - extension just consumes it
            let paper = ctx.vdfs().get::<Paper>(entry.id()).await?;

            // Dispatch extension's own analysis job
            ctx.jobs().dispatch(analyze_paper, paper).await?;
        } else {
            // OCR not ready yet - wait for sidecar ready event
            // (Core's Analysis Queueing phase will generate it)
        }

        Ok(())
    }
}

Maps to:

Subscribes to EventBus via ctx.events.subscribe()
Receives Event::EntryCreated, Event::JobCompleted, etc.
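Agent memory persisted under the extension's own sidecar directory in VSS (.sdlibrary/sidecars/extension/{app_id}/); only WorkingMemory may live in ephemeral WASM instance state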
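
To make the subscribe-and-filter behavior concrete, here is a minimal sketch that uses a standard-library channel as a stand-in for the EventBus. The `Event` enum, `is_created_with_extension`, and `collect_new_pdfs` are hypothetical names for illustration only; they are not the Spacedrive SDK API, and the real `#[filter]` attribute may evaluate its expression differently.

```rust
use std::path::Path;
use std::sync::mpsc::Receiver;

/// Simplified stand-in for Core events (assumption: the real enum carries more data).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    EntryCreated { name: String },
    EntryModified { name: String },
    JobCompleted { job_id: u64 },
}

/// True when the event is an `EntryCreated` for a file with the given extension.
/// The comparison ignores ASCII case, so "paper.PDF" matches "pdf".
pub fn is_created_with_extension(event: &Event, ext: &str) -> bool {
    match event {
        Event::EntryCreated { name } => Path::new(name)
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |e| e.eq_ignore_ascii_case(ext)),
        _ => false,
    }
}

/// Drains every pending event without blocking and returns the names of new PDFs.
pub fn collect_new_pdfs(rx: &Receiver<Event>) -> Vec<String> {
    rx.try_iter()
        .filter(|event| is_created_with_extension(event, "pdf"))
        .filter_map(|event| match event {
            Event::EntryCreated { name } => Some(name),
            _ => None,
        })
        .collect()
}
```

Filtering before the handler runs keeps the handler body focused on PDF-specific work, which is the role the `#[filter]` attribute plays in the agent above.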

`#[job]` - Implements Job + JobHandler Traits

#[job(parallelism = 4)]
async fn analyze_paper(ctx: &JobContext, paper: Paper) -> JobResult<()> {
    ctx.progress(Progress::indeterminate("Starting analysis"));

    // Task 1: Read OCR from VSS (already generated by Core)
    let text = ctx.task(|| async {
        paper.full_text
            .ok_or_else(|| JobError::missing_sidecar("ocr"))
    }).await?;

    ctx.check_interrupt().await?; // Checkpoint

    // Task 2: AI summarization with Jinja template
    let summary = ctx.task(|| async {
        #[derive(Serialize)]
        struct PromptCtx<'a> { title: &'a str, text: &'a str }

        ctx.ai()
            .model_preference("local_llm")
            .prompt_template("summarize.jinja") // From prompts/ directory
            .render_with(&PromptCtx {
                title: &paper.file.name(),
                text: &text,
            })?
            .generate_text()
            .await
    }).await?;

    ctx.check_interrupt().await?; // Checkpoint

    // Task 3: Save as custom tag
    ctx.task(|| async {
        ctx.vdfs()
            .add_tag(paper.file.metadata_id(), &format!("summary:{}", summary))
            .await
    }).await?;

    ctx.progress(Progress::complete("Analysis complete"));
    Ok(())
}

Maps to:

Implements Job and JobHandler traits
Dispatched via JobManager::dispatch()
Progress via ctx.progress() (emits Event::JobProgress)
Checkpoints via ctx.check_interrupt()
Persisted in jobs.db
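
The checkpoint pattern can be sketched without any SDK types. The example below is a simplified, synchronous model: `run_with_checkpoints` and `JobOutcome` are hypothetical names, and the interrupt flag is a plain `AtomicBool` rather than whatever signal `ctx.check_interrupt()` uses internally. It shows only the control flow: check for an interrupt before each step, and report where to resume.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Result of running a job up to completion or the next interrupt.
#[derive(Debug, PartialEq)]
pub enum JobOutcome {
    Completed,
    /// Interrupted before the step at this index ran; resume from here.
    Interrupted { resume_from: usize },
}

/// Runs `steps` starting at `resume_from`, checking the interrupt flag
/// before each step (the analogue of a checkpoint between tasks).
pub fn run_with_checkpoints(
    steps: &[&str],
    resume_from: usize,
    interrupt: &AtomicBool,
    mut run_step: impl FnMut(&str),
) -> JobOutcome {
    for index in resume_from..steps.len() {
        if interrupt.load(Ordering::SeqCst) {
            // A durable job system would persist `index` here.
            return JobOutcome::Interrupted { resume_from: index };
        }
        run_step(steps[index]);
    }
    JobOutcome::Completed
}
```

Because the returned index names the first step that did not run, a resumed job repeats no completed work, provided each step is committed before the next checkpoint.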

4. The Perception Layer (How Core Prepares Data)

Core's Analysis Pipeline (Whitepaper Section 4.2.5)

After indexing a file, Core's Analysis Queueing Phase dispatches:

ThumbnailJob - Generates WebP thumbnails → VSS thumbs/{variant}.webp
OcrJob - Extracts text from PDFs/images → VSS ocr/ocr.json
EmbeddingJob - Generates vectors → VSS embeddings/{model}.json
TranscriptJob - Transcribes audio/video → VSS transcript/transcript.json
MediaAnalysisJob - Extracts EXIF, duration, codec → media_data JSON in DB
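
As an illustration of the sidecar layout listed above, the following sketch builds sidecar paths from a kind and a content UUID. `SidecarKind` and `sidecar_path` are hypothetical helpers, not part of the SDK. The sketch assumes the unsharded `sidecars/content/{uuid}/...` layout; the sharded `{h0}/{h1}` variant that appears elsewhere in this document is not modeled.

```rust
use std::path::{Path, PathBuf};

/// Sidecar kinds produced by the analysis jobs listed above.
#[derive(Debug, Clone, PartialEq)]
pub enum SidecarKind {
    Thumbnail { variant: String },
    Ocr,
    Embedding { model: String },
    Transcript,
}

impl SidecarKind {
    /// Path of the sidecar file relative to the content's sidecar directory.
    pub fn relative_path(&self) -> PathBuf {
        match self {
            SidecarKind::Thumbnail { variant } => {
                Path::new("thumbs").join(format!("{variant}.webp"))
            }
            SidecarKind::Ocr => Path::new("ocr").join("ocr.json"),
            SidecarKind::Embedding { model } => {
                Path::new("embeddings").join(format!("{model}.json"))
            }
            SidecarKind::Transcript => Path::new("transcript").join("transcript.json"),
        }
    }
}

/// Resolves a sidecar path inside a library directory (the `.sdlibrary` folder).
/// Assumes the layout `sidecars/content/{uuid}/{kind-specific path}`.
pub fn sidecar_path(library_dir: &Path, content_uuid: &str, kind: &SidecarKind) -> PathBuf {
    library_dir
        .join("sidecars")
        .join("content")
        .join(content_uuid)
        .join(kind.relative_path())
}
```

The file name is built with `format!` rather than `PathBuf::set_extension` because model names such as `all-MiniLM-L6-v2` may themselves contain dots.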

Extensions Consume, Don't Regenerate

// WRONG: Extension shouldn't do OCR (Core already did it)
#[job]
async fn analyze_paper(ctx: &JobContext, paper: Paper) -> JobResult<()> {
    let text = ctx.ai().ocr_document(&paper.file).await?; // WASTEFUL!
    // ...
    Ok(())
}

// RIGHT: Extension reads pre-computed OCR from VSS
#[job]
async fn analyze_paper(ctx: &JobContext, paper: Paper) -> JobResult<()> {
    let text = paper.full_text
        .ok_or_else(|| JobError::missing_sidecar("ocr"))?;

    // Now do extension-specific work (summarization, topic extraction, etc.)
    let summary = analyze_text(&text).await?;
    // ... persist `summary` (tags, custom fields, or agent memory)
    Ok(())
}

5. Agent Memory System (Grounded)

Where Agent Memory Actually Lives

#[agent_memory]
struct ChronicleMind {
    /// TemporalMemory stored in VSS as:
    /// .sdlibrary/sidecars/extension/{app_id}/memory/history.db
    history: TemporalMemory<PaperAnalysisEvent>,

    /// AssociativeMemory stored as vector sidecar:
    /// .sdlibrary/sidecars/extension/{app_id}/memory/knowledge.vss
    knowledge: AssociativeMemory<Concept>,

    /// WorkingMemory stored in WASM instance state (ephemeral)
    /// or persisted to: .sdlibrary/sidecars/extension/{app_id}/memory/plan.json
    plan: WorkingMemory<ResearchPlan>,
}

Rich Memory Query API (Using VSS Infrastructure)

impl ChronicleMind {
    /// Query temporal memory with filters
    async fn papers_about(&self, topic: &str, days: u64) -> Vec<PaperAnalysisEvent> {
        self.history
            .query()
            .since(Duration::days(days))
            // Field filtering using serde_json path queries
            .where_field("title", contains(topic))
            // Semantic filtering using VSS embeddings
            .where_semantic("summary", similar_to(topic))
            .min_similarity(0.7)
            .sort_by_relevance()
            .limit(20)
            .collect()
            .await
            .unwrap_or_default()
    }

    /// Query associative memory (semantic graph)
    async fn concepts_related_to(&self, concept_name: &str) -> Vec<Concept> {
        self.knowledge
            // Uses VSS vector search infrastructure
            .query_similar(concept_name)
            .top_k(10)
            // Traverse concept graph to a depth of 2
            .and_related_concepts(2)
            .collect()
            .await
            .unwrap_or_default()
    }

    /// Transactional update to working memory
    async fn update_research_plan(&mut self, new_topic: String) -> Result<()> {
        self.plan.update(|mut plan| {
            plan.priority_topics.push(new_topic);
            Ok(plan) // Atomic commit
        }).await
    }
}

Implementation Details:

TemporalMemory uses SQLite FTS5 for text queries + VSS for semantic
AssociativeMemory uses VSS vector repositories (same as semantic search)
WorkingMemory uses transactional updates with rollback on error
All persisted to .sdlibrary/sidecars/extension/{app_id}/
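
The "transactional updates with rollback on error" behavior of WorkingMemory can be sketched as a copy-then-commit update. This is a simplified, synchronous model with hypothetical types; the real `WorkingMemory` is async and its persistence is not shown here.

```rust
/// Example state type (mirrors the `ResearchPlan` used above; fields are assumptions).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ResearchPlan {
    pub priority_topics: Vec<String>,
}

/// Minimal in-memory model of a transactional state holder.
pub struct WorkingMemory<T: Clone> {
    state: T,
}

impl<T: Clone> WorkingMemory<T> {
    pub fn new(state: T) -> Self {
        Self { state }
    }

    pub fn get(&self) -> &T {
        &self.state
    }

    /// Applies `f` to a copy of the state. The copy replaces the current
    /// state only if `f` returns `Ok`; on `Err` the state is left untouched.
    pub fn update<E>(&mut self, f: impl FnOnce(T) -> Result<T, E>) -> Result<(), E> {
        let draft = self.state.clone();
        self.state = f(draft)?;
        Ok(())
    }
}
```

Working on a clone costs one copy per update, but it guarantees that a failing closure can never leave the plan half-modified.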

6. Real Examples (Grounded in Core)

Example 1: Chronicle Reads Core-Generated OCR

#[agent]
impl Chronicle {
    #[on_event(EntryCreated)]
    #[filter = ".extension() == 'pdf'"]
    async fn on_new_pdf(entry: Entry, ctx: &AgentContext<ChronicleMind>) -> AgentResult<()> {
        // Core's indexer has already:
        // 1. Created Entry in `entries` table
        // 2. Generated CAS ID (content_uuid)
        // 3. Dispatched OcrJob → created sidecar at:
        //    .sdlibrary/sidecars/content/{h0}/{h1}/{content_uuid}/ocr/ocr.json

        // Extension waits for sidecar to be ready
        ctx.on_sidecar_ready(entry.content_uuid(), "ocr", |ocr_text| async move {
            // Now do extension-specific analysis
            let topics = extract_topics(&ocr_text).await?;

            // Write tags to UserMetadata (metadata_tag junction table)
            ctx.vdfs()
                .add_tags(entry.metadata_id(), topics)
                .await?;

            // Store analysis in extension's memory (in VSS)
            ctx.memory().write().await.history.append(PaperAnalysisEvent {
                paper_id: entry.id(),
                title: entry.name().to_string(),
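                // String::truncate mutates in place and returns (), so take a
                // 500-character prefix instead (also safe on UTF-8 boundaries)
                summary: ocr_text.chars().take(500).collect(),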
            }).await?;

            Ok(())
        }).await
    }
}

Example 2: Ledger Uses Core's OCR for Receipts

#[app(id = "com.spacedrive.ledger")]
struct Ledger;

#[model]
struct Receipt {
    #[entry(filter = "*.{pdf,jpg,png}")]
    scan: Entry,

    /// Reads OCR from VSS (Core already did OCR in Analysis Queueing phase)
    #[sidecar(kind = "ocr")]
    ocr_text: Option<String>,

    /// Extension-computed fields stored in custom_fields JSON
    #[custom_field]
    merchant: Option<String>,

    #[custom_field]
    amount: Option<f64>,

    #[custom_field]
    date: Option<String>,
}

#[agent]
impl Ledger {
    #[on_sidecar_ready(kind = "ocr")]
    async fn on_ocr_ready(entry: Entry, ocr: String, ctx: &AgentContext) -> AgentResult<()> {
        // Check if OCR contains receipt patterns
        if contains_price_pattern(&ocr) {
            // Extract structured data
            let receipt_data = parse_receipt(&ocr).await?;

            // Save to Entry's custom_fields JSON in user_metadata table
            ctx.vdfs()
                .update_custom_fields(entry.metadata_id(), json!({
                    "merchant": receipt_data.merchant,
                    "amount": receipt_data.amount,
                    "date": receipt_data.date,
                }))
                .await?;

            // Add semantic tag
            ctx.vdfs()
                .add_tag(entry.metadata_id(), "#receipt")
                .await?;
        }

        Ok(())
    }
}

7. AI Integration (Grounded in VSS)

Jinja Templates for Extensions

Extensions ship with prompts/ directory:

chronicle.wasm
prompts/
  ├── summarize_paper.jinja
  └── extract_concepts.jinja

Templates rendered using VSS-stored OCR:

#[task]
async fn summarize_paper(ctx: &TaskContext, paper: &Paper) -> TaskResult<String> {
    // Paper.full_text comes from VSS sidecar (Core already did OCR)
    let text = paper.full_text
        .as_ref()
        .ok_or_else(|| TaskError::missing_sidecar("ocr"))?;

    #[derive(Serialize)]
    struct PromptCtx<'a> {
        title: &'a str,
        text: &'a str,
    }

    // Use AI with Jinja template
    let summary = ctx.ai()
        .model_preference("local_llm") // User configures in Spacedrive settings
        .prompt_template("summarize_paper.jinja")
        .render_with(&PromptCtx {
            title: &paper.file.name(),
            text,
        })?
        .generate_text()
        .await?;

    Ok(summary)
}

Maps to:

OCR sidecar from VSS at .sdlibrary/sidecars/content/{uuid}/ocr/ocr.json
AI models run locally (Ollama) or cloud (API keys in user config)
Jinja templates bundled with WASM extension

8. Memory Queries (Using Real Search Infrastructure)

TemporalMemory Uses FTS5 + VSS

#[agent_memory]
struct ChronicleMind {
    /// Stored at: .sdlibrary/sidecars/extension/chronicle/memory/history.db
    /// Uses SQLite FTS5 for temporal queries
    history: TemporalMemory<PaperAnalysisEvent>,

    /// Stored at: .sdlibrary/sidecars/extension/chronicle/memory/knowledge.vss
    /// Uses Vector Repository format (same as semantic search)
    knowledge: AssociativeMemory<Concept>,
}

impl ChronicleMind {
    async fn papers_about_ml(&self) -> Vec<PaperAnalysisEvent> {
        self.history
            .query()
            // Temporal filter (FTS5 on history.db)
            .since(Duration::days(30))
            // Text search (FTS5)
            .where_field("title", contains("machine learning"))
            // Semantic re-ranking (VSS vector search on embeddings)
            .where_semantic("summary", similar_to("neural networks"))
            .min_similarity(0.75)
            // Final results
            .sort_by_relevance()
            .limit(10)
            .collect()
            .await
            .unwrap_or_default()
    }
}

Maps to:

FTS5 virtual tables for fast temporal queries
VSS vector search for semantic similarity (uses same infrastructure as Lightning Search)
Stored in extension's own sidecar directory
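
The query chain above combines a time window, a text filter, and a similarity threshold. The sketch below reproduces that combination over an in-memory slice so the ordering of the stages is visible. It is not how FTS5 or the vector repository work internally; `PaperEvent`, `query_history`, and the use of cosine similarity as the metric are assumptions made for illustration.

```rust
/// Simplified history record (assumption: real events store timestamps, not ages).
#[derive(Debug, Clone, PartialEq)]
pub struct PaperEvent {
    pub title: String,
    pub age_days: u32,
    pub embedding: Vec<f32>,
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Time filter, then case-insensitive title filter, then similarity
/// threshold, then sort by descending similarity and keep `limit` results.
pub fn query_history<'a>(
    events: &'a [PaperEvent],
    max_age_days: u32,
    title_contains: &str,
    query_embedding: &[f32],
    min_similarity: f32,
    limit: usize,
) -> Vec<&'a PaperEvent> {
    let needle = title_contains.to_lowercase();
    let mut scored: Vec<(f32, &PaperEvent)> = events
        .iter()
        .filter(|e| e.age_days <= max_age_days)
        .filter(|e| e.title.to_lowercase().contains(&needle))
        .map(|e| (cosine_similarity(&e.embedding, query_embedding), e))
        .filter(|(score, _)| *score >= min_similarity)
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(limit).map(|(_, e)| e).collect()
}
```

Running the cheap filters first means the similarity computation only touches records that already passed the time and text constraints.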

9. Permission Model (Capability-Based)

Extensions Declare Permissions

#[app(
    id = "com.spacedrive.chronicle",
    permissions = [
        // Can read entries matching glob
        Permission::ReadEntries(glob = "**/*.pdf"),

        // Can read specific sidecar kinds from VSS
        Permission::ReadSidecars(kinds = ["ocr", "embeddings"]),

        // Can write to user_metadata.tags
        Permission::WriteTags,

        // Can write to user_metadata.custom_fields["chronicle"]
        Permission::WriteCustomFields(namespace = "chronicle"),

        // Can dispatch jobs
        Permission::DispatchJobs,

        // Can use AI models
        Permission::UseAI(models = ["local"]),
    ]
)]
struct Chronicle;

Runtime Enforcement

// In Core's WASM plugin host:
impl WasmPluginHost {
    fn check_permission(&self, plugin_id: &str, permission: Permission) -> Result<()> {
        let manifest = self.load_manifest(plugin_id)?;

        if !manifest.permissions.contains(&permission) {
            return Err(PermissionDenied {
                plugin: plugin_id,
                requested: permission,
            });
        }

        Ok(())
    }
}

// Every ctx operation checks permissions:
impl VdfsContext {
    pub async fn add_tag(&self, metadata_id: Uuid, tag: &str) -> Result<()> {
        // Check permission before allowing tag write
        self.host.check_permission(&self.plugin_id, Permission::WriteTags)?;

        // Execute operation
        self.db.add_tag(metadata_id, tag).await
    }
}
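
One detail worth sketching: a plain `contains(&permission)` check compares whole values, so it cannot decide whether a parameterized grant such as `ReadSidecars(kinds = ["ocr", "embeddings"])` covers a request for just `"ocr"`. A minimal, self-contained way to model this is to separate the granted capability from the attempted operation. The `Operation` type and `permits` function below are hypothetical and only illustrate the matching logic.

```rust
/// Capabilities an extension declares (subset, for illustration).
#[derive(Debug, Clone, PartialEq)]
pub enum Permission {
    ReadSidecars { kinds: Vec<String> },
    WriteTags,
    DispatchJobs,
}

/// A concrete operation attempted at runtime (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Operation {
    ReadSidecar { kind: String },
    WriteTag,
    DispatchJob,
}

#[derive(Debug, PartialEq)]
pub struct PermissionDenied {
    pub plugin: String,
    pub requested: Operation,
}

/// True when a single granted capability covers the attempted operation.
pub fn permits(granted: &Permission, op: &Operation) -> bool {
    match (granted, op) {
        (Permission::ReadSidecars { kinds }, Operation::ReadSidecar { kind }) => {
            kinds.iter().any(|k| k == kind)
        }
        (Permission::WriteTags, Operation::WriteTag) => true,
        (Permission::DispatchJobs, Operation::DispatchJob) => true,
        _ => false,
    }
}

/// Allows the operation if any granted capability covers it.
pub fn check_permission(
    plugin: &str,
    granted: &[Permission],
    op: Operation,
) -> Result<(), PermissionDenied> {
    if granted.iter().any(|p| permits(p, &op)) {
        Ok(())
    } else {
        Err(PermissionDenied { plugin: plugin.to_string(), requested: op })
    }
}
```

Anything not explicitly matched is denied, which keeps the check default-deny as new operations are added.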

10. Complete Grounded Example

// Extension: Chronicle Research Assistant
// Consumes: OCR from Core's VSS, Entry metadata, Tags
// Produces: Research summaries, semantic tags, analysis in agent memory

#[app(
    id = "com.spacedrive.chronicle",
    permissions = [
        Permission::ReadEntries(glob = "**/*.pdf"),
        Permission::ReadSidecars(kinds = ["ocr", "embeddings"]),
        Permission::WriteTags,
        Permission::WriteCustomFields(namespace = "chronicle"),
        // Required because on_new_pdf dispatches analyze_paper
        Permission::DispatchJobs,
        Permission::UseAI(models = ["local"]),
    ]
)]
struct Chronicle;

#[model]
struct Paper {
    #[entry] file: Entry,
    #[sidecar(kind = "ocr")] full_text: Option<String>,
    #[sidecar(kind = "embeddings")] embedding: Option<Vec<f32>>,
    #[user_metadata] tags: Vec<Tag>,
}

#[agent_memory]
struct ChronicleMind {
    history: TemporalMemory<PaperAnalysisEvent>,
    knowledge: AssociativeMemory<Concept>,
}

#[agent]
impl Chronicle {
    /// Triggered when Core emits Event::EntryCreated
    #[on_event(EntryCreated)]
    #[filter = ".extension() == 'pdf'"]
    async fn on_new_pdf(entry: Entry, ctx: &AgentContext<ChronicleMind>) -> AgentResult<()> {
        // Wait for Core's OcrJob to complete
        ctx.on_sidecar_ready(entry.content_uuid(), "ocr", |_| async move {
            // Dispatch extension's analysis job
            let paper = ctx.vdfs().get::<Paper>(entry.id()).await?;
            ctx.jobs().dispatch(analyze_paper, paper).await
        }).await
    }

    /// User queries extension via UI
    #[on_query("papers about {topic}")]
    async fn find_papers(ctx: &AgentContext<ChronicleMind>, topic: String) -> AgentResult<Vec<Paper>> {
        let memory = ctx.memory().read().await;

        // Query agent's memory using semantic search
        let relevant_events = memory.history
            .query()
            .where_semantic("summary", similar_to(&topic))
            .top_k(10)
            .collect()
            .await?;

        // Load full Paper models from VDFS
        let mut papers = Vec::new();
        for event in relevant_events {
            if let Ok(paper) = ctx.vdfs().get::<Paper>(event.paper_id).await {
                papers.push(paper);
            }
        }

        Ok(papers)
    }
}

#[task]
async fn extract_topics(ctx: &TaskContext, text: &str) -> TaskResult<Vec<String>> {
    #[derive(Serialize)]
    struct Ctx<'a> { text: &'a str }

    ctx.ai()
        .prompt_template("extract_topics.jinja")
        .render_with(&Ctx { text })?
        .generate_json::<Vec<String>>()
        .await
}

#[job]
async fn analyze_paper(ctx: &JobContext, paper: Paper) -> JobResult<()> {
    // Read Core-generated OCR from VSS
    let text = paper.full_text.ok_or(JobError::missing_sidecar("ocr"))?;

    // Extension-specific analysis
    let topics = ctx.run(extract_topics, (&text,)).await?;

    // Save to UserMetadata tags
    for topic in topics {
        ctx.vdfs().add_tag(paper.file.metadata_id(), &topic).await?;
    }

    // Store in agent memory (persisted to VSS)
    ctx.memory().write().await.history.append(PaperAnalysisEvent {
        paper_id: paper.file.id(),
        title: paper.file.name().to_string(),
        summary: text.chars().take(500).collect(),
    }).await?;

    Ok(())
}

Key Mappings Summary

| SDK Concept | Spacedrive Core Reality |
| --- | --- |
| `#[app]` | WASM plugin loaded by `PluginManager` |
| `#[model]` | Wrapper around `Entry` + VSS sidecars + `UserMetadata` |
| `#[agent]` | Event listener on `EventBus`, memory in VSS |
| `#[job]` | Implements `Job` + `JobHandler`, persisted in `jobs.db` |
| `#[task]` | Unit of work within job, built on `task-system` |
| `#[sidecar]` | Reads from `.sdlibrary/sidecars/` (VSS) |
| `#[user_metadata]` | Reads/writes `user_metadata`, `tags`, `labels` tables |
| `TemporalMemory` | Uses SQLite FTS5 + VSS, stored in extension's VSS dir |
| `AssociativeMemory` | Uses VSS vector repositories (same as semantic search) |
| Permissions | Capability-based, enforced by WASM host |

11. Implementation Architecture

WASM Boundary & Host Functions

Extensions run as WASM modules. The PluginManager (in CoreContext) provides host functions:

// In Spacedrive Core (Host side)
impl WasmPluginHost {
    // Registers every host function on the given linker and hands it back
    fn expose_host_functions(mut linker: Linker) -> Linker {
        linker.func_wrap("spacedrive", "vdfs_query_entries", |...| { ... });
        linker.func_wrap("spacedrive", "vdfs_read_sidecar", |...| { ... });
        linker.func_wrap("spacedrive", "vdfs_write_tag", |...| { ... });
        linker.func_wrap("spacedrive", "job_dispatch", |...| { ... });
        linker.func_wrap("spacedrive", "event_subscribe", |...| { ... });
        linker
    }
}

Extension Context (Passed Across WASM Boundary)

// Extension sees this (via FFI)
pub struct ExtensionContext {
    library_id: Uuid,
    plugin_id: String,
    permissions: PermissionSet,
}

impl ExtensionContext {
    /// Query entries (calls host function vdfs_query_entries)
    pub async fn query_entries(&self) -> QueryBuilder<Entry> {
        // Serializes query, calls WASM import, deserializes result
    }

    /// Read sidecar (calls host function vdfs_read_sidecar)
    pub async fn read_sidecar(
        &self,
        content_uuid: Uuid,
        kind: &str,
    ) -> Result<Vec<u8>> {
        // Permission check + VSS read
    }

    /// Add tag (calls host function vdfs_write_tag)
    pub async fn add_tag(&self, metadata_id: Uuid, tag: &str) -> Result<()> {
        // Permission check + DB write to metadata_tag
    }

    /// Dispatch job (calls host function job_dispatch)
    pub async fn dispatch_job<J: Job>(&self, job: J) -> Result<JobHandle> {
        // Serializes job, calls JobManager::dispatch
    }
}

Extension Memory Persistence

// Agent memory stored in library's VSS:
// .sdlibrary/sidecars/extension/{app_id}/
//   ├── memory/
//   │   ├── history.db        # TemporalMemory (SQLite FTS5)
//   │   ├── knowledge.vss      # AssociativeMemory (Vector Repository)
//   │   └── plan.json          # WorkingMemory (JSON state)
//   └── state.json             # Extension state checkpoint

#[agent_memory]
struct ChronicleMind {
    // Persisted to .sdlibrary/sidecars/extension/chronicle/memory/history.db
    history: TemporalMemory<PaperAnalysisEvent>,

    // Persisted to .sdlibrary/sidecars/extension/chronicle/memory/knowledge.vss
    knowledge: AssociativeMemory<Concept>,

    // Persisted to .sdlibrary/sidecars/extension/chronicle/memory/plan.json
    plan: WorkingMemory<ResearchPlan>,
}

Job Integration (Extensions Run Jobs Through Core)

// Extension job is serialized, passed to Core, and executed by JobManager
#[job]
async fn analyze_paper(ctx: &JobContext, paper: Paper) -> JobResult<()> {
    // ctx.library provides access to:
    // - ctx.library.db() → Database for queries
    // - ctx.library.jobs() → Dispatch child jobs
    // - ctx.library.event_bus() → Emit custom events

    // Job is persisted in library's jobs.db
    // Progress emitted via Event::JobProgress
    // On crash, resumes from last checkpoint

    ctx.progress(Progress::simple(0.0, "Reading OCR"));

    // Read from VSS (permission-checked by host)
    let text = paper.full_text.ok_or(JobError::missing_sidecar("ocr"))?;

    ctx.check_interrupt().await?; // Checkpoint to jobs.db

    ctx.progress(Progress::simple(0.5, "Analyzing"));

    let summary = ctx.task(|| async {
        ctx.ai()
            .prompt_template("summarize.jinja")
            .render_with(&json!({ "text": text }))?
            .generate_text()
            .await
    }).await?;

    ctx.check_interrupt().await?; // Checkpoint

    // Write tag (permission-checked)
    ctx.vdfs()
        .add_tag(paper.file.metadata_id(), &format!("summary:{}", summary))
        .await?;

    Ok(())
}

Maps to:

Job implements Job + JobHandler traits (core/src/infra/job/traits.rs)
Dispatched via library.jobs.dispatch() (core/src/infra/job/manager.rs)
Persisted in library's jobs.db
Uses task-system for execution and checkpointing

12. Real Data Flows

Flow 1: User Adds PDF → Extension Analyzes

1. User drops paper.pdf into Spacedrive
   ↓
2. Core IndexerJob runs (5 phases):
   - Discovery: Finds paper.pdf
   - Processing: Creates Entry in `entries` table
   - Aggregation: Updates parent directory stats
   - Content ID: Generates BLAKE3 CAS ID
   - Analysis Queueing: Dispatches OcrJob
   ↓
3. Core OcrJob runs:
   - Extracts text from PDF
   - Saves to VSS: .sdlibrary/sidecars/content/{uuid}/ocr/ocr.json
   - Updates `sidecars` table (status = "ready")
   - Emits Event::Custom { event_type: "SidecarReady", data: {...} }
   ↓
4. Extension Chronicle (WASM) receives event:
   - Reads OCR from VSS via host function
   - Runs AI summarization with Jinja template
   - Writes tags to metadata_tag table via host function
   - Stores analysis in extension's TemporalMemory (.sdlibrary/sidecars/extension/chronicle/memory/)
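
The indexer phases in step 2 form a fixed sequence, which can be modeled as a small state machine. The sketch below is illustrative only: `IndexerPhase`, `FileKind`, and `jobs_to_queue` are hypothetical names, and the job mapping covers only what this document states (OCR for PDFs and images, transcription for audio and video). Thumbnail, embedding, and media analysis triggers are omitted because the document does not say which file types queue them.

```rust
/// The five indexer phases, in the order listed in Flow 1.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IndexerPhase {
    Discovery,
    Processing,
    Aggregation,
    ContentId,
    AnalysisQueueing,
}

impl IndexerPhase {
    /// The phase that follows this one, or `None` after the last phase.
    pub fn next(self) -> Option<IndexerPhase> {
        match self {
            IndexerPhase::Discovery => Some(IndexerPhase::Processing),
            IndexerPhase::Processing => Some(IndexerPhase::Aggregation),
            IndexerPhase::Aggregation => Some(IndexerPhase::ContentId),
            IndexerPhase::ContentId => Some(IndexerPhase::AnalysisQueueing),
            IndexerPhase::AnalysisQueueing => None,
        }
    }
}

/// Walks the pipeline from the first phase to the last.
pub fn phase_order() -> Vec<IndexerPhase> {
    let mut order = Vec::new();
    let mut current = Some(IndexerPhase::Discovery);
    while let Some(phase) = current {
        order.push(phase);
        current = phase.next();
    }
    order
}

/// Coarse file classification (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FileKind {
    Pdf,
    Image,
    Audio,
    Video,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AnalysisJob {
    Ocr,
    Transcript,
}

/// Jobs the Analysis Queueing phase would dispatch for a file kind.
pub fn jobs_to_queue(kind: FileKind) -> Vec<AnalysisJob> {
    match kind {
        FileKind::Pdf | FileKind::Image => vec![AnalysisJob::Ocr],
        FileKind::Audio | FileKind::Video => vec![AnalysisJob::Transcript],
        FileKind::Other => Vec::new(),
    }
}
```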

Flow 2: User Queries "papers about machine learning"

1. User types query in UI
   ↓
2. Chronicle extension's query handler runs:

   ctx.memory().read().await.history
       .query()
       .where_semantic("summary", similar_to("machine learning"))
       .top_k(10)
       .collect()
       .await?
   ↓
3. Under the hood:
   - TemporalMemory uses FTS5 on extension's history.db
   - Semantic filtering uses VSS vector search on knowledge.vss
   - Same infrastructure as Core's Lightning Search
   ↓
4. Results returned to UI with matched papers

13. Security Model (Grounded in WASM Sandboxing)

Permission Enforcement

// Extension declares in manifest
permissions = [
    Permission::ReadEntries(glob = "**/*.pdf"),
    Permission::ReadSidecars(kinds = ["ocr"]),
    Permission::WriteTags,
]

// Host enforces on every operation
impl WasmPluginHost {
    // async because the database write below is awaited
    async fn vdfs_add_tag(&self, metadata_id: Uuid, tag: &str) -> Result<()> {
        // 1. Check permission
        if !self.plugin_permissions.contains(&Permission::WriteTags) {
            return Err(PermissionDenied);
        }

        // 2. Execute operation
        self.library.db
            .insert_tag(metadata_id, tag)
            .await
    }
}

Resource Limits

WASM provides natural sandboxing:

Memory limit: 100MB per extension (configurable)
CPU quota: Long-running work is interruptible at ctx.check_interrupt() checkpoints
Storage quota: Tracked in extension's VSS directory
No filesystem access: Can only read through host functions
No network access: Unless explicitly permitted
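
As one way to picture the storage quota, the sketch below tracks bytes used against a fixed limit and rejects writes that would exceed it. `StorageQuota` and `QuotaExceeded` are hypothetical names; how Core actually measures an extension's VSS directory is not specified in this document.

```rust
#[derive(Debug, PartialEq)]
pub struct QuotaExceeded {
    pub requested: u64,
    pub available: u64,
}

/// Byte accounting for one extension's storage (illustrative only).
pub struct StorageQuota {
    limit_bytes: u64,
    used_bytes: u64,
}

impl StorageQuota {
    pub fn new(limit_bytes: u64) -> Self {
        Self { limit_bytes, used_bytes: 0 }
    }

    pub fn used(&self) -> u64 {
        self.used_bytes
    }

    /// Reserves space before a write; fails without changing the usage
    /// if the write would exceed the limit.
    pub fn reserve(&mut self, bytes: u64) -> Result<(), QuotaExceeded> {
        // `used_bytes` never exceeds `limit_bytes`, so this cannot underflow.
        let available = self.limit_bytes - self.used_bytes;
        if bytes > available {
            return Err(QuotaExceeded { requested: bytes, available });
        }
        self.used_bytes += bytes;
        Ok(())
    }

    /// Returns space after a delete; saturates at zero.
    pub fn release(&mut self, bytes: u64) {
        self.used_bytes = self.used_bytes.saturating_sub(bytes);
    }
}
```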

This SDK is now grounded in Spacedrive's actual architecture, mapping every concept to real systems.
