
VDFS SDK as Programming Language

The Distributed Computing Primitive for the AI Era

Date: October 10, 2025
Vision: VDFS isn't just an API - it's the programming model for local-first, AI-native applications


The Core Insight

Current framing: "SDK provides access to data across devices"

Reality: the SDK provides compute across devices, data models that sync, agents with memory, and durable execution

The VDFS is becoming the raw computing primitive for distributed applications where:

  • Users own the infrastructure (their devices)
  • Apps declare behavior, VDFS orchestrates execution
  • Agents coordinate across devices automatically
  • Data models sync by default
  • Everything is durable and resumable

This is like:

  • SQL: Declarative language for data → You declare WHAT you want, database figures out HOW
  • React: Declarative language for UI → You declare UI state, React figures out rendering
  • VDFS: Declarative language for distributed apps → You declare behavior, VDFS figures out WHERE and WHEN

What the SDK Actually Enables

1. Distributed Compute, Not Just Distributed Data

Not just: "Access files on Device A from Device B"

Actually: "Run computations on Device A, triggered from Device B, results synced automatically"

#[extension(id = "chronicle")]
struct Chronicle;

#[job]
async fn analyze_papers(ctx: &JobContext, papers: Vec<Entry>) -> Result<Analysis> {
    // This job runs on the BEST available device automatically:
    // - Device with the most CPU (for embeddings)
    // - Device holding the papers locally (avoids transfer)
    // - Device currently online and idle

    let mut analysis = Analysis::default();

    for paper in papers.progress(ctx) {
        // Heavy compute dispatched optimally
        let embedding = ctx.compute()
            .prefer_device_with_gpu()
            .embed_document(&paper)?;

        // Results stored in VDFS, synced automatically
        ctx.vdfs().store_sidecar(&paper, "embedding", &embedding)?;

        // Accumulate into the returned Analysis (record() is a placeholder)
        analysis.record(&paper, embedding);
    }

    Ok(analysis)
}

The Magic: the developer never says "run on Device A" - VDFS routes execution based on the following (a scoring sketch follows the list):

  • Device capabilities (GPU, CPU, battery)
  • Data locality (where files are)
  • Network conditions (latency, bandwidth)
  • User preferences (privacy settings, cost limits)
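
For intuition, here is a minimal sketch of what such a routing heuristic might look like internally. Everything in it - the DeviceProfile struct, the weights, the scoring rule - is an illustrative assumption, not the actual VDFS scheduler.

// Hypothetical device-scoring heuristic. All types, fields, and weights
// here are illustrative assumptions, not the real VDFS scheduler.
struct DeviceProfile {
    has_gpu: bool,
    idle_cpu_fraction: f32,   // 0.0..=1.0, how much CPU is free
    on_battery: bool,
    holds_data_locally: bool, // having the inputs avoids a transfer
    latency_ms: u32,
}

fn score(device: &DeviceProfile, wants_gpu: bool) -> f32 {
    let mut s = 0.0;
    if wants_gpu && device.has_gpu { s += 3.0; }   // capability match
    s += 2.0 * device.idle_cpu_fraction;           // available compute
    if device.holds_data_locally { s += 2.5; }     // data locality
    if device.on_battery { s -= 1.5; }             // respect user's battery
    s - device.latency_ms as f32 / 1000.0          // network conditions
}

fn pick_best(devices: &[DeviceProfile], wants_gpu: bool) -> Option<&DeviceProfile> {
    devices
        .iter()
        .max_by(|a, b| score(a, wants_gpu).total_cmp(&score(b, wants_gpu)))
}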

2. Data Models as Declarative Schema

Not just: "Store JSON in sidecars"

Actually: "Declare data models that sync, validate, and evolve across devices"

#[data_model]
struct ResearchProject {
    #[entry(kind = "directory")]
    folder: Entry,  // Syncs as device-owned (state-based)

    #[tags]
    topics: Vec<Tag>,  // Syncs as shared (HLC-ordered log)

    #[sidecar(file = "graph.json")]
    knowledge_graph: Graph,  // Travels with Entry

    #[agent_memory]
    research_state: ResearchState,  // Persistent agent context

    #[computed]
    completion: f32,  // Derived, not stored

    #[encrypted]
    notes: String,  // Automatically encrypted at rest
}

// This becomes a LANGUAGE for describing distributed data
// Sync strategy, storage location, encryption - all declarative

The Beauty:

  • #[entry] = device-owned, state-synced
  • #[tags] = shared, HLC-synced
  • #[sidecar] = travels with data
  • #[agent_memory] = persistent across sessions
  • #[computed] = derived on-demand
  • #[encrypted] = zero-knowledge by default

Developer declares WHAT the data is, VDFS handles HOW it's stored, synced, and secured.
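
One way to picture the mechanics, as a minimal sketch: each attribute could desugar into per-field sync metadata that the runtime consumes. The SyncStrategy and FieldDescriptor names below are assumptions for illustration, not real SDK types.

// Hypothetical desugaring of #[data_model] attributes into runtime metadata.
// SyncStrategy and FieldDescriptor are illustrative names, not real SDK types.
enum SyncStrategy {
    DeviceOwned, // #[entry]: state-based, owning device is authoritative
    SharedHlc,   // #[tags] / #[shared]: HLC-ordered operation log
    Sidecar,     // #[sidecar]: travels with the Entry it annotates
    AgentMemory, // #[agent_memory]: persisted per extension, synced
    Computed,    // #[computed]: derived on demand, never stored
}

struct FieldDescriptor {
    name: &'static str,
    strategy: SyncStrategy,
    encrypted: bool, // #[encrypted] layers on top of any strategy
}

// What the macro might register for ResearchProject above. The strategy
// chosen for `notes` is a guess; the original only marks it encrypted.
fn research_project_schema() -> Vec<FieldDescriptor> {
    vec![
        FieldDescriptor { name: "folder", strategy: SyncStrategy::DeviceOwned, encrypted: false },
        FieldDescriptor { name: "topics", strategy: SyncStrategy::SharedHlc, encrypted: false },
        FieldDescriptor { name: "knowledge_graph", strategy: SyncStrategy::Sidecar, encrypted: false },
        FieldDescriptor { name: "research_state", strategy: SyncStrategy::AgentMemory, encrypted: false },
        FieldDescriptor { name: "completion", strategy: SyncStrategy::Computed, encrypted: false },
        FieldDescriptor { name: "notes", strategy: SyncStrategy::SharedHlc, encrypted: true },
    ]
}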

3. Agents as Distributed Programs

Not just: "AI that queries local data"

Actually: "Distributed programs that run across devices, coordinate, and maintain state"

#[agent]
impl Chronicle {
    // Agent lifecycle - runs on any device
    async fn init(ctx: &AgentContext) -> Result<()> {
        // Agent "wakes up" on device that needs it
        // Loads state from agent_memory (synced via VDFS)
        ctx.restore_state()?;

        // Registers for events across ALL devices
        ctx.on_new_pdf(Self::analyze_paper);
        ctx.on_voice_note(Self::transcribe_and_index);

        Ok(())
    }

    // Event handlers run WHERE the event occurred
    async fn analyze_paper(entry: Entry, ctx: &AgentContext) -> Result<()> {
        // If triggered on Device A, runs on Device A
        // Results sync to all devices automatically

        let analysis = ctx.ai()
            .local_if_available()  // Prefer local Ollama
            .fallback_to_cloud()   // Cloud if local unavailable
            .analyze(&entry)?;

        // Store the analysis; it syncs automatically
        ctx.vdfs().store_sidecar(&entry, "analysis", analysis)?;
        ctx.memory().update("papers_analyzed", |count| count + 1)?;

        Ok(())
    }

    // Queries can run on ANY device, results routed back
    #[query]
    async fn find_gaps(topic: String, ctx: &AgentContext) -> Result<Vec<Paper>> {
        // VDFS routes this query to the device with the most papers indexed
        // Results stream back to the requesting device

        let my_papers = ctx.vdfs().search_local(&topic)?;
        let canonical = ctx.ai().canonical_papers(&topic)?;

        Ok(canonical.difference(my_papers))
    }
}

The Paradigm Shift:

  • Agents are distributed programs, not local scripts
  • State syncs automatically across devices
  • Execution routes to optimal device
  • Developer declares behavior, VDFS orchestrates

4. Cross-Device Orchestration Language

The Dream Syntax:

#[workflow]
async fn research_workflow(ctx: &WorkflowContext) -> Result<()> {
    // Declarative multi-device workflow

    // Step 1: Ingest on mobile (where user is)
    ctx.on_device(DeviceType::Mobile)
        .capture_voice_note()
        .transcribe()
        .save_to_project("ai-safety")?;

    // Step 2: Process on desktop (where compute is)
    ctx.on_device(DeviceType::Desktop)
        .when_idle()
        .generate_embeddings()
        .update_knowledge_graph()?;

    // Step 3: Sync to NAS (where storage is)
    ctx.on_device(DeviceType::NAS)
        .when_online()
        .backup_project()
        .verify_integrity()?;

    // Step 4: Notify on ANY device user is active on
    ctx.on_active_device()
        .notify("Research updated: 3 new gaps identified")?;
}

This is choreography, not orchestration (see the sketch after this list):

  • Each device knows its role
  • No central coordinator
  • Execution flows naturally based on device capabilities
  • User sees unified experience
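
To make the contrast concrete, here is a small sketch of choreography under assumed names: each device role subscribes only to the events it cares about, and the next step fires when its input syncs in. Event, DeviceRole, and the subscription table are all illustrative, not SDK API.

// Hypothetical choreography: no coordinator dispatches steps. Each device
// role declares which synced events it reacts to and with what handler.
// Event, DeviceRole, and the handler names are illustrative assumptions.
#[derive(Clone, Copy, PartialEq)]
enum Event { VoiceNoteCaptured, TranscriptReady, EmbeddingsReady }

enum DeviceRole { Mobile, Desktop, Nas }

fn subscriptions(role: &DeviceRole) -> Vec<(Event, &'static str)> {
    match role {
        // Mobile only produces VoiceNoteCaptured; it reacts to nothing here.
        DeviceRole::Mobile => vec![],
        // Desktop picks up notes once they sync in, then its own transcripts.
        DeviceRole::Desktop => vec![
            (Event::VoiceNoteCaptured, "transcribe"),
            (Event::TranscriptReady, "generate_embeddings"),
        ],
        // The NAS archives once embeddings exist anywhere in the library.
        DeviceRole::Nas => vec![(Event::EmbeddingsReady, "backup_and_verify")],
    }
}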

The VDFS Language Primitives

Storage Primitives

  • Entry - Universal data unit (files, emails, receipts, contacts)
  • Sidecar - Derivative data (OCR, embeddings, thumbnails)
  • Tag - Semantic organization (synced, graph-based)
  • Collection - User groupings (albums, projects, sets)

Execution Primitives

  • Job - Durable, resumable background work
  • Action - Preview-before-execute operations
  • Agent - Continuous, context-aware assistants
  • Workflow - Multi-step orchestrations

Sync Primitives

  • DeviceOwned - State-based sync (filesystem index)
  • Shared - HLC-ordered log sync (tags, ratings)
  • AgentMemory - Per-extension persistent state
  • Ephemeral - Temporary, never synced

Compute Primitives

  • @local - Must run on local device
  • @any - Route to best available device
  • @prefer(gpu) - Prefer devices with capability
  • @fallback(cloud) - Use cloud if local unavailable
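
As a sketch in the same aspirational style as the rest of this document, here is how the four primitive families might compose in a single job. The attribute spellings and method names are assumptions, not a final API.

// Aspirational sketch composing all four primitive families in one job.
// Attribute spellings and method names are assumptions, not a final API.
#[job]              // Execution primitive: durable, resumable
#[prefer(gpu)]      // Compute primitive: route to a capable device
#[fallback(cloud)]  // Compute primitive: cloud only if no local GPU
async fn summarize_receipts(ctx: &JobContext) -> Result<()> {
    let receipts = ctx.vdfs()
        .entries()                 // Storage primitive: Entry
        .of_type("receipt")
        .tagged("tax-deductible")  // Storage primitive: Tag (Shared sync)
        .collect()?;

    for receipt in receipts.progress(ctx) {
        ctx.check()?;              // durability checkpoint
        let summary = ctx.ai().summarize(&receipt)?;
        ctx.vdfs().store_sidecar(&receipt, "summary", summary)?;  // Sidecar
    }

    Ok(())
}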

Why This Matters: The AI Era Needs This

Current reality:

  • Building AI apps requires: vector DB, queue system, state management, multi-device sync, authentication
  • Takes months, costs thousands in infrastructure
  • Developers rebuild the same primitives

VDFS future:

  • Declare data models, agents, workflows
  • Infrastructure provided (storage, sync, compute routing, AI models)
  • Launch in weeks, costs nothing in infrastructure (users own hardware)

Comparison:

| Primitive | Traditional Stack    | VDFS Stack              |
| --------- | -------------------- | ----------------------- |
| Storage   | PostgreSQL + S3      | VDFS (user devices)     |
| Vector DB | Pinecone ($70/mo)    | VDFS embeddings (free)  |
| Queue     | Redis + workers      | Durable Jobs (free)     |
| Sync      | Firebase ($200/mo)   | VDFS Sync (free)        |
| AI        | OpenAI API ($100/mo) | Ollama local (free)     |
| Auth      | Auth0 ($200/mo)      | Device pairing (free)   |
| Total     | $570/month           | $0/month                |

For 1000 users: Traditional = $6,840/year infrastructure. VDFS = $0.


The Developer Experience Vision

Current State (VDFS v2)

// Explicit, imperative
let papers = fetch_papers()?;
for paper in papers {
    let ocr = extract_text(&paper)?;
    let embedding = generate_embedding(&ocr)?;
    store_sidecar(&paper, embedding)?;
}

Future State (VDFS Language)

// Declarative, VDFS orchestrates
#[pipeline]
async fn process_papers(papers: Stream<Entry>) {
    papers
        .extract_text()          // Runs on device with paper
        .generate_embeddings()   // Runs on device with GPU
        .store_sidecars()        // Syncs to all devices
        .notify_completion()     // Notifies on active device
}

// VDFS handles:
// - Where each step runs
// - How data flows between devices
// - When to retry failures
// - How to resume interruptions

The Dream: Natural Language to VDFS

// User says: "Analyze all my PDFs from last month"

// Extension developer writes:
#[intent("analyze documents")]
async fn analyze_intent(ctx: &IntentContext, query: NaturalLanguage) -> Result<Action> {
    // VDFS parses intent
    let params = ctx.parse_temporal_query(&query)?;  // "last month"
    let scope = ctx.parse_content_type(&query)?;     // "PDFs"

    // VDFS executes
    Ok(ctx.vdfs()
        .entries()
        .of_type(scope)
        .since(params.start_date)
        .analyze_with(|pdf| {
            // Runs on the device with the PDF + compute
            summarize(pdf)
        })
        .collect_results()
        .present_to_user())
}

What Makes This Revolutionary

1. Users Own the Infrastructure

  • Traditional: Apps run on vendor servers (Notion servers, Dropbox servers)
  • VDFS: Apps run on user devices (your laptop, your phone, your NAS)
  • Extension developers get enterprise infrastructure for free
  • Users maintain sovereignty

2. Agents Coordinate Across Devices

  • Chronicle agent on laptop processes heavy AI
  • Cipher agent on phone handles biometric unlock
  • Atlas agent on NAS handles backups
  • Ledger agent on any device with receipts extracts data
  • They coordinate through VDFS events and shared state (sketched below)
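
A minimal sketch of what that coordination could look like as plain publish/subscribe over the VDFS event log. The VdfsEvent type and the bus methods are illustrative assumptions, not real SDK API.

// Hypothetical: agents coordinate via events on the shared VDFS log.
// VdfsEvent and the bus methods are illustrative, not real SDK API.
use std::collections::HashMap;

#[derive(Clone)]
struct VdfsEvent {
    topic: String,   // e.g. "receipt.extracted"
    payload: String, // JSON in practice; a string keeps the sketch small
}

#[derive(Default)]
struct EventBus {
    subscribers: HashMap<String, Vec<fn(&VdfsEvent)>>,
}

impl EventBus {
    fn subscribe(&mut self, topic: &str, handler: fn(&VdfsEvent)) {
        self.subscribers.entry(topic.to_string()).or_default().push(handler);
    }

    // In VDFS this publish would append to the synced log, so handlers
    // may fire on a *different* device than the publisher.
    fn publish(&self, event: VdfsEvent) {
        if let Some(handlers) = self.subscribers.get(&event.topic) {
            for handler in handlers {
                handler(&event);
            }
        }
    }
}

fn main() {
    let mut bus = EventBus::default();
    // Ledger extracts a receipt; Chronicle reacts wherever it runs.
    bus.subscribe("receipt.extracted", |e| {
        println!("chronicle: indexing {}", e.payload);
    });
    bus.publish(VdfsEvent {
        topic: "receipt.extracted".into(),
        payload: "receipt-42".into(),
    });
}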

3. Data Models Sync by Default

// Developer declares structure
#[data_model]
struct Contact {
    #[shared]  // Syncs via HLC log
    name: String,
    #[shared]
    email: String,

    #[device_owned]  // Syncs via state
    last_contacted_from: DeviceId,

    #[encrypted]  // Zero-knowledge
    notes: String,
}

// VDFS automatically:
// - Syncs name/email via HLC (shared)
// - Syncs last_contacted via state (device-owned)
// - Encrypts notes
// - Resolves conflicts
// - Maintains audit log

No sync code written. It's declarative.

4. Durable by Default

#[job]
async fn import_emails(ctx: &JobContext) -> Result<()> {
    // This job is AUTOMATICALLY:
    // - Resumable (checkpointed every N iterations)
    // - Synced (state saved to VDFS)
    // - Distributed (can migrate between devices)
    // - Observable (progress tracked)

    for email in fetch_emails()?.progress(ctx) {
        ctx.check()?;  // Auto-checkpoint
        process_email(email)?;
    }

    Ok(())
}

// Developer writes business logic
// VDFS provides durability primitives

The Vision: VDFS as Computing Substrate

What SQL did for Data

Before SQL: Imperative file operations, manual indexing, no query optimization
After SQL: Declarative queries, automatic optimization, normalized schemas

What React did for UI

Before React: Manual DOM manipulation, imperative updates, spaghetti state
After React: Declarative components, automatic rendering, clean state flow

What VDFS does for Distributed Apps

Before VDFS: Manual sync, custom protocols, fragile state management, infrastructure costs
After VDFS: Declarative models, automatic sync, durable execution, zero infrastructure


The SDK as Language Design

Core Concepts

1. Everything is an Entry

// Universal abstraction
Entry::File(file)       // Traditional file
Entry::Email(email)     // From email agent
Entry::Tweet(tweet)     // From Twitter agent
Entry::Receipt(receipt) // From Ledger
Entry::Contact(contact) // From Atlas
Entry::Note(note)       // From Chronicle

// All queryable the same way
ctx.vdfs()
    .entries()
    .of_type("receipt")
    .tagged("tax-deductible")
    .since("2024-01-01")
    .sum(|r| r.amount)

2. Jobs are Declarative Workflows

#[job]
#[runs_on(prefer = "gpu", fallback = "cpu")]
#[checkpoint_every(100)]
#[retry_on_failure(max = 3)]
async fn generate_embeddings(ctx: &JobContext, entries: Vec<Entry>) -> Result<()> {
    // Attributes declare execution policy
    // VDFS enforces it

    for entry in entries.progress(ctx) {
        ctx.check()?;  // Checkpoint happens automatically
        let embedding = embed(entry)?;  // Retry happens automatically
        ctx.store(entry, embedding)?;   // Sync happens automatically
    }

    Ok(())
}

3. Agents are Persistent Observers

#[agent]
#[memory(persistent = true, sync = true)]
#[runs_when(device_idle = true)]
impl ResearchAssistant {
    // Agent persists across sessions
    // Memory syncs across devices
    // Runs when appropriate

    #[on_event(EntryCreated, filter = "pdf")]
    async fn on_new_paper(entry: Entry, ctx: &AgentContext) -> Result<()> {
        // Triggered on the device where the PDF was added
        // Can dispatch work to other devices

        ctx.dispatch_job("analyze_paper", entry)
            .on_device_with_most_compute()
            .when_idle()
            .await?;

        Ok(())
    }

    #[on_query("what am I missing?")]
    async fn find_gaps(ctx: &AgentContext) -> Result<Response> {
        // Runs on the device where the query was made
        // Can aggregate from all devices

        let all_papers = ctx.vdfs()
            .entries()
            .of_type("pdf")
            .across_all_devices()  // Federated query
            .collect()?;

        Ok(analyze_gaps(all_papers, ctx.memory().research_graph()))
    }
}

4. Cross-Extension Composition

// Agents communicate naturally
#[agent]
impl Chronicle {
    async fn suggest_next_reading(ctx: &AgentContext) -> Result<Action> {
        // Query the Ledger agent
        let budget = ctx.call_agent("ledger", "research_budget")?;

        // Query the Atlas agent
        let collaborators = ctx.call_agent("atlas", "project_team")?;

        // Combine insights
        let suggestions = self.recommend_papers(&budget, &collaborators)?;

        // Propose coordinated action
        let budget_impact = budget.estimate(&suggestions);
        Ok(ctx.propose_action(ReadingList {
            papers: suggestions,
            notify: collaborators,
            budget_impact,
        }))
    }
}

Agents aren't isolated - they're collaborative programs sharing the VDFS substrate.


The Syntax Beauty

Declarative Device Selection

#[job]
#[device_selector]
async fn heavy_computation(ctx: &JobContext) -> Result<()> {
    // VDFS language for device selection

    ctx.select_device()
        .with_capability(Capability::GPU)
        .prefer_local()
        .fallback_to_cloud(CostLimit::usd(0.10))  // cap cloud spend at $0.10
        .execute(|| {
            // Heavy AI computation
            train_model(data)
        })?;

    Ok(())
}

Declarative Data Flow

#[pipeline]
async fn process_research(ctx: &PipelineContext) {
    ctx.stream()
        .from_device(ctx.user_device())  // Capture on phone
        .voice_notes()

        .pipe_to(ctx.device_with_most_cpu())  // Process on desktop
        .transcribe()
        .generate_embeddings()

        .pipe_to(ctx.device(DeviceRole::Storage))  // Archive on NAS
        .compress()
        .encrypt()
        .store()

        .notify(ctx.active_device(), "Research processed");  // Notify where user is
}

Declarative Sync Behavior

#[data_model]
#[sync_strategy(conflict = "union_merge", priority = "hlc")]
struct TagSet {
    tags: Vec<Tag>,
}

#[data_model]
#[sync_strategy(conflict = "last_write_wins", priority = "device_authority")]
struct FileMetadata {
    size: u64,
    modified: DateTime,
}

// Conflict resolution is DECLARED, not coded
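
For intuition, a minimal sketch of what those declared strategies might expand to at merge time. The ConflictResolver trait and both impls are assumptions, not SDK API.

// Hypothetical expansion of declared conflict strategies. The trait and
// impls are illustrative assumptions, not the actual SDK.
use std::collections::BTreeSet;

trait ConflictResolver<T> {
    fn resolve(&self, local: T, remote: T) -> T;
}

// #[sync_strategy(conflict = "union_merge")]: keep every tag seen anywhere.
struct UnionMerge;
impl ConflictResolver<BTreeSet<String>> for UnionMerge {
    fn resolve(&self, mut local: BTreeSet<String>, remote: BTreeSet<String>) -> BTreeSet<String> {
        local.extend(remote);
        local
    }
}

// #[sync_strategy(conflict = "last_write_wins")]: the newer HLC timestamp
// wins; values are (hlc_timestamp, payload) pairs in this sketch.
struct LastWriteWins;
impl ConflictResolver<(u64, String)> for LastWriteWins {
    fn resolve(&self, local: (u64, String), remote: (u64, String)) -> (u64, String) {
        if remote.0 > local.0 { remote } else { local }
    }
}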

Why This is the Future

The AI Era Needs Distributed Computing

Today's AI Apps:

  • Run on centralized servers
  • Users upload data (privacy risk)
  • Pay for compute (ongoing cost)
  • Vendor controls everything

VDFS AI Apps:

  • Run on user devices
  • Data never leaves user control
  • Users provide compute (zero marginal cost)
  • User owns the infrastructure

Local-First is the Only Sustainable Model

Traditional SaaS: Vendor pays for every user's compute/storage (15-45% margins)
VDFS Apps: User provides infrastructure (95% margins)

This isn't just better margins - it's the only way AI apps can be sustainable.

Running GPT-4 for 1000 users = $10K/month in API costs. Running Ollama on user devices = $0.


The Developer Pitch

"Stop building infrastructure. Start building intelligence."

What you don't build with VDFS:

  • Multi-device sync (VDFS provides)
  • Vector database (VDFS provides)
  • Queue system (Durable Jobs)
  • Authentication (Device pairing)
  • Encryption (Built-in)
  • Backup/recovery (VDFS handles)
  • Offline support (VDFS native)
  • P2P networking (Iroh provided)

What you build:

  • Domain logic (research assistant behavior)
  • Data models (what a "project" means)
  • User experience (how insights are presented)
  • Agent intelligence (what suggestions to make)

Result: Launch in weeks, not months. Zero infrastructure costs. 95% margins.


The Architectural Poetry

// This is the entire Chronicle extension (conceptual)

#[extension(id = "chronicle")]
struct Chronicle;

#[data_model]
struct Project {
    #[entry] papers: Vec<Entry>,
    #[sidecar] graph: KnowledgeGraph,
    #[agent_memory] state: ResearchState,
}

#[agent]
impl Chronicle {
    #[on_new_entry(filter = "pdf")]
    async fn analyze(entry: Entry, ctx: &AgentContext) -> Result<()> {
        entry.extract_text()
             .generate_embedding()
             .add_to_graph(ctx.memory().graph)?;
        Ok(())
    }

    #[on_query("what am I missing?")]
    async fn find_gaps(ctx: &AgentContext) -> Vec<Paper> {
        ctx.memory()
           .graph
           .identify_gaps()
           .rank_by_relevance()
    }
}

// ~30 lines
// Complete distributed application
// Multi-device sync: automatic
// AI integration: declarative
// State management: handled
// Durability: built-in

This is beautiful.


For the Investor Memo

One perfect paragraph:

"The VDFS SDK is a programming language for distributed applications. Developers declare data models, agents, and workflows. The VDFS handles execution routing, multi-device sync, state persistence, and failure recovery. A password manager inherits encrypted storage and multi-device sync. An AI research tool inherits vector search and durable processing. A CRM inherits dynamic schemas and collaboration. Extension developers write domain logic in weeks, not infrastructure in months. This is the computing primitive for AI-native, local-first applications where users own the infrastructure."


This is your unfair advantage. This is why the platform will win.