
VDFS SDK as Programming Language

The Distributed Computing Primitive for the AI Era

Date: October 10, 2025
Vision: VDFS isn't just an API - it's the programming model for local-first, AI-native applications


The Core Insight

Current framing: "SDK provides access to data across devices"

Reality: the SDK provides compute across devices, data models that sync, agents with memory, and durable execution

The VDFS is becoming the raw computing primitive for distributed applications where:

  • Users own the infrastructure (their devices)
  • Apps declare behavior, VDFS orchestrates execution
  • Agents coordinate across devices automatically
  • Data models sync by default
  • Everything is durable and resumable

This is like:

  • SQL: Declarative language for data → You declare WHAT you want, database figures out HOW
  • React: Declarative language for UI → You declare UI state, React figures out rendering
  • VDFS: Declarative language for distributed apps → You declare behavior, VDFS figures out WHERE and WHEN

What the SDK Actually Enables

1. Distributed Compute, Not Just Distributed Data

Not just: "Access files on Device A from Device B"

Actually: "Run computations on Device A, triggered from Device B, results synced automatically"

#[extension(id = "chronicle")]
struct Chronicle;

#[job]
async fn analyze_papers(ctx: &JobContext, papers: Vec<Entry>) -> Result<Analysis> {
    // This job runs on the BEST available device automatically:
    // - Device with the most CPU (for embeddings)
    // - Device holding the papers locally (avoids transfer)
    // - Device currently online and idle

    let mut analysis = Analysis::default();

    for paper in papers.progress(ctx) {
        // Heavy compute dispatched optimally
        let embedding = ctx.compute()
            .prefer_device_with_gpu()
            .embed_document(&paper)?;

        // Results stored in VDFS, synced automatically
        ctx.vdfs().store_sidecar(&paper, "embedding", &embedding)?;

        // Accumulate into the returned Analysis (record() is a placeholder)
        analysis.record(&paper, embedding);
    }

    Ok(analysis)
}

The Magic: the developer never says "run on Device A" - VDFS routes execution based on the following (a scoring sketch follows the list):

  • Device capabilities (GPU, CPU, battery)
  • Data locality (where files are)
  • Network conditions (latency, bandwidth)
  • User preferences (privacy settings, cost limits)
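
For intuition, here is a minimal sketch of what such a routing heuristic might look like internally. Everything in it - the DeviceProfile struct, the weights, the scoring rule - is an illustrative assumption, not the actual VDFS scheduler.

// Hypothetical device-scoring heuristic. All types, fields, and weights
// here are illustrative assumptions, not the real VDFS scheduler.
struct DeviceProfile {
    has_gpu: bool,
    idle_cpu_fraction: f32,   // 0.0..=1.0, how much CPU is free
    on_battery: bool,
    holds_data_locally: bool, // having the inputs avoids a transfer
    latency_ms: u32,
}

fn score(device: &DeviceProfile, wants_gpu: bool) -> f32 {
    let mut s = 0.0;
    if wants_gpu && device.has_gpu { s += 3.0; }   // capability match
    s += 2.0 * device.idle_cpu_fraction;           // available compute
    if device.holds_data_locally { s += 2.5; }     // data locality
    if device.on_battery { s -= 1.5; }             // respect user's battery
    s - device.latency_ms as f32 / 1000.0          // network conditions
}

fn pick_best(devices: &[DeviceProfile], wants_gpu: bool) -> Option<&DeviceProfile> {
    devices
        .iter()
        .max_by(|a, b| score(a, wants_gpu).total_cmp(&score(b, wants_gpu)))
}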

2. Data Models as Declarative Schema

Not just: "Store JSON in sidecars"

Actually: "Declare data models that sync, validate, and evolve across devices"

#[data_model]
struct ResearchProject {
    #[entry(kind = "directory")]
    folder: Entry,  // Syncs as device-owned (state-based)

    #[tags]
    topics: Vec<Tag>,  // Syncs as shared (HLC-ordered log)

    #[sidecar(file = "graph.json")]
    knowledge_graph: Graph,  // Travels with Entry

    #[agent_memory]
    research_state: ResearchState,  // Persistent agent context

    #[computed]
    completion: f32,  // Derived, not stored

    #[encrypted]
    notes: String,  // Automatically encrypted at rest
}

// This becomes a LANGUAGE for describing distributed data
// Sync strategy, storage location, encryption - all declarative

The Beauty:

  • #[entry] = device-owned, state-synced
  • #[tags] = shared, HLC-synced
  • #[sidecar] = travels with data
  • #[agent_memory] = persistent across sessions
  • #[computed] = derived on-demand
  • #[encrypted] = zero-knowledge by default

Developer declares WHAT the data is, VDFS handles HOW it's stored, synced, and secured.
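
One way to picture the mechanics, as a minimal sketch: each attribute could desugar into per-field sync metadata that the runtime consumes. The SyncStrategy and FieldDescriptor names below are assumptions for illustration, not real SDK types.

// Hypothetical desugaring of #[data_model] attributes into runtime metadata.
// SyncStrategy and FieldDescriptor are illustrative names, not real SDK types.
enum SyncStrategy {
    DeviceOwned, // #[entry]: state-based, owning device is authoritative
    SharedHlc,   // #[tags] / #[shared]: HLC-ordered operation log
    Sidecar,     // #[sidecar]: travels with the Entry it annotates
    AgentMemory, // #[agent_memory]: persisted per extension, synced
    Computed,    // #[computed]: derived on demand, never stored
}

struct FieldDescriptor {
    name: &'static str,
    strategy: SyncStrategy,
    encrypted: bool, // #[encrypted] layers on top of any strategy
}

// What the macro might register for ResearchProject above. The strategy
// chosen for `notes` is a guess; the original only marks it encrypted.
fn research_project_schema() -> Vec<FieldDescriptor> {
    vec![
        FieldDescriptor { name: "folder", strategy: SyncStrategy::DeviceOwned, encrypted: false },
        FieldDescriptor { name: "topics", strategy: SyncStrategy::SharedHlc, encrypted: false },
        FieldDescriptor { name: "knowledge_graph", strategy: SyncStrategy::Sidecar, encrypted: false },
        FieldDescriptor { name: "research_state", strategy: SyncStrategy::AgentMemory, encrypted: false },
        FieldDescriptor { name: "completion", strategy: SyncStrategy::Computed, encrypted: false },
        FieldDescriptor { name: "notes", strategy: SyncStrategy::SharedHlc, encrypted: true },
    ]
}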

3. Agents as Distributed Programs

Not just: "AI that queries local data"

Actually: "Distributed programs that run across devices, coordinate, and maintain state"

#[agent]
impl Chronicle {
    // Agent lifecycle - runs on any device
    async fn init(ctx: &AgentContext) -> Result<()> {
        // Agent "wakes up" on device that needs it
        // Loads state from agent_memory (synced via VDFS)
        ctx.restore_state()?;

        // Registers for events across ALL devices
        ctx.on_new_pdf(Self::analyze_paper);
        ctx.on_voice_note(Self::transcribe_and_index);

        Ok(())
    }

    // Event handlers run WHERE the event occurred
    async fn analyze_paper(entry: Entry, ctx: &AgentContext) -> Result<()> {
        // If triggered on Device A, runs on Device A
        // Results sync to all devices automatically

        let analysis = ctx.ai()
            .local_if_available()  // Prefer local Ollama
            .fallback_to_cloud()   // Cloud if local unavailable
            .analyze(&entry)?;

        // Store the analysis; it syncs automatically
        ctx.vdfs().store_sidecar(&entry, "analysis", analysis)?;
        ctx.memory().update("papers_analyzed", |count| count + 1)?;

        Ok(())
    }

    // Queries can run on ANY device, results routed back
    #[query]
    async fn find_gaps(topic: String, ctx: &AgentContext) -> Result<Vec<Paper>> {
        // VDFS routes this query to the device with the most papers indexed
        // Results stream back to the requesting device

        let my_papers = ctx.vdfs().search_local(&topic)?;
        let canonical = ctx.ai().canonical_papers(&topic)?;

        Ok(canonical.difference(my_papers))
    }
}

The Paradigm Shift:

  • Agents are distributed programs, not local scripts
  • State syncs automatically across devices
  • Execution routes to optimal device
  • Developer declares behavior, VDFS orchestrates

4. Cross-Device Orchestration Language

The Dream Syntax:

#[workflow]
async fn research_workflow(ctx: &WorkflowContext) -> Result<()> {
    // Declarative multi-device workflow

    // Step 1: Ingest on mobile (where user is)
    ctx.on_device(DeviceType::Mobile)
        .capture_voice_note()
        .transcribe()
        .save_to_project("ai-safety")?;

    // Step 2: Process on desktop (where compute is)
    ctx.on_device(DeviceType::Desktop)
        .when_idle()
        .generate_embeddings()
        .update_knowledge_graph()?;

    // Step 3: Sync to NAS (where storage is)
    ctx.on_device(DeviceType::NAS)
        .when_online()
        .backup_project()
        .verify_integrity()?;

    // Step 4: Notify on ANY device user is active on
    ctx.on_active_device()
        .notify("Research updated: 3 new gaps identified")?;
}

This is choreography, not orchestration (see the sketch after this list):

  • Each device knows its role
  • No central coordinator
  • Execution flows naturally based on device capabilities
  • User sees unified experience
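
To make the contrast concrete, here is a small sketch of choreography under assumed names: each device role subscribes only to the events it cares about, and the next step fires when its input syncs in. Event, DeviceRole, and the subscription table are all illustrative, not SDK API.

// Hypothetical choreography: no coordinator dispatches steps. Each device
// role declares which synced events it reacts to and with what handler.
// Event, DeviceRole, and the handler names are illustrative assumptions.
#[derive(Clone, Copy, PartialEq)]
enum Event { VoiceNoteCaptured, TranscriptReady, EmbeddingsReady }

enum DeviceRole { Mobile, Desktop, Nas }

fn subscriptions(role: &DeviceRole) -> Vec<(Event, &'static str)> {
    match role {
        // Mobile only produces VoiceNoteCaptured; it reacts to nothing here.
        DeviceRole::Mobile => vec![],
        // Desktop picks up notes once they sync in, then its own transcripts.
        DeviceRole::Desktop => vec![
            (Event::VoiceNoteCaptured, "transcribe"),
            (Event::TranscriptReady, "generate_embeddings"),
        ],
        // The NAS archives once embeddings exist anywhere in the library.
        DeviceRole::Nas => vec![(Event::EmbeddingsReady, "backup_and_verify")],
    }
}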

The VDFS Language Primitives

Storage Primitives

  • Entry - Universal data unit (files, emails, receipts, contacts)
  • Sidecar - Derivative data (OCR, embeddings, thumbnails)
  • Tag - Semantic organization (synced, graph-based)
  • Collection - User groupings (albums, projects, sets)

Execution Primitives

  • Job - Durable, resumable background work
  • Action - Preview-before-execute operations
  • Agent - Continuous, context-aware assistants
  • Workflow - Multi-step orchestrations

Sync Primitives

  • DeviceOwned - State-based sync (filesystem index)
  • Shared - HLC-ordered log sync (tags, ratings)
  • AgentMemory - Per-extension persistent state
  • Ephemeral - Temporary, never synced

Compute Primitives

  • @local - Must run on local device
  • @any - Route to best available device
  • @prefer(gpu) - Prefer devices with capability
  • @fallback(cloud) - Use cloud if local unavailable
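
As a sketch in the same aspirational style as the rest of this document, here is how the four primitive families might compose in a single job. The attribute spellings and method names are assumptions, not a final API.

// Aspirational sketch composing all four primitive families in one job.
// Attribute spellings and method names are assumptions, not a final API.
#[job]              // Execution primitive: durable, resumable
#[prefer(gpu)]      // Compute primitive: route to a capable device
#[fallback(cloud)]  // Compute primitive: cloud only if no local GPU
async fn summarize_receipts(ctx: &JobContext) -> Result<()> {
    let receipts = ctx.vdfs()
        .entries()                 // Storage primitive: Entry
        .of_type("receipt")
        .tagged("tax-deductible")  // Storage primitive: Tag (Shared sync)
        .collect()?;

    for receipt in receipts.progress(ctx) {
        ctx.check()?;              // durability checkpoint
        let summary = ctx.ai().summarize(&receipt)?;
        ctx.vdfs().store_sidecar(&receipt, "summary", summary)?;  // Sidecar
    }

    Ok(())
}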

Why This Matters: The AI Era Needs This

Current reality:

  • Building AI apps requires: vector DB, queue system, state management, multi-device sync, authentication
  • Takes months, costs thousands in infrastructure
  • Developers rebuild the same primitives

VDFS future:

  • Declare data models, agents, workflows
  • Infrastructure provided (storage, sync, compute routing, AI models)
  • Launch in weeks, costs nothing in infrastructure (users own hardware)

Comparison:

| Primitive | Traditional Stack    | VDFS Stack              |
| --------- | -------------------- | ----------------------- |
| Storage   | PostgreSQL + S3      | VDFS (user devices)     |
| Vector DB | Pinecone ($70/mo)    | VDFS embeddings (free)  |
| Queue     | Redis + workers      | Durable Jobs (free)     |
| Sync      | Firebase ($200/mo)   | VDFS Sync (free)        |
| AI        | OpenAI API ($100/mo) | Ollama local (free)     |
| Auth      | Auth0 ($200/mo)      | Device pairing (free)   |
| Total     | $570/month           | $0/month                |

For 1000 users: Traditional = $6,840/year infrastructure. VDFS = $0.


The Developer Experience Vision

Current State (VDFS v2)

// Explicit, imperative
let papers = fetch_papers()?;
for paper in papers {
    let ocr = extract_text(&paper)?;
    let embedding = generate_embedding(&ocr)?;
    store_sidecar(&paper, embedding)?;
}

Future State (VDFS Language)

// Declarative, VDFS orchestrates
#[pipeline]
async fn process_papers(papers: Stream<Entry>) {
    papers
        .extract_text()          // Runs on device with paper
        .generate_embeddings()   // Runs on device with GPU
        .store_sidecars()        // Syncs to all devices
        .notify_completion()     // Notifies on active device
}

// VDFS handles:
// - Where each step runs
// - How data flows between devices
// - When to retry failures
// - How to resume interruptions

The Dream: Natural Language to VDFS

// User says: "Analyze all my PDFs from last month"

// Extension developer writes:
#[intent("analyze documents")]
async fn analyze_intent(ctx: &IntentContext, query: NaturalLanguage) -> Result<Action> {
    // VDFS parses intent
    let params = ctx.parse_temporal_query(&query)?;  // "last month"
    let scope = ctx.parse_content_type(&query)?;     // "PDFs"

    // VDFS executes
    Ok(ctx.vdfs()
        .entries()
        .of_type(scope)
        .since(params.start_date)
        .analyze_with(|pdf| {
            // Runs on the device with the PDF + compute
            summarize(pdf)
        })
        .collect_results()
        .present_to_user())
}

What Makes This Revolutionary

1. Users Own the Infrastructure

  • Traditional: Apps run on vendor servers (Notion servers, Dropbox servers)
  • VDFS: Apps run on user devices (your laptop, your phone, your NAS)
  • Extension developers get enterprise infrastructure for free
  • Users maintain sovereignty

2. Agents Coordinate Across Devices

  • Chronicle agent on laptop processes heavy AI
  • Cipher agent on phone handles biometric unlock
  • Atlas agent on NAS handles backups
  • Ledger agent on any device with receipts extracts data
  • They coordinate through VDFS events and shared state (sketched below)
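
A minimal sketch of what that coordination could look like as plain publish/subscribe over the VDFS event log. The VdfsEvent type and the bus methods are illustrative assumptions, not real SDK API.

// Hypothetical: agents coordinate via events on the shared VDFS log.
// VdfsEvent and the bus methods are illustrative, not real SDK API.
use std::collections::HashMap;

#[derive(Clone)]
struct VdfsEvent {
    topic: String,   // e.g. "receipt.extracted"
    payload: String, // JSON in practice; a string keeps the sketch small
}

#[derive(Default)]
struct EventBus {
    subscribers: HashMap<String, Vec<fn(&VdfsEvent)>>,
}

impl EventBus {
    fn subscribe(&mut self, topic: &str, handler: fn(&VdfsEvent)) {
        self.subscribers.entry(topic.to_string()).or_default().push(handler);
    }

    // In VDFS this publish would append to the synced log, so handlers
    // may fire on a *different* device than the publisher.
    fn publish(&self, event: VdfsEvent) {
        if let Some(handlers) = self.subscribers.get(&event.topic) {
            for handler in handlers {
                handler(&event);
            }
        }
    }
}

fn main() {
    let mut bus = EventBus::default();
    // Ledger extracts a receipt; Chronicle reacts wherever it runs.
    bus.subscribe("receipt.extracted", |e| {
        println!("chronicle: indexing {}", e.payload);
    });
    bus.publish(VdfsEvent {
        topic: "receipt.extracted".into(),
        payload: "receipt-42".into(),
    });
}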

3. Data Models Sync by Default

// Developer declares structure
#[data_model]
struct Contact {
    #[shared]  // Syncs via HLC log
    name: String,
    #[shared]
    email: String,

    #[device_owned]  // Syncs via state
    last_contacted_from: DeviceId,

    #[encrypted]  // Zero-knowledge
    notes: String,
}

// VDFS automatically:
// - Syncs name/email via HLC (shared)
// - Syncs last_contacted via state (device-owned)
// - Encrypts notes
// - Resolves conflicts
// - Maintains audit log

No sync code written. It's declarative.

4. Durable by Default

#[job]
async fn import_emails(ctx: &JobContext) -> Result<()> {
    // This job is AUTOMATICALLY:
    // - Resumable (checkpointed every N iterations)
    // - Synced (state saved to VDFS)
    // - Distributed (can migrate between devices)
    // - Observable (progress tracked)

    for email in fetch_emails()?.progress(ctx) {
        ctx.check()?;  // Auto-checkpoint
        process_email(email)?;
    }

    Ok(())
}

// Developer writes business logic
// VDFS provides durability primitives

The Vision: VDFS as Computing Substrate

What SQL did for Data

Before SQL: Imperative file operations, manual indexing, no query optimization
After SQL: Declarative queries, automatic optimization, normalized schemas

What React did for UI

Before React: Manual DOM manipulation, imperative updates, spaghetti state
After React: Declarative components, automatic rendering, clean state flow

What VDFS does for Distributed Apps

Before VDFS: Manual sync, custom protocols, fragile state management, infrastructure costs
After VDFS: Declarative models, automatic sync, durable execution, zero infrastructure


The SDK as Language Design

Core Concepts

1. Everything is an Entry

// Universal abstraction
Entry::File(file)       // Traditional file
Entry::Email(email)     // From email agent
Entry::Tweet(tweet)     // From Twitter agent
Entry::Receipt(receipt) // From Ledger
Entry::Contact(contact) // From Atlas
Entry::Note(note)       // From Chronicle

// All queryable the same way
ctx.vdfs()
    .entries()
    .of_type("receipt")
    .tagged("tax-deductible")
    .since("2024-01-01")
    .sum(|r| r.amount)

2. Jobs are Declarative Workflows

#[job]
#[runs_on(prefer = "gpu", fallback = "cpu")]
#[checkpoint_every(100)]
#[retry_on_failure(max = 3)]
async fn generate_embeddings(ctx: &JobContext, entries: Vec<Entry>) -> Result<()> {
    // Attributes declare execution policy
    // VDFS enforces it

    for entry in entries.progress(ctx) {
        ctx.check()?;  // Checkpoint happens automatically
        let embedding = embed(entry)?;  // Retry happens automatically
        ctx.store(entry, embedding)?;   // Sync happens automatically
    }

    Ok(())
}

3. Agents are Persistent Observers

#[agent]
#[memory(persistent = true, sync = true)]
#[runs_when(device_idle = true)]
impl ResearchAssistant {
    // Agent persists across sessions
    // Memory syncs across devices
    // Runs when appropriate

    #[on_event(EntryCreated, filter = "pdf")]
    async fn on_new_paper(entry: Entry, ctx: &AgentContext) -> Result<()> {
        // Triggered on the device where the PDF was added
        // Can dispatch work to other devices

        ctx.dispatch_job("analyze_paper", entry)
            .on_device_with_most_compute()
            .when_idle()
            .await?;

        Ok(())
    }

    #[on_query("what am I missing?")]
    async fn find_gaps(ctx: &AgentContext) -> Result<Response> {
        // Runs on the device where the query was made
        // Can aggregate from all devices

        let all_papers = ctx.vdfs()
            .entries()
            .of_type("pdf")
            .across_all_devices()  // Federated query
            .collect()?;

        Ok(analyze_gaps(all_papers, ctx.memory().research_graph()))
    }
}

4. Cross-Extension Composition

// Agents communicate naturally
#[agent]
impl Chronicle {
    async fn suggest_next_reading(ctx: &AgentContext) -> Result<Action> {
        // Query the Ledger agent
        let budget = ctx.call_agent("ledger", "research_budget")?;

        // Query the Atlas agent
        let collaborators = ctx.call_agent("atlas", "project_team")?;

        // Combine insights
        let suggestions = self.recommend_papers(&budget, &collaborators)?;

        // Propose coordinated action
        let budget_impact = budget.estimate(&suggestions);
        Ok(ctx.propose_action(ReadingList {
            papers: suggestions,
            notify: collaborators,
            budget_impact,
        }))
    }
}

Agents aren't isolated - they're collaborative programs sharing the VDFS substrate.


The Syntax Beauty

Declarative Device Selection

#[job]
#[device_selector]
async fn heavy_computation(ctx: &JobContext) -> Result<()> {
    // VDFS language for device selection

    ctx.select_device()
        .with_capability(Capability::GPU)
        .prefer_local()
        .fallback_to_cloud(CostLimit::usd(0.10))  // cap cloud spend at $0.10
        .execute(|| {
            // Heavy AI computation
            train_model(data)
        })?;

    Ok(())
}

Declarative Data Flow

#[pipeline]
async fn process_research(ctx: &PipelineContext) {
    ctx.stream()
        .from_device(ctx.user_device())  // Capture on phone
        .voice_notes()

        .pipe_to(ctx.device_with_most_cpu())  // Process on desktop
        .transcribe()
        .generate_embeddings()

        .pipe_to(ctx.device(DeviceRole::Storage))  // Archive on NAS
        .compress()
        .encrypt()
        .store()

        .notify(ctx.active_device(), "Research processed");  // Notify where user is
}

Declarative Sync Behavior

#[data_model]
#[sync_strategy(conflict = "union_merge", priority = "hlc")]
struct TagSet {
    tags: Vec<Tag>,
}

#[data_model]
#[sync_strategy(conflict = "last_write_wins", priority = "device_authority")]
struct FileMetadata {
    size: u64,
    modified: DateTime,
}

// Conflict resolution is DECLARED, not coded
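
For intuition, a minimal sketch of what those declared strategies might expand to at merge time. The ConflictResolver trait and both impls are assumptions, not SDK API.

// Hypothetical expansion of declared conflict strategies. The trait and
// impls are illustrative assumptions, not the actual SDK.
use std::collections::BTreeSet;

trait ConflictResolver<T> {
    fn resolve(&self, local: T, remote: T) -> T;
}

// #[sync_strategy(conflict = "union_merge")]: keep every tag seen anywhere.
struct UnionMerge;
impl ConflictResolver<BTreeSet<String>> for UnionMerge {
    fn resolve(&self, mut local: BTreeSet<String>, remote: BTreeSet<String>) -> BTreeSet<String> {
        local.extend(remote);
        local
    }
}

// #[sync_strategy(conflict = "last_write_wins")]: the newer HLC timestamp
// wins; values are (hlc_timestamp, payload) pairs in this sketch.
struct LastWriteWins;
impl ConflictResolver<(u64, String)> for LastWriteWins {
    fn resolve(&self, local: (u64, String), remote: (u64, String)) -> (u64, String) {
        if remote.0 > local.0 { remote } else { local }
    }
}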

Why This is the Future

The AI Era Needs Distributed Computing

Today's AI Apps:

  • Run on centralized servers
  • Users upload data (privacy risk)
  • Pay for compute (ongoing cost)
  • Vendor controls everything

VDFS AI Apps:

  • Run on user devices
  • Data never leaves user control
  • Users provide compute (zero marginal cost)
  • User owns the infrastructure

Local-First is the Only Sustainable Model

Traditional SaaS: Vendor pays for every user's compute/storage (15-45% margins)
VDFS Apps: User provides infrastructure (95% margins)

This isn't just better margins - it's the only way AI apps can be sustainable.

Running GPT-4 for 1000 users = $10K/month in API costs. Running Ollama on user devices = $0.


The Developer Pitch

"Stop building infrastructure. Start building intelligence."

What you don't build with VDFS:

  • Multi-device sync (VDFS provides)
  • Vector database (VDFS provides)
  • Queue system (Durable Jobs)
  • Authentication (Device pairing)
  • Encryption (Built-in)
  • Backup/recovery (VDFS handles)
  • Offline support (VDFS native)
  • P2P networking (Iroh provided)

What you build:

  • Domain logic (research assistant behavior)
  • Data models (what a "project" means)
  • User experience (how insights are presented)
  • Agent intelligence (what suggestions to make)

Result: Launch in weeks, not months. Zero infrastructure costs. 95% margins.


The Architectural Poetry

// This is the entire Chronicle extension (conceptual)

#[extension(id = "chronicle")]
struct Chronicle;

#[data_model]
struct Project {
    #[entry] papers: Vec<Entry>,
    #[sidecar] graph: KnowledgeGraph,
    #[agent_memory] state: ResearchState,
}

#[agent]
impl Chronicle {
    #[on_new_entry(filter = "pdf")]
    async fn analyze(entry: Entry, ctx: &AgentContext) -> Result<()> {
        entry.extract_text()
             .generate_embedding()
             .add_to_graph(ctx.memory().graph)?;
        Ok(())
    }

    #[on_query("what am I missing?")]
    async fn find_gaps(ctx: &AgentContext) -> Vec<Paper> {
        ctx.memory()
           .graph
           .identify_gaps()
           .rank_by_relevance()
    }
}

// ~30 lines
// Complete distributed application
// Multi-device sync: automatic
// AI integration: declarative
// State management: handled
// Durability: built-in

This is beautiful.


For the Investor Memo

One perfect paragraph:

"The VDFS SDK is a programming language for distributed applications. Developers declare data models, agents, and workflows. The VDFS handles execution routing, multi-device sync, state persistence, and failure recovery. A password manager inherits encrypted storage and multi-device sync. An AI research tool inherits vector search and durable processing. A CRM inherits dynamic schemas and collaboration. Extension developers write domain logic in weeks, not infrastructure in months. This is the computing primitive for AI-native, local-first applications where users own the infrastructure."


This is your unfair advantage. This is why the platform will win.