VDFS SDK as Programming Language
The Distributed Computing Primitive for the AI Era
Date: October 10, 2025
Vision: VDFS isn't just an API - it's the programming model for local-first, AI-native applications
The Core Insight
Current framing: "SDK provides access to data across devices"
Reality: SDK provides compute across devices, data models that sync, agents with memory, and durable execution
The VDFS is becoming the raw computing primitive for distributed applications where:
- Users own the infrastructure (their devices)
- Apps declare behavior, VDFS orchestrates execution
- Agents coordinate across devices automatically
- Data models sync by default
- Everything is durable and resumable
This is like:
- SQL: Declarative language for data → You declare WHAT you want, database figures out HOW
- React: Declarative language for UI → You declare UI state, React figures out rendering
- VDFS: Declarative language for distributed apps → You declare behavior, VDFS figures out WHERE and WHEN
What the SDK Actually Enables
1. Distributed Compute, Not Just Distributed Data
Not just: "Access files on Device A from Device B"
Actually: "Run computations on Device A, triggered from Device B, results synced automatically"
#[extension(id = "chronicle")]
struct Chronicle;
#[job]
async fn analyze_papers(ctx: &JobContext, papers: Vec<Entry>) -> Result<Analysis> {
    // This job runs on the BEST available device automatically:
    // - Device with the most CPU (for embeddings)
    // - Device holding the papers locally (avoids transfer)
    // - Device currently online and idle
    let mut analysis = Analysis::default();
    for paper in papers.progress(ctx) {
        // Heavy compute dispatched optimally
        let embedding = ctx.compute()
            .prefer_device_with_gpu()
            .embed_document(&paper)?;
        // Results stored in VDFS, synced automatically
        ctx.vdfs().store_sidecar(&paper, "embedding", &embedding)?;
        analysis.record(&paper, &embedding);
    }
    Ok(analysis)
}
The Magic: Developer never says "run on Device A" - VDFS routes execution based on:
- Device capabilities (GPU, CPU, battery)
- Data locality (where files are)
- Network conditions (latency, bandwidth)
- User preferences (privacy settings, cost limits)
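The routing criteria above can be sketched as a scoring function. This is an illustrative model only, not the actual VDFS scheduler: `DeviceInfo`, `score_device`, and `pick_best` are hypothetical names, and the weights are arbitrary placeholders.

```rust
// Hypothetical sketch of how a scheduler might rank devices.
// The weights below are arbitrary; a real router would tune them.
#[derive(Clone)]
pub struct DeviceInfo {
    pub has_gpu: bool,
    pub idle: bool,
    pub on_battery: bool,
    pub holds_data: bool, // data locality: files already present
    pub latency_ms: u32,  // network round-trip to the requester
}

pub fn score_device(d: &DeviceInfo) -> i64 {
    let mut score: i64 = 0;
    if d.has_gpu { score += 40 }    // capability
    if d.holds_data { score += 30 } // avoid transfer
    if d.idle { score += 20 }       // don't disturb active use
    if d.on_battery { score -= 25 } // respect user preferences
    score -= i64::from(d.latency_ms / 10); // network conditions
    score
}

pub fn pick_best(devices: &[DeviceInfo]) -> usize {
    devices
        .iter()
        .enumerate()
        .max_by_key(|(_, d)| score_device(d))
        .map(|(i, _)| i)
        .expect("at least one device")
}

fn main() {
    let phone = DeviceInfo { has_gpu: false, idle: false, on_battery: true, holds_data: true, latency_ms: 5 };
    let desktop = DeviceInfo { has_gpu: true, idle: true, on_battery: false, holds_data: false, latency_ms: 20 };
    // The idle, mains-powered GPU desktop outranks the phone.
    assert_eq!(pick_best(&[phone, desktop]), 1);
}
```

The point is not the specific weights but that routing is a pure function of declared device facts, so the developer never names a device.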
2. Data Models as Declarative Schema
Not just: "Store JSON in sidecars"
Actually: "Declare data models that sync, validate, and evolve across devices"
#[data_model]
struct ResearchProject {
    #[entry(kind = "directory")]
    folder: Entry, // Syncs as device-owned (state-based)
    #[tags]
    topics: Vec<Tag>, // Syncs as shared (HLC-ordered log)
    #[sidecar(file = "graph.json")]
    knowledge_graph: Graph, // Travels with Entry
    #[agent_memory]
    research_state: ResearchState, // Persistent agent context
    #[computed]
    completion: f32, // Derived, not stored
    #[encrypted]
    notes: String, // Automatically encrypted at rest
}
// This becomes a LANGUAGE for describing distributed data
// Sync strategy, storage location, encryption - all declarative
The Beauty:
- #[entry] = device-owned, state-synced
- #[tags] = shared, HLC-synced
- #[sidecar] = travels with data
- #[agent_memory] = persistent across sessions
- #[computed] = derived on-demand
- #[encrypted] = zero-knowledge by default
Developer declares WHAT the data is, VDFS handles HOW it's stored, synced, and secured.
3. Agents as Distributed Programs
Not just: "AI that queries local data"
Actually: "Distributed programs that run across devices, coordinate, and maintain state"
#[agent]
impl Chronicle {
    // Agent lifecycle - runs on any device
    async fn init(ctx: &AgentContext) -> Result<()> {
        // Agent "wakes up" on the device that needs it
        // Loads state from agent_memory (synced via VDFS)
        ctx.restore_state()?;
        // Registers for events across ALL devices
        ctx.on_new_pdf(Self::analyze_paper);
        ctx.on_voice_note(Self::transcribe_and_index);
        Ok(())
    }
// Event handlers run WHERE the event occurred
async fn analyze_paper(entry: Entry, ctx: &AgentContext) {
// If triggered on Device A, runs on Device A
// Results sync to all devices automatically
let analysis = ctx.ai()
.local_if_available() // Prefer local Ollama
.fallback_to_cloud() // Cloud if local unavailable
.analyze(entry)?;
// Store analysis, syncs automatically
ctx.memory().update("papers_analyzed", |count| count + 1)?;
}
// Queries can run on ANY device, results routed back
#[query]
async fn find_gaps(topic: String, ctx: &AgentContext) -> Vec<Paper> {
// VDFS routes this query to device with most papers indexed
// Results stream back to requesting device
let my_papers = ctx.vdfs().search_local(topic)?;
let canonical = ctx.ai().canonical_papers(topic)?;
canonical.difference(my_papers)
}
}
The Paradigm Shift:
- Agents are distributed programs, not local scripts
- State syncs automatically across devices
- Execution routes to optimal device
- Developer declares behavior, VDFS orchestrates
4. Cross-Device Orchestration Language
The Dream Syntax:
#[workflow]
async fn research_workflow(ctx: &WorkflowContext) -> Result<()> {
    // Declarative multi-device workflow
    // Step 1: Ingest on mobile (where the user is)
    ctx.on_device(DeviceType::Mobile)
        .capture_voice_note()
        .transcribe()
        .save_to_project("ai-safety")?;
    // Step 2: Process on desktop (where the compute is)
    ctx.on_device(DeviceType::Desktop)
        .when_idle()
        .generate_embeddings()
        .update_knowledge_graph()?;
    // Step 3: Sync to NAS (where the storage is)
    ctx.on_device(DeviceType::NAS)
        .when_online()
        .backup_project()
        .verify_integrity()?;
    // Step 4: Notify on whichever device the user is active on
    ctx.on_active_device()
        .notify("Research updated: 3 new gaps identified")?;
    Ok(())
}
This is choreography, not orchestration:
- Each device knows its role
- No central coordinator
- Execution flows naturally based on device capabilities
- User sees unified experience
The VDFS Language Primitives
Storage Primitives
- Entry: Universal data unit (files, emails, receipts, contacts)
- Sidecar: Derivative data (OCR, embeddings, thumbnails)
- Tag: Semantic organization (synced, graph-based)
- Collection: User groupings (albums, projects, sets)
Execution Primitives
- Job: Durable, resumable background work
- Action: Preview-before-execute operations
- Agent: Continuous, context-aware assistants
- Workflow: Multi-step orchestrations
Sync Primitives
- DeviceOwned: State-based sync (filesystem index)
- Shared: HLC-ordered log sync (tags, ratings)
- AgentMemory: Per-extension persistent state
- Ephemeral: Temporary, never synced
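The Shared primitive's HLC-ordered log can be illustrated with a toy merge: each operation carries a (wall clock, logical counter, device id) timestamp, giving a total order that every replica sorts by identically. `Hlc`, `TagOp`, and `merge_logs` are illustrative names for this sketch, not SDK types.

```rust
// Sketch of deterministic HLC-ordered log merging (the idea behind Shared).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub struct Hlc {
    pub wall_ms: u64, // physical clock reading
    pub counter: u32, // logical counter for same-millisecond events
    pub device: u16,  // tie-breaker so the ordering is total
}

#[derive(Clone, Debug, PartialEq)]
pub struct TagOp {
    pub ts: Hlc,
    pub tag: String,
}

// Every device sorts the union of operations the same way, so all
// replicas converge on the same tag history without coordination.
pub fn merge_logs(a: &[TagOp], b: &[TagOp]) -> Vec<TagOp> {
    let mut all: Vec<TagOp> = a.iter().chain(b.iter()).cloned().collect();
    all.sort_by_key(|op| op.ts);
    all.dedup(); // duplicates are adjacent after sorting
    all
}

fn main() {
    let laptop = vec![TagOp { ts: Hlc { wall_ms: 100, counter: 0, device: 1 }, tag: "ai-safety".into() }];
    let phone = vec![TagOp { ts: Hlc { wall_ms: 100, counter: 1, device: 2 }, tag: "to-read".into() }];
    // Merge order is identical regardless of which side performs the merge.
    assert_eq!(merge_logs(&laptop, &phone), merge_logs(&phone, &laptop));
}
```

The derived `Ord` compares fields lexicographically, which is exactly the (physical, logical, device) precedence an HLC needs.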
Compute Primitives
- @local: Must run on local device
- @any: Route to best available device
- @prefer(gpu): Prefer devices with capability
- @fallback(cloud): Use cloud if local unavailable
Why This Matters: The AI Era Needs This
Current reality:
- Building AI apps requires: vector DB, queue system, state management, multi-device sync, authentication
- Takes months, costs thousands in infrastructure
- Developers rebuild the same primitives
VDFS future:
- Declare data models, agents, workflows
- Infrastructure provided (storage, sync, compute routing, AI models)
- Launch in weeks, costs nothing in infrastructure (users own hardware)
Comparison:
| Primitive | Traditional Stack | VDFS Stack |
|---|---|---|
| Storage | PostgreSQL + S3 | VDFS (user devices) |
| Vector DB | Pinecone ($70/mo) | VDFS embeddings (free) |
| Queue | Redis + workers | Durable Jobs (free) |
| Sync | Firebase ($200/mo) | VDFS Sync (free) |
| AI | OpenAI API ($100/mo) | Ollama local (free) |
| Auth | Auth0 ($200/mo) | Device pairing (free) |
| Total | $570/month | $0/month |
For 1000 users: Traditional = $6,840/year infrastructure. VDFS = $0.
The Developer Experience Vision
Current State (VDFS v2)
// Explicit, imperative
let papers = fetch_papers()?;
for paper in papers {
    let ocr = extract_text(&paper)?;
    let embedding = generate_embedding(&ocr)?;
    store_sidecar(&paper, embedding)?;
}
Future State (VDFS Language)
// Declarative, VDFS orchestrates
#[pipeline]
async fn process_papers(papers: Stream<Entry>) {
    papers
        .extract_text()        // Runs on the device with the paper
        .generate_embeddings() // Runs on the device with a GPU
        .store_sidecars()      // Syncs to all devices
        .notify_completion()   // Notifies the active device
}
// VDFS handles:
// - Where each step runs
// - How data flows between devices
// - When to retry failures
// - How to resume interruptions
The Dream: Natural Language to VDFS
// User says: "Analyze all my PDFs from last month"
// Extension developer writes:
#[intent("analyze documents")]
async fn analyze_intent(ctx: &IntentContext, query: NaturalLanguage) -> Result<Action> {
    // VDFS parses intent
    let params = ctx.parse_temporal_query(&query)?; // "last month"
    let scope = ctx.parse_content_type(&query)?;    // "PDFs"
    // VDFS executes
    Ok(ctx.vdfs()
        .entries()
        .of_type(scope)
        .since(params.start_date)
        .analyze_with(|pdf| {
            // Runs on the device with the PDF + compute
            summarize(pdf)
        })
        .collect_results()
        .present_to_user())
}
What Makes This Revolutionary
1. Users Own the Infrastructure
- Traditional: Apps run on vendor servers (Notion servers, Dropbox servers)
- VDFS: Apps run on user devices (your laptop, your phone, your NAS)
- Extension developers get enterprise infrastructure for free
- Users maintain sovereignty
2. Agents Coordinate Across Devices
- Chronicle agent on laptop processes heavy AI
- Cipher agent on phone handles biometric unlock
- Atlas agent on NAS handles backups
- Ledger agent on any device with receipts extracts data
- They coordinate through VDFS events and shared state
3. Data Models Sync by Default
// Developer declares structure
#[data_model]
struct Contact {
    #[shared] // Syncs via HLC log
    name: String,
    email: String,
    #[device_owned] // Syncs via state
    last_contacted_from: DeviceId,
    #[encrypted] // Zero-knowledge
    notes: String,
}
// VDFS automatically:
// - Syncs name/email via HLC (shared)
// - Syncs last_contacted via state (device-owned)
// - Encrypts notes
// - Resolves conflicts
// - Maintains audit log
No sync code written. It's declarative.
4. Durable by Default
#[job]
async fn import_emails(ctx: &JobContext) -> Result<()> {
    // This job is AUTOMATICALLY:
    // - Resumable (checkpointed every N iterations)
    // - Synced (state saved to VDFS)
    // - Distributed (can migrate between devices)
    // - Observable (progress tracked)
    for email in fetch_emails()?.progress(ctx) {
        ctx.check()?; // Auto-checkpoint
        process_email(email)?;
    }
    Ok(())
}
// Developer writes business logic
// VDFS provides durability primitives
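The checkpoint/resume behavior that ctx.check() implies can be sketched as a plain loop over saved state. `Checkpoint` and `run_job` are hypothetical stand-ins for this sketch; a real implementation would persist each snapshot to durable, synced storage rather than just holding it in memory.

```rust
// Toy model of durable, resumable work: progress is carried in a
// Checkpoint value, so a restarted job skips already-processed items.
#[derive(Default, Clone)]
pub struct Checkpoint {
    pub next_index: usize,       // first unprocessed item
    pub processed: Vec<String>,  // what has been handled so far
}

pub fn run_job(items: &[&str], saved: Checkpoint, every: usize) -> Checkpoint {
    let mut state = saved;
    // Resume from the saved index instead of starting over.
    for (i, item) in items.iter().enumerate().skip(state.next_index) {
        state.processed.push(item.to_string());
        state.next_index = i + 1;
        if state.next_index % every == 0 {
            // In VDFS this snapshot would be written to durable storage
            // and synced, so any device could pick the job up here.
        }
    }
    state
}

fn main() {
    let items = ["a@x", "b@x", "c@x", "d@x"];
    // Simulate a crash after two items: resume from a saved checkpoint.
    let partial = Checkpoint { next_index: 2, processed: vec!["a@x".into(), "b@x".into()] };
    let done = run_job(&items, partial, 2);
    // The first two items are not reprocessed; the rest complete.
    assert_eq!(done.processed, vec!["a@x", "b@x", "c@x", "d@x"]);
}
```

The developer-visible surface stays the same as in the job above: business logic in the loop body, durability in the substrate.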
The Vision: VDFS as Computing Substrate
What SQL did for Data
Before SQL: Imperative file operations, manual indexing, no query optimization
After SQL: Declarative queries, automatic optimization, normalized schemas
What React did for UI
Before React: Manual DOM manipulation, imperative updates, spaghetti state
After React: Declarative components, automatic rendering, clean state flow
What VDFS does for Distributed Apps
Before VDFS: Manual sync, custom protocols, fragile state management, infrastructure costs
After VDFS: Declarative models, automatic sync, durable execution, zero infrastructure
The SDK as Language Design
Core Concepts
1. Everything is an Entry
// Universal abstraction
Entry::File(file)       // Traditional file
Entry::Email(email)     // From email agent
Entry::Tweet(tweet)     // From Twitter agent
Entry::Receipt(receipt) // From Ledger
Entry::Contact(contact) // From Atlas
Entry::Note(note)       // From Chronicle
// All queryable the same way
ctx.vdfs()
    .entries()
    .of_type("receipt")
    .tagged("tax-deductible")
    .since("2024-01-01")
    .sum(|r| r.amount)
2. Jobs are Declarative Workflows
#[job]
#[runs_on(prefer = "gpu", fallback = "cpu")]
#[checkpoint_every(100)]
#[retry_on_failure(max = 3)]
async fn generate_embeddings(ctx: &JobContext, entries: Vec<Entry>) -> Result<()> {
    // Attributes declare execution policy
    // VDFS enforces it
    for entry in entries.progress(ctx) {
        ctx.check()?;                   // Checkpoint happens automatically
        let embedding = embed(&entry)?; // Retry happens automatically
        ctx.store(&entry, embedding)?;  // Sync happens automatically
    }
    Ok(())
}
3. Agents are Persistent Observers
#[agent]
#[memory(persistent = true, sync = true)]
#[runs_when(device_idle = true)]
impl ResearchAssistant {
    // Agent persists across sessions
    // Memory syncs across devices
    // Runs when appropriate
    #[on_event(EntryCreated, filter = "pdf")]
    async fn on_new_paper(entry: Entry, ctx: &AgentContext) -> Result<()> {
        // Triggered on the device where the PDF was added
        // Can dispatch work to other devices
        ctx.dispatch_job("analyze_paper", entry)
            .on_device_with_most_compute()
            .when_idle()
            .await?;
        Ok(())
    }
    #[on_query("what am I missing?")]
    async fn find_gaps(ctx: &AgentContext) -> Result<Response> {
        // Runs on the device that received the query
        // Can aggregate results from all devices
        let all_papers = ctx.vdfs()
            .entries()
            .of_type("pdf")
            .across_all_devices() // Federated query
            .collect()?;
        Ok(analyze_gaps(all_papers, ctx.memory().research_graph()))
    }
}
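A federated query like the one above can be illustrated as fan-out-and-merge: each device returns scored matches, and the requester deduplicates by entry id, keeping the best score. `DeviceResults` and `federate` are illustrative names for this sketch, not SDK API.

```rust
use std::collections::BTreeMap;

// Toy model of a federated query: merge per-device result lists,
// deduplicating by entry id and keeping the highest relevance score.
pub type EntryId = String;

pub struct DeviceResults {
    pub device: String,
    pub entries: Vec<(EntryId, f32)>, // (entry id, relevance score)
}

pub fn federate(results: Vec<DeviceResults>) -> Vec<(EntryId, f32)> {
    let mut best: BTreeMap<EntryId, f32> = BTreeMap::new();
    for dr in results {
        for (id, score) in dr.entries {
            // Keep the best score seen for each entry across devices.
            let e = best.entry(id).or_insert(score);
            if score > *e {
                *e = score;
            }
        }
    }
    // Return results ranked by relevance, best first.
    let mut merged: Vec<_> = best.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

fn main() {
    let laptop = DeviceResults { device: "laptop".into(), entries: vec![("p1".into(), 0.9), ("p2".into(), 0.4)] };
    let nas = DeviceResults { device: "nas".into(), entries: vec![("p2".into(), 0.7)] };
    let merged = federate(vec![laptop, nas]);
    // "p2" appears once, with its best score from either device.
    assert_eq!(merged, vec![("p1".to_string(), 0.9), ("p2".to_string(), 0.7)]);
}
```

The requesting device never needs to know which replica held which entry; it sees one ranked list.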
4. Cross-Extension Composition
// Agents communicate naturally
#[agent]
impl Chronicle {
    async fn suggest_next_reading(&self, ctx: &AgentContext) -> Result<Action> {
        // Query the Ledger agent
        let budget = ctx.call_agent("ledger", "research_budget")?;
        // Query the Atlas agent
        let collaborators = ctx.call_agent("atlas", "project_team")?;
        // Combine insights
        let suggestions = self.recommend_papers(&budget, &collaborators)?;
        // Propose a coordinated action
        let budget_impact = budget.estimate(&suggestions);
        Ok(ctx.propose_action(ReadingList {
            papers: suggestions,
            notify: collaborators,
            budget_impact,
        }))
    }
}
Agents aren't isolated - they're collaborative programs sharing the VDFS substrate.
The Syntax Beauty
Declarative Device Selection
#[job]
#[device_selector]
async fn heavy_computation(ctx: &JobContext) -> Result<()> {
    // VDFS language for device selection
    ctx.select_device()
        .with_capability(Capability::GPU)
        .prefer_local()
        .fallback_to_cloud(CostLimit::usd(0.10))
        .execute(|| {
            // Heavy AI computation
            train_model(data)
        })?;
    Ok(())
}
Declarative Data Flow
#[pipeline]
async fn process_research(ctx: &PipelineContext) {
ctx.stream()
.from_device(ctx.user_device()) // Capture on phone
.voice_notes()
.pipe_to(ctx.device_with_most_cpu()) // Process on desktop
.transcribe()
.generate_embeddings()
.pipe_to(ctx.device(DeviceRole::Storage)) // Archive on NAS
.compress()
.encrypt()
.store()
.notify(ctx.active_device(), "Research processed"); // Notify where user is
}
Declarative Sync Behavior
#[data_model]
#[sync_strategy(conflict = "union_merge", priority = "hlc")]
struct TagSet {
    tags: Vec<Tag>,
}
#[data_model]
#[sync_strategy(conflict = "last_write_wins", priority = "device_authority")]
struct FileMetadata {
    size: u64,
    modified: DateTime,
}
// Conflict resolution is DECLARED, not coded
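The two declared strategies can be sketched as plain merge functions: union_merge keeps every element from both replicas (suited to grow-mostly sets like tags), while last_write_wins keeps the value with the later timestamp (suited to single-authority metadata). Both function names are illustrative for this sketch, not the SDK's.

```rust
use std::collections::BTreeSet;

// union_merge: no edit is ever lost; both replicas' tags survive.
pub fn union_merge(a: &BTreeSet<String>, b: &BTreeSet<String>) -> BTreeSet<String> {
    a.union(b).cloned().collect()
}

// last_write_wins: each side is (timestamp, value); the later write's
// value survives. Ties go to `a` here; a real system would use the
// total HLC order as the tie-breaker.
pub fn last_write_wins(a: (u64, u64), b: (u64, u64)) -> u64 {
    if a.0 >= b.0 { a.1 } else { b.1 }
}

fn main() {
    let laptop: BTreeSet<String> = ["rust", "vdfs"].iter().map(|s| s.to_string()).collect();
    let phone: BTreeSet<String> = ["vdfs", "ai"].iter().map(|s| s.to_string()).collect();
    // Tags from both devices survive the merge; nothing is dropped.
    assert_eq!(union_merge(&laptop, &phone).len(), 3);
    // File size: the write with the later timestamp wins.
    assert_eq!(last_write_wins((10, 500), (20, 700)), 700);
}
```

Declaring the strategy per model lets the substrate pick the right merge without any hand-written sync code.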
Why This is the Future
The AI Era Needs Distributed Computing
Today's AI Apps:
- Run on centralized servers
- Users upload data (privacy risk)
- Pay for compute (ongoing cost)
- Vendor controls everything
VDFS AI Apps:
- Run on user devices
- Data never leaves user control
- Users provide compute (zero marginal cost)
- User owns the infrastructure
Local-First is the Only Sustainable Model
Traditional SaaS: Vendor pays for every user's compute/storage (15-45% margins)
VDFS Apps: User provides infrastructure (95% margins)
This isn't just better margins - it's the only way AI apps can be sustainable.
Running GPT-4 for 1000 users = $10K/month in API costs. Running Ollama on user devices = $0.
The Developer Pitch
"Stop building infrastructure. Start building intelligence."
What you don't build with VDFS:
- Multi-device sync (VDFS provides)
- Vector database (VDFS provides)
- Queue system (Durable Jobs)
- Authentication (Device pairing)
- Encryption (Built-in)
- Backup/recovery (VDFS handles)
- Offline support (VDFS native)
- P2P networking (Iroh provided)
What you build:
- Domain logic (research assistant behavior)
- Data models (what a "project" means)
- User experience (how insights are presented)
- Agent intelligence (what suggestions to make)
Result: Launch in weeks, not months. Zero infrastructure costs. 95% margins.
The Architectural Poetry
// This is the entire Chronicle extension (conceptual)
#[extension(id = "chronicle")]
struct Chronicle;
#[data_model]
struct Project {
    #[entry] papers: Vec<Entry>,
    #[sidecar] graph: KnowledgeGraph,
    #[agent_memory] state: ResearchState,
}
#[agent]
impl Chronicle {
    #[on_new_entry(filter = "pdf")]
    async fn analyze(entry: Entry, ctx: &AgentContext) -> Result<()> {
        entry.extract_text()
            .generate_embedding()
            .add_to_graph(ctx.memory().graph)?;
        Ok(())
    }
    #[on_query("what am I missing?")]
    async fn find_gaps(ctx: &AgentContext) -> Vec<Paper> {
        ctx.memory()
            .graph
            .identify_gaps()
            .rank_by_relevance()
    }
}
// ~30 lines
// Complete distributed application
// Multi-device sync: automatic
// AI integration: declarative
// State management: handled
// Durability: built-in
This is beautiful.
For the Investor Memo
One perfect paragraph:
"The VDFS SDK is a programming language for distributed applications. Developers declare data models, agents, and workflows. The VDFS handles execution routing, multi-device sync, state persistence, and failure recovery. A password manager inherits encrypted storage and multi-device sync. An AI research tool inherits vector search and durable processing. A CRM inherits dynamic schemas and collaboration. Extension developers write domain logic in weeks, not infrastructure in months. This is the computing primitive for AI-native, local-first applications where users own the infrastructure."
This is your unfair advantage. This is why the platform will win.