mirror of
https://github.com/spacedriveapp/spacedrive.git
synced 2025-12-11 20:15:30 +01:00
cleanup
This commit is contained in:
parent
4288dd3555
commit
73befe0f0f
@ -32,7 +32,7 @@ Organize files across multiple devices, clouds, and platforms from a single inte
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
> [!NOTE] > **Hi, Jamie here!** This is Spacedrive v2 (December 2025)—a complete ground-up rewrite.
|
> [!NOTE] **Hi, Jamie here!** This is Spacedrive v2 (December 2025)—a complete ground-up rewrite.
|
||||||
>
|
>
|
||||||
> After development of the original alpha version stopped in January this year, left with the hard lessons of the incomplete alpha, I set out to completely rebuild Spacedrive from the ground up. The first release **2.0.0-pre.1** is coming before Christmas.
|
> After development of the original alpha version stopped in January this year, left with the hard lessons of the incomplete alpha, I set out to completely rebuild Spacedrive from the ground up. The first release **2.0.0-pre.1** is coming before Christmas.
|
||||||
>
|
>
|
||||||
|
|||||||
@ -1,18 +0,0 @@
|
|||||||
{
|
|
||||||
"$schema": "https://charm.land/crush.json",
|
|
||||||
"providers": {
|
|
||||||
"lmstudio": {
|
|
||||||
"name": "LM Studio",
|
|
||||||
"base_url": "http://localhost:1234/v1/",
|
|
||||||
"type": "openai",
|
|
||||||
"models": [
|
|
||||||
{
|
|
||||||
"name": "Qwen3 30B MOE",
|
|
||||||
"id": "local-model",
|
|
||||||
"context_window": 131072,
|
|
||||||
"default_max_tokens": 20000
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
@ -1 +0,0 @@
|
|||||||
1dd5b461c2ca70e1d328ece908142f6ab6f531dbfc55e320e0d2d503b0177eed
|
|
||||||
@ -1,471 +0,0 @@
|
|||||||
# Spacedrive Whitepaper Revision Plan
|
|
||||||
|
|
||||||
**Last Updated:** 2025-01-21
|
|
||||||
**Purpose:** Track architectural updates needed to align the whitepaper with the V2 implementation.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Editorial Guidelines for Updates
|
|
||||||
|
|
||||||
### Writing Style
|
|
||||||
- **Architecture-focused**: Explain WHAT systems do and WHY design decisions were made
|
|
||||||
- **No code examples**: Exception for SdPath enum (core abstraction) and one SDK example if absolutely necessary
|
|
||||||
- **No marketing language**: Avoid superlatives like "blazing fast", "revolutionary", "game-changing"
|
|
||||||
- **No fake statistics**: Only cite real benchmarks from the indexing section
|
|
||||||
- **No implementation status**: Never mention "planned", "in progress", "coming soon" - write as if complete
|
|
||||||
- **Technical precision**: Use exact terminology, avoid vague descriptions
|
|
||||||
- **Clarity over cleverness**: Straightforward explanations trump eloquent prose
|
|
||||||
|
|
||||||
### What NOT to Include
|
|
||||||
- Performance metrics beyond indexing benchmarks (we haven't measured them)
|
|
||||||
- Code listings (except SdPath and possibly one SDK example)
|
|
||||||
- Comparisons claiming "X% faster than Y" without data
|
|
||||||
- Feature timelines or roadmap speculation
|
|
||||||
- Implementation details (how it's coded vs. how it's architected)
|
|
||||||
|
|
||||||
### Format Consistency
|
|
||||||
- Use `\textbf{}` for emphasis, not italics in technical sections
|
|
||||||
- Keep Key Takeaways boxes concise (3-4 bullets max)
|
|
||||||
- Diagrams over lengthy prose where possible
|
|
||||||
- Section cross-references using `\ref{}` consistently
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Status Legend
|
|
||||||
- **CRITICAL** - Architecturally incorrect, must fix
|
|
||||||
- **MAJOR** - Missing significant architectural details
|
|
||||||
- **MINOR** - Terminology tweaks or small additions
|
|
||||||
- **REMOVE** - Content to delete or minimize
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 1: Critical Architectural Corrections
|
|
||||||
|
|
||||||
### 1. Library Sync Architecture (Section 4.5.1)
|
|
||||||
**Lines:** 1266-1318
|
|
||||||
**Problem:** Describes sync too abstractly, missing the sophisticated watermark system that makes it reliable.
|
|
||||||
|
|
||||||
**Required Changes:**
|
|
||||||
- **Per-Resource Watermark Architecture**
|
|
||||||
- Explain sync tracks progress independently per resource type (location, entry, volume, tag, etc.)
|
|
||||||
- Enables surgical recovery: only re-sync resources with detected gaps
|
|
||||||
- Prevents cross-contamination: advancing location watermark doesn't affect entry sync
|
|
||||||
|
|
||||||
- **Dual Watermark Strategy**
|
|
||||||
- **Cursor watermark**: Advances optimistically with each received record
|
|
||||||
- **Validated watermark**: Only advances after count verification passes
|
|
||||||
- On gap detection, reset cursor to validated watermark for surgical recovery
|
|
||||||
|
|
||||||
- **Integrity Validation Mechanisms**
|
|
||||||
- **Count-based gap detection**: Compare expected vs. actual record counts per resource
|
|
||||||
- **Hash-based update detection**: Aggregated hash of resource data catches missed updates
|
|
||||||
- Both run during watermark exchange between peers
|
|
||||||
|
|
||||||
- **Escalation Strategy**
|
|
||||||
- Normal flow: Incremental catch-up using watermarks
|
|
||||||
- After 5 consecutive catch-up failures: Escalate to full backfill
|
|
||||||
- Backfill completes → Reset watermarks → Return to incremental mode
|
|
||||||
|
|
||||||
- **Watermark Exchange Protocol**
|
|
||||||
- Bidirectional negotiation when devices reconnect
|
|
||||||
- Each device sends: watermarks + counts + hashes for all resources
|
|
||||||
- Peer responds with: actual counts/hashes + needs_catchup flags
|
|
||||||
- Surgical recovery initiated for mismatched resources only
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
The watermark system is why Spacedrive can efficiently sync massive libraries without full re-indexing after network interruptions. It's a key architectural innovation over naive "send everything" approaches.
|
|
||||||
|
|
||||||
**Remove:** Vague references to "efficient state-based replication" without explaining the mechanism.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 2. WASM Extension System (Section 4.9.2)
|
|
||||||
**Lines:** 2601-2655
|
|
||||||
**Problem:** Wire registry integration is incorrect - that's not the current plan. Need to focus on the actual WASM sandbox architecture.
|
|
||||||
|
|
||||||
**Required Changes:**
|
|
||||||
- **Remove:** All references to "single host function routing to Wire registry"
|
|
||||||
- **Emphasize:** WASM provides security through complete sandboxing
|
|
||||||
- **Focus on:** Capability-based permission model
|
|
||||||
- Extensions declare required permissions upfront
|
|
||||||
- Permissions: ReadEntries, WriteSidecars, UseModel, RegisterModel, DispatchJobs
|
|
||||||
- Rate limiting per extension (requests/minute)
|
|
||||||
|
|
||||||
- **Memory Systems for AI Agents**
|
|
||||||
- **TemporalMemory**: Time-ordered event stream, supports `since()` queries
|
|
||||||
- **AssociativeMemory**: Semantic similarity search, similarity threshold filtering
|
|
||||||
- **WorkingMemory**: Current state and active plans
|
|
||||||
- Agents maintain persistent knowledge across restarts
|
|
||||||
|
|
||||||
- **Event-Driven Architecture**
|
|
||||||
- `#[on_startup]`: Initialization hook
|
|
||||||
- `#[on_event(EntryCreated)]`: React to filesystem events
|
|
||||||
- `#[scheduled(cron = "...")]`: Time-based triggers
|
|
||||||
- `#[filter("...")]`: Entry filtering expressions
|
|
||||||
|
|
||||||
**Keep ONE SDK Example:**
|
|
||||||
Show Photos extension structure to illustrate event-driven agents:
|
|
||||||
```rust
|
|
||||||
#[agent]
|
|
||||||
impl Photos {
|
|
||||||
#[on_event(EntryCreated)]
|
|
||||||
#[filter(".extension().is_image()")]
|
|
||||||
pub async fn on_new_photo(entry: Entry, ctx: &AgentContext<PhotosMind>);
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
The extension architecture enables domain-specific intelligence (Photos, Finance, Organization agents) while maintaining security through sandboxing.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. Indexing Engine Resumability (Section 4.3)
|
|
||||||
**Lines:** 738-872
|
|
||||||
**Problem:** Describes "multi-phase" abstractly without explaining what makes jobs actually resumable.
|
|
||||||
|
|
||||||
**Required Changes:**
|
|
||||||
- **Phase Separation Rationale**
|
|
||||||
- Each phase has distinct failure modes and I/O characteristics
|
|
||||||
- Discovery: Filesystem traversal (fails on permissions)
|
|
||||||
- Processing: Database writes (fails on constraint violations)
|
|
||||||
- Aggregation: Hierarchical calculations (fails on corrupted references)
|
|
||||||
- Content ID: File hashing (fails on file locks)
|
|
||||||
|
|
||||||
- **Checkpoint Architecture**
|
|
||||||
- Jobs checkpoint after each batch (default: 1000 entries)
|
|
||||||
- State serialized with MessagePack (compact binary format)
|
|
||||||
- On crash/restart: Deserialize state → Resume from last checkpoint
|
|
||||||
- Checkpoint includes: phase, batch cursor, processed entry IDs
|
|
||||||
|
|
||||||
- **Resumability Flow**
|
|
||||||
1. Job interrupted (crash, user cancel, device offline)
|
|
||||||
2. State persisted to `jobs.db` with last completed phase
|
|
||||||
3. On restart: Load serialized state from database
|
|
||||||
4. Jump to last completed phase, skip processed entries
|
|
||||||
5. Continue from checkpoint cursor
|
|
||||||
|
|
||||||
- **Ephemeral Mode Architecture**
|
|
||||||
- In-memory Entry records for non-indexed paths
|
|
||||||
- Enables browsing external drives without permanent indexing
|
|
||||||
- Three use cases:
|
|
||||||
- Exploring removable media before adding as Location
|
|
||||||
- Remote filesystem browsing (peer device)
|
|
||||||
- "Lazy refresh" during directory navigation
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
Resumability is critical for mobile devices and large libraries where indexing can take hours and may be interrupted multiple times.
|
|
||||||
|
|
||||||
**Enhance Diagram (Fig 4.4):** Add checkpoint persistence arrows and resumability flow.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 2: Major Architectural Expansions
|
|
||||||
|
|
||||||
### 4. Content Identity Two-Tier Hashing (Section 4.2)
|
|
||||||
**Lines:** 643-735
|
|
||||||
**Problem:** Mentions "integrity hash" but doesn't explain when/why it's generated separately.
|
|
||||||
|
|
||||||
**Required Changes:**
|
|
||||||
- **Performance vs. Security Trade-off**
|
|
||||||
- Initial indexing: Only sampled hash (first 16 chars of BLAKE3)
|
|
||||||
- Enables ~100× faster indexing (58KB read vs. full file)
|
|
||||||
- Full integrity hash generated lazily by background ValidationJobs
|
|
||||||
|
|
||||||
- **Validation Architecture**
|
|
||||||
- ValidationJobs run during idle periods
|
|
||||||
- Generate complete BLAKE3 hash of entire file
|
|
||||||
- Compare against expected content_id
|
|
||||||
- Mismatch detection → Corruption alert + restoration from redundant copies
|
|
||||||
|
|
||||||
- **When Full Integrity Matters**
|
|
||||||
- Large file transfers (verify no corruption)
|
|
||||||
- Backup verification (ensure bit-perfect copy)
|
|
||||||
- Forensic analysis (cryptographic proof of content)
|
|
||||||
- Security-sensitive files (detect tampering)
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
Separating "identity" (for deduplication) from "integrity" (for verification) allows instant indexing while preserving cryptographic guarantees when needed.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 5. Action System Simulation Details (Section 4.4)
|
|
||||||
**Lines:** 945-1236
|
|
||||||
**Problem:** Describes preview/commit but not HOW simulation achieves accuracy.
|
|
||||||
|
|
||||||
**Required Changes:**
|
|
||||||
- **Index-Based Simulation Architecture**
|
|
||||||
- All predictions via SQL queries against VDFS index
|
|
||||||
- No filesystem access during preview
|
|
||||||
- Complete knowledge: Every file's size, location, relationships known
|
|
||||||
|
|
||||||
- **Content-Aware Path Resolution**
|
|
||||||
- For `SdPath::Content` operations, resolver evaluates all instances
|
|
||||||
- Cost function weighs:
|
|
||||||
- **Locality**: Local device = 0 cost (instant)
|
|
||||||
- **Network proximity**: Iroh provides real-time latency measurements
|
|
||||||
- **Storage tier**: SSD prioritized over HDD (from PhysicalClass)
|
|
||||||
- **Device availability**: Only online devices considered
|
|
||||||
- Lowest-cost path selected automatically
|
|
||||||
|
|
||||||
- **Conflict Detection Categories**
|
|
||||||
- **Storage constraints**: Calculate exact space requirements, verify availability
|
|
||||||
- **Permission violations**: Check write access before committing
|
|
||||||
- **Path conflicts**: Detect naming collisions in target directory
|
|
||||||
- **Circular references**: Prevent moving parent into descendant
|
|
||||||
- **Resource limitations**: Estimate memory/bandwidth vs. device capabilities
|
|
||||||
|
|
||||||
- **Storage Tier Warnings**
|
|
||||||
- Simulation detects PhysicalClass/LogicalClass mismatches
|
|
||||||
- Example: User marks folder as "Hot" but it's on Cold HDD
|
|
||||||
- Preview shows: "Warning: Operation targets hot location on slow archive drive"
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
The simulation engine prevents data loss and user frustration by catching problems before execution. Its power comes from having a complete index.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 6. Networking ALPN Multiplexing (Section 4.5.2)
|
|
||||||
**Lines:** 1320-1424
|
|
||||||
**Problem:** Mentions Iroh but doesn't explain why protocol consolidation matters.
|
|
||||||
|
|
||||||
**Required Changes:**
|
|
||||||
- **ALPN Protocol Multiplexing Benefits**
|
|
||||||
- Single QUIC connection per device pair
|
|
||||||
- Multiple protocols as streams: pairing, sync, file transfer, messaging
|
|
||||||
- Each protocol identified by ALPN string (e.g., "spacedrive/sync/1")
|
|
||||||
- Stream-level routing, not connection-level
|
|
||||||
|
|
||||||
- **Connection Efficiency Gains**
|
|
||||||
- Single TCP/QUIC handshake instead of N handshakes
|
|
||||||
- Shared congestion control across all operations
|
|
||||||
- Connection reuse eliminates re-establishment overhead
|
|
||||||
- Result: Sub-2-second connection establishment
|
|
||||||
|
|
||||||
- **Deterministic Connection Initiation**
|
|
||||||
- Only device with lower NodeId initiates outbound connection
|
|
||||||
- Prevents race condition: Both devices trying to connect simultaneously
|
|
||||||
- Simpler state machine: Each device knows its role
|
|
||||||
|
|
||||||
- **Pairing Security Model**
|
|
||||||
- BIP39 mnemonic codes (12 words from 256-bit secret)
|
|
||||||
- Challenge-response handshake (4 messages)
|
|
||||||
- Ed25519 signatures for authentication
|
|
||||||
- Prevents MITM during initial pairing
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
Treating all protocols as streams on one connection eliminates coordination overhead and connection races in P2P networks.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 3: Important Context Additions
|
|
||||||
|
|
||||||
### 7. Ephemeral Mode Use Cases (Section 4.1.2)
|
|
||||||
**Lines:** 393-394
|
|
||||||
**Problem:** One-sentence mention doesn't convey the architectural significance.
|
|
||||||
|
|
||||||
**Add (1 paragraph):**
|
|
||||||
- **Three Ephemeral Scenarios**
|
|
||||||
- Browsing external drives before formal indexing
|
|
||||||
- Exploring peer device filesystems remotely
|
|
||||||
- "Lazy refresh" during directory navigation
|
|
||||||
- **Architectural Benefit**: Immediate metadata capability (tagging, organizing) even for unindexed files
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 8. Lightweight Embedding Models (Section 4.7)
|
|
||||||
**Lines:** 1797-1948
|
|
||||||
**Problem:** Doesn't emphasize these are SMALL models, not LLMs.
|
|
||||||
|
|
||||||
**Clarify:**
|
|
||||||
- **Model Scale Reality**
|
|
||||||
- all-MiniLM-L6-v2: 22M parameters, 384 dimensions, 5MB model size
|
|
||||||
- NOT GPT-scale (billions of parameters)
|
|
||||||
- Specialized for semantic similarity, not text generation
|
|
||||||
|
|
||||||
- **Performance Characteristics**
|
|
||||||
- Runs efficiently on CPU (no GPU required)
|
|
||||||
- Processes thousands of files/second during indexing
|
|
||||||
- Real-time query embedding (<40ms)
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
The architecture is practical BECAUSE it doesn't require massive models or specialized hardware.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 9. Volume Classification Benefits (Section 4.6)
|
|
||||||
**Lines:** 1950-2075
|
|
||||||
**Problem:** Describes classification but not why the complexity matters.
|
|
||||||
|
|
||||||
**Add:**
|
|
||||||
- **Platform-Specific Chaos**
|
|
||||||
- macOS: APFS containers create multiple volumes from one physical drive
|
|
||||||
- Linux: Virtual filesystems (/proc, /sys, /dev) clutter mount list
|
|
||||||
- Windows: Hidden recovery partitions and system volumes
|
|
||||||
|
|
||||||
- **Auto-Tracking Intelligence**
|
|
||||||
- Filter ~10 system volumes → Show ~3 user-relevant volumes
|
|
||||||
- Present semantic names: "Primary", "External", "Network"
|
|
||||||
- Hide: System, VM, Preboot, Update partitions
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
Users see cleaned, meaningful volume lists instead of technical chaos. Reduces cognitive load.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 10. Closure Table Performance (Section 4.8 / Database)
|
|
||||||
**Lines:** 938-944
|
|
||||||
**Problem:** Mentions closure table but not the performance win.
|
|
||||||
|
|
||||||
**Add:**
|
|
||||||
- **Traditional Hierarchical Query Problem**
|
|
||||||
- Recursive CTEs: Multiple passes over data
|
|
||||||
- LIKE-based path matching: O(n) table scan
|
|
||||||
- Performance degrades with tree depth
|
|
||||||
|
|
||||||
- **Closure Table Solution**
|
|
||||||
- Pre-computed ancestor-descendant relationships
|
|
||||||
- All hierarchy queries → Single indexed join
|
|
||||||
- O(1) operations: Directory listing, size calculation, ancestor lookup
|
|
||||||
|
|
||||||
- **Trade-off**
|
|
||||||
- Additional storage: O(d × n) where d = tree depth, n = entries
|
|
||||||
- Transactional updates: Insert self-closure + inherit parent closures
|
|
||||||
- Benefit: Million-file libraries with sub-100ms hierarchy queries
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
This is why Spacedrive maintains responsiveness with massive libraries while traditional file managers slow down.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 4: Terminology & Accuracy Corrections
|
|
||||||
|
|
||||||
### 11. HLC Usage Clarification (Multiple Sections)
|
|
||||||
**Problem:** Paper sometimes implies HLC is used for all sync.
|
|
||||||
|
|
||||||
**Global Find/Replace Needed:**
|
|
||||||
- Device-owned data (entries, locations, volumes): **Timestamp-based watermarks**
|
|
||||||
- Shared data (tags, collections, user metadata): **HLC-based log**
|
|
||||||
- Be explicit about which domain uses which mechanism
|
|
||||||
- Update Section 4.5.1 sync domain table to clarify
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 12. Testing Framework Detail (Section 7)
|
|
||||||
**Lines:** 2366-2415
|
|
||||||
**Problem:** Underplays sophistication of distributed testing.
|
|
||||||
|
|
||||||
**Add:**
|
|
||||||
- **Subprocess Testing Architecture**
|
|
||||||
- Tests spawn multiple Rust processes, each simulating a device
|
|
||||||
- Environment variables control device roles (TEST_ROLE=alice)
|
|
||||||
- Real P2P communication over loopback
|
|
||||||
|
|
||||||
- **Realistic Scenarios Tested**
|
|
||||||
- Full device pairing flows with authentication
|
|
||||||
- Conflict detection and resolution
|
|
||||||
- Network interruption recovery
|
|
||||||
- Cross-device file transfers
|
|
||||||
|
|
||||||
- **Scale**: 43 integration tests validate distributed system behavior that would be impossible with unit tests alone
|
|
||||||
|
|
||||||
**Why This Matters:**
|
|
||||||
Validates the distributed system ACTUALLY works, not just individual components in isolation.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 5: Content Removal & Cleanup
|
|
||||||
|
|
||||||
### 13. Remove Unnecessary Code Listings
|
|
||||||
|
|
||||||
**Keep ONLY:**
|
|
||||||
- SdPath enum (lines 458-474) - core abstraction
|
|
||||||
- ONE SDK example for agent event handler (if needed for clarity)
|
|
||||||
|
|
||||||
**Remove:**
|
|
||||||
- Rust trait definitions (Job, JobHandler, etc.)
|
|
||||||
- SQL schema code
|
|
||||||
- File type TOML examples
|
|
||||||
- JSON format examples
|
|
||||||
- All other implementation snippets
|
|
||||||
|
|
||||||
**Reasoning:** Paper explains architecture, not implementation. Code distracts from concepts.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 14. Remove Benchmark Claims Outside Indexing
|
|
||||||
|
|
||||||
**Scan for and remove:**
|
|
||||||
- "Sub-100ms search" (not benchmarked)
|
|
||||||
- "8,500 files/sec" (only indexing is benchmarked)
|
|
||||||
- Network throughput numbers (not measured)
|
|
||||||
- Any "X% faster" comparisons without data
|
|
||||||
|
|
||||||
**Keep:**
|
|
||||||
- Table 4.1 (Indexing benchmark data) - real measurements
|
|
||||||
- Generic statements: "sub-second response times" (not specific numbers)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 6: Diagram Improvements
|
|
||||||
|
|
||||||
### 15. Sync Architecture Diagram (Section 4.5.1)
|
|
||||||
**Current:** Text-heavy explanation
|
|
||||||
**Improve:** Visual diagram showing:
|
|
||||||
- Two sync domains (device-owned vs. shared)
|
|
||||||
- Watermark exchange protocol flow
|
|
||||||
- Escalation decision tree (catch-up → backfill)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 16. Indexing Pipeline Diagram (Section 4.3)
|
|
||||||
**Current (Fig 4.4):** Basic phase flow
|
|
||||||
**Enhance:**
|
|
||||||
- Checkpoint persistence after each phase
|
|
||||||
- Resumability arrows showing restart path
|
|
||||||
- Ephemeral mode as separate branch
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Document Conventions
|
|
||||||
|
|
||||||
### When Writing Updates
|
|
||||||
1. **Start with "Why"**: Explain the problem being solved
|
|
||||||
2. **Architecture over Implementation**: Focus on WHAT and WHY, not HOW
|
|
||||||
3. **Be Precise**: Use exact technical terms, avoid vague descriptions
|
|
||||||
4. **Cross-Reference**: Link related sections with `\ref{}`
|
|
||||||
5. **Diagrams > Prose**: Visualize complex interactions when possible
|
|
||||||
|
|
||||||
### Review Checklist
|
|
||||||
- [ ] No marketing language or superlatives?
|
|
||||||
- [ ] No fake statistics or unmeasured performance claims?
|
|
||||||
- [ ] No code examples (except SdPath + maybe one SDK example)?
|
|
||||||
- [ ] Explains WHY design decisions were made?
|
|
||||||
- [ ] Technically accurate and precise?
|
|
||||||
- [ ] Consistent terminology with glossary (Appendix)?
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Priority Order for Implementation
|
|
||||||
|
|
||||||
**Week 1: Critical Fixes**
|
|
||||||
1. Section 4.5.1 - Library Sync (most architecturally wrong)
|
|
||||||
2. Section 4.9.2 - WASM Extensions (remove incorrect Wire info)
|
|
||||||
3. Section 4.3 - Indexing Resumability (missing key details)
|
|
||||||
|
|
||||||
**Week 2: Major Expansions**
|
|
||||||
4. Section 4.2 - Two-Tier Hashing
|
|
||||||
5. Section 4.4 - Simulation Engine
|
|
||||||
6. Section 4.5.2 - ALPN Multiplexing
|
|
||||||
|
|
||||||
**Week 3: Polish**
|
|
||||||
7-12. Minor additions and terminology fixes
|
|
||||||
13-14. Remove unnecessary code and fake benchmarks
|
|
||||||
15-16. Improve diagrams
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Notes
|
|
||||||
- PhysicalClass/LogicalClass: Keep in paper (simulation engine needs it)
|
|
||||||
- Extension Wire integration: Removed from plan (not current architecture)
|
|
||||||
- Benchmarks: Only indexing section has real measurements
|
|
||||||
- Writing style: Technical precision, no marketing fluff
|
|
||||||
Loading…
x
Reference in New Issue
Block a user