# Benchmarking Suite

This document explains how to use and extend the benchmarking suite that lives in `benchmarks/`. It covers concepts, CLI commands, recipe schema, data generation, scenarios, metrics, reporting, CI guidance, and troubleshooting.
## Goals
- Reliable, reproducible performance evaluation of core workflows (e.g., indexing discovery, content identification).
- Modular architecture: add scenarios, reporters, and data generators without touching the core wiring.
- CI-friendly: deterministic runs, structured outputs, small quick recipes for PR checks.
## Overview

- `benchmarks/` is a standalone Rust crate that provides:
  - CLI binary: `sd-bench`
  - Dataset generator(s): `benchmarks/src/generator/`
  - Scenarios: `benchmarks/src/scenarios/`
  - Runner & metrics: `benchmarks/src/runner/`, `benchmarks/src/metrics/`
  - Reporting: `benchmarks/src/reporting/`
  - Recipes (YAML): `benchmarks/recipes/`
  - Results (JSON): `benchmarks/results/`
- The CLI boots the core in an isolated data directory, enables job logging, creates/opens a dedicated benchmark library if needed, and orchestrates scenario execution.
## Installation

- Requirements: Rust toolchain; the crate builds as part of the workspace.
- Build the bench crate:

  ```sh
  cargo build -p sd-bench --bin sd-bench
  ```
## Quickstart

- Generate one recipe:

  ```sh
  cargo run -p sd-bench -- mkdata --recipe benchmarks/recipes/shape_small.yaml
  ```

- Generate all recipes in a directory (datasets default to `locations[].path` in each recipe):

  ```sh
  cargo run -p sd-bench -- mkdata-all --recipes-dir benchmarks/recipes
  ```

- Generate datasets on an external disk without changing recipes (prefixes relative recipe paths):

  ```sh
  cargo run -p sd-bench -- mkdata-all --recipes-dir benchmarks/recipes --dataset-root /Volumes/YourHDD
  ```

- Run one scenario with one recipe and write a JSON summary:
  - Discovery:

    ```sh
    cargo run -p sd-bench -- run --scenario indexing-discovery --recipe benchmarks/recipes/shape_small.yaml --out-json benchmarks/results/shape_small-indexing-discovery-nvme.json
    ```

  - Content identification:

    ```sh
    cargo run -p sd-bench -- run --scenario content-identification --recipe benchmarks/recipes/shape_small.yaml --out-json benchmarks/results/shape_small-content-identification-nvme.json
    ```

- NEW: Run all scenarios on multiple locations with automatic hardware detection:

  ```sh
  # Run all scenarios (discovery, aggregation, content-id) on both NVMe and HDD
  cargo run -p sd-bench -- run-all --locations "/tmp/benchdata" "/Volumes/Seagate/benchdata"

  # Run specific scenarios on multiple locations
  cargo run -p sd-bench -- run-all \
    --scenarios indexing-discovery aggregation \
    --locations "/Users/me/benchdata" "/Volumes/HDD/benchdata" "/Volumes/SSD/benchdata"

  # Filter to only shape recipes
  cargo run -p sd-bench -- run-all \
    --locations "/tmp/benchdata" "/Volumes/Seagate/benchdata" \
    --recipe-filter "^shape_"
  ```

- Generate CSV reports from JSON summaries:

  ```sh
  cargo run -p sd-bench -- results-table --results-dir benchmarks/results --out benchmarks/results/whitepaper_metrics.csv --format csv
  ```

The CLI always prints a brief stdout summary and (if applicable) the path to the generated JSON. It also prints job log paths for later inspection.
## Commands

`mkdata --recipe <path> [--dataset-root <path>]`

- Generates a dataset based on a YAML recipe (see Recipe Schema below).
- With `--dataset-root`, any relative `locations[].path` in the recipe is prefixed with this path (absolute paths are left unchanged). Useful for targeting an external HDD.
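The `--dataset-root` rule (relative recipe paths get prefixed, absolute paths pass through) can be sketched as follows. `apply_dataset_root` is a hypothetical helper for illustration, not the crate's actual function:

```rust
use std::path::{Path, PathBuf};

/// Illustrative sketch of the `--dataset-root` rule: relative recipe
/// paths are prefixed with the root; absolute paths are left unchanged.
fn apply_dataset_root(dataset_root: &Path, location_path: &Path) -> PathBuf {
    if location_path.is_absolute() {
        location_path.to_path_buf()
    } else {
        dataset_root.join(location_path)
    }
}

fn main() {
    let root = Path::new("/Volumes/YourHDD");
    // A relative path from a recipe gets prefixed with the root.
    assert_eq!(
        apply_dataset_root(root, Path::new("benchdata/shape_small")),
        PathBuf::from("/Volumes/YourHDD/benchdata/shape_small")
    );
    // An absolute path is left untouched.
    assert_eq!(
        apply_dataset_root(root, Path::new("/data/fixed")),
        PathBuf::from("/data/fixed")
    );
    println!("ok");
}
```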
`mkdata-all [--recipes-dir <dir>] [--dataset-root <path>] [--recipe-filter <regex>]`

- Scans a directory for `.yaml`/`.yml` files and runs `mkdata` for each file.
- `--dataset-root` prefixes relative `locations[].path` as above.
- `--recipe-filter` filters recipe files by filename (regex applied to the file stem), e.g. `^hdd_`.
`run --scenario <name> --recipe <path> [--out-json <path>] [--dataset-root <path>]`

- Boots an isolated core, ensures a benchmark library, adds recipe locations, and waits for jobs to finish.
- Summarizes metrics to stdout; optionally writes a JSON summary at `--out-json`.
- `--dataset-root` prefixes relative `locations[].path` at runtime (absolute paths untouched).
`run-all [--scenarios <names...>] [--locations <paths...>] [--recipes-dir <dir>] [--out-dir <dir>] [--skip-generate] [--recipe-filter <regex>]`

- Enhanced for multi-location, multi-scenario benchmarking with automatic hardware detection.
- Runs all combinations of scenarios × locations × recipes, automatically detecting the hardware type from volume information.
- `--scenarios`: optional list of scenarios to run. If not specified, runs all: `indexing-discovery`, `aggregation`, `content-identification`.
- `--locations`: list of paths where datasets should be generated/benchmarked. Hardware type is automatically detected from the volume (e.g., NVMe, HDD, SSD).
- Output files are automatically named `{recipe}-{scenario}-{hardware}.json` (e.g., `shape_small-indexing-discovery-nvme.json`).
- With `--skip-generate`, datasets are not generated and are expected to exist.
- `--recipe-filter` selects a subset of recipes by regex on the filename stem (e.g., `^shape_` for shape recipes only).
- The system automatically handles the `benchdata/` prefix in recipes, so you can specify `/tmp/benchdata` and it will create `/tmp/benchdata/shape_small`, etc.
## Architecture

- Thin bin: `benchmarks/src/bin/sd-bench-new.rs` delegates to `benchmarks/src/cli/commands.rs`.
- Core modules exported via `benchmarks/src/mod_new.rs`:
  - `generator/` (dataset generation)
  - `scenarios/` (`Scenario` trait implementations)
  - `runner/` (orchestration & report emission)
  - `metrics/` (result model and phase timings)
  - `reporting/` (reporters like JSON)
  - `core_boot/` (isolated core boot + job logging)
  - `recipe/` (schema + validation)
  - `util/` (helpers)
## Recipe Schema

YAML schema (see `benchmarks/recipes/*.yaml`). Recipe names no longer need hardware prefixes; hardware is auto-detected. Example:

```yaml
name: shape_small
seed: 12345
locations:
  - path: benchdata/shape_small # note: the 'benchdata/' prefix is handled automatically
    structure:
      depth: 2
      fanout_per_dir: 8
    files:
      total: 5000
      size_buckets:
        small: { range: [4096, 131072], share: 0.6 }
        medium: { range: [1048576, 5242880], share: 0.3 }
        large: { range: [5242880, 10485760], share: 0.1 }
      extensions: [pdf, zip, jpg, txt]
      duplicate_ratio: 0.1
      content_gen:
        mode: partial # zeros | partial | full
        sample_block_size: 10240 # 10 KiB; aligns with content hashing sample size
        magic_headers: true # write registry-derived magic bytes
    media:
      generate_thumbnails: false
```
### Desktop-Scale Recipes

For testing realistic desktop scenarios, including job resumption and long-running indexing operations:

`desktop_complex.yaml` - realistic desktop environment (500k files, 8 levels deep):
```yaml
name: desktop_complex
seed: 42424242
locations:
  - path: benchdata/desktop_complex
    structure:
      depth: 8 # deep nesting like real file systems
      fanout_per_dir: 25 # many directories per level
    files:
      total: 500000 # half a million files - realistic desktop scale
      size_buckets:
        tiny: { range: [0, 4096], share: 0.25 }
        small: { range: [4096, 1048576], share: 0.35 }
        medium: { range: [1048576, 50000000], share: 0.25 }
        large: { range: [50000000, 500000000], share: 0.10 }
        huge: { range: [500000000, 4000000000], share: 0.05 }
      extensions: [txt, md, pdf, jpg, png, mp4, zip, py, js, rs] # ... many more
      duplicate_ratio: 0.15
      content_gen:
        mode: partial
        sample_block_size: 10240
        magic_headers: true
```
`desktop_extreme.yaml` - power user environment (1M files, 12 levels deep):
- 1,000,000 files across 12 directory levels
- Comprehensive file type coverage (100+ extensions)
- Realistic size distribution including very large files (up to 8GB)
- 20% duplicate ratio for realistic backup/copy scenarios
### Fields

- `name`: logical recipe name.
- `seed`: RNG seed (deterministic runs). If omitted, one is derived from entropy.
- `locations[]`:
  - `path`: base directory for generated files.
  - `structure.depth`: max nested subdirectory depth (randomized per file up to this depth).
  - `structure.fanout_per_dir`: number of subdirectory options at each level.
  - `files.total`: total files per location (before duplicates).
  - `files.size_buckets`: map of bucket name => `{ range: [min, max], share }`; shares are normalized.
  - `files.extensions`: file extension sampling pool (e.g., `[pdf, zip, jpg, txt]`).
  - `files.duplicate_ratio`: fraction of duplicates (hardlink, with fallback to copy).
  - `files.content_gen`:
    - `mode`:
      - `zeros`: sparse file; fast; not realistic for content identification.
      - `partial`: writes header + evenly spaced samples + footer; gaps remain sparse zeros; matches content hashing sampling points.
      - `full`: fills the entire file with deterministic bytes; slowest, most realistic.
    - `sample_block_size`: size of each inner sample block (default 10 KiB). Leave at 10 KiB to match the content hashing algorithm.
    - `magic_headers`: if true, writes file signature patterns based on the `file_type` registry for the chosen extension.
  - `media`: reserved for future synthetic media generation; currently optional/no-op by default.
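Because shares are normalized, bucket shares need not sum to exactly 1.0. A minimal sketch of how a generator might normalize shares and select a bucket from a random roll in [0, 1); `pick_bucket` is illustrative only, not the crate's actual sampling code:

```rust
/// Illustrative only: normalize bucket shares and select a bucket by a
/// roll in [0, 1). Not the crate's actual implementation.
fn pick_bucket<'a>(buckets: &[(&'a str, f64)], roll: f64) -> &'a str {
    let total: f64 = buckets.iter().map(|(_, share)| share).sum();
    let mut cumulative = 0.0;
    for &(name, share) in buckets {
        cumulative += share / total; // normalized share
        if roll < cumulative {
            return name;
        }
    }
    buckets.last().expect("at least one bucket").0
}

fn main() {
    // Shares from shape_small: 0.6 / 0.3 / 0.1. They already sum to 1.0,
    // but normalization means e.g. 6 / 3 / 1 would behave identically.
    let buckets = [("small", 0.6), ("medium", 0.3), ("large", 0.1)];
    assert_eq!(pick_bucket(&buckets, 0.50), "small");
    assert_eq!(pick_bucket(&buckets, 0.75), "medium");
    assert_eq!(pick_bucket(&buckets, 0.95), "large");
    println!("ok");
}
```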
## Content Generation Details

- The generator can write content that aligns with the content hash sampling algorithm in `src/domain/content_identity.rs`:
  - For large files (> 100 KiB):
    - Includes the file size (handled by the hash function).
    - Hashes a header (8 KiB), 4 evenly spaced inner samples (default 10 KiB each), and a footer (8 KiB).
  - For small files: full-content hashing.
- `partial` mode writes the header/samples/footer only (deterministic pseudo-random bytes), leaving gaps as sparse zeros. This yields realistic, stable hashes without full writes.
- `full` mode writes deterministic content for the entire file for maximum realism.
- `magic_headers: true` uses `sd_core::file_type::FileTypeRegistry` to write magic byte signatures for the chosen extension when available.
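The sampled layout described above can be pictured as a list of (offset, length) regions that `partial` mode must fill with real bytes. This is an illustration, not the code in `src/domain/content_identity.rs`; in particular, the even-spacing formula is an assumption:

```rust
const HEADER_LEN: u64 = 8 * 1024; // 8 KiB header
const FOOTER_LEN: u64 = 8 * 1024; // 8 KiB footer
const SAMPLE_LEN: u64 = 10 * 1024; // default 10 KiB inner samples
const SMALL_FILE_LIMIT: u64 = 100 * 1024; // files <= 100 KiB hash full content

/// Illustrative sketch: (offset, length) regions a partial-content
/// generator would fill so that sampled hashing sees real bytes.
fn sampled_regions(file_size: u64) -> Vec<(u64, u64)> {
    if file_size <= SMALL_FILE_LIMIT {
        return vec![(0, file_size)]; // small file: full content
    }
    let mut regions = vec![(0, HEADER_LEN)];
    // Assumption: the 4 samples are spread evenly through the middle.
    let inner_start = HEADER_LEN;
    let inner_end = file_size - FOOTER_LEN;
    let stride = (inner_end - inner_start) / 5;
    for i in 1..=4u64 {
        regions.push((inner_start + i * stride, SAMPLE_LEN));
    }
    regions.push((file_size - FOOTER_LEN, FOOTER_LEN));
    regions
}

fn main() {
    let regions = sampled_regions(10 * 1024 * 1024); // 10 MiB file
    assert_eq!(regions.len(), 6); // header + 4 samples + footer
    assert_eq!(regions[0], (0, HEADER_LEN));
    assert_eq!(regions[5].1, FOOTER_LEN);
    println!("{:?}", regions);
}
```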
## Scenarios

- Implement `Scenario` in `benchmarks/src/scenarios/` and register it in `scenarios/registry.rs`.
- Built-in:
  - `indexing-discovery`: adds locations (shallow indexing) and waits for indexing jobs to complete; collects metrics.
  - `content-identification`: runs content mode and reports content-only throughput using phase timings (excludes discovery).
### Adding a scenario

- Create `benchmarks/src/scenarios/<your_scenario>.rs` implementing:
  - `name(&self) -> &'static str`
  - `describe(&self) -> &'static str`
  - `prepare(&mut self, boot: &CoreBoot, recipe: &Recipe)`
  - `run(&mut self, boot: &CoreBoot, recipe: &Recipe)`
- Register it in `benchmarks/src/scenarios/registry.rs`.
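A skeletal implementation might look like the following. The `CoreBoot` and `Recipe` stubs stand in for the real types from `core_boot/` and `recipe/`, and the trait shape is inferred from the method list above; treat this as a sketch, not the crate's exact trait:

```rust
// Stand-ins for the real types from core_boot/ and recipe/.
struct CoreBoot;
struct Recipe;

// Trait shape inferred from the method list above (sketch only).
trait Scenario {
    fn name(&self) -> &'static str;
    fn describe(&self) -> &'static str;
    fn prepare(&mut self, boot: &CoreBoot, recipe: &Recipe);
    fn run(&mut self, boot: &CoreBoot, recipe: &Recipe);
}

struct RescanScenario; // hypothetical example scenario

impl Scenario for RescanScenario {
    fn name(&self) -> &'static str {
        "rescan"
    }
    fn describe(&self) -> &'static str {
        "Re-indexes existing locations and measures change detection"
    }
    fn prepare(&mut self, _boot: &CoreBoot, _recipe: &Recipe) {
        // e.g. ensure the dataset exists and a first index pass has run
    }
    fn run(&mut self, _boot: &CoreBoot, _recipe: &Recipe) {
        // e.g. trigger a rescan and wait for jobs to finish
    }
}

fn main() {
    let mut s = RescanScenario;
    s.prepare(&CoreBoot, &Recipe);
    s.run(&CoreBoot, &Recipe);
    assert_eq!(s.name(), "rescan");
    println!("{}: {}", s.name(), s.describe());
}
```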
## Metrics and Phase Timing

- The indexer logs a formatted summary including phase timings (discovery, processing, content). The bench runner parses these logs (a temporary approach) and produces a `ScenarioResult` with:
  - `duration_s`: total duration
  - `discovery_duration_s`, `processing_duration_s`, `content_duration_s`: optional phase timings
  - throughput and counts (files, dirs, total size, errors)
  - `raw_artifacts`: paths to job logs
- For content-only benchmarking, use `content_duration_s` to compute throughput and exclude discovery time.
- Future: event-driven or structured metrics ingestion to avoid log parsing.
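A simplified model of the result and the content-only throughput computation might look like this. Field names follow the list above, but the struct layout is a sketch, not the exact definition in `benchmarks/src/metrics/`:

```rust
/// Sketch of the result model described above; the real struct lives
/// in benchmarks/src/metrics/ and carries more fields.
struct ScenarioResult {
    duration_s: f64,
    discovery_duration_s: Option<f64>,
    content_duration_s: Option<f64>,
    files: u64,
    total_bytes: u64,
}

/// Content-only throughput in bytes/s, excluding discovery time.
fn content_throughput(result: &ScenarioResult) -> Option<f64> {
    let content_s = result.content_duration_s?;
    (content_s > 0.0).then(|| result.total_bytes as f64 / content_s)
}

fn main() {
    let r = ScenarioResult {
        duration_s: 12.0,
        discovery_duration_s: Some(2.0),
        content_duration_s: Some(8.0),
        files: 5_000,
        total_bytes: 4_000_000_000,
    };
    // 4 GB over the 8 s content phase = 500 MB/s, not 4 GB / 12 s total.
    assert_eq!(content_throughput(&r), Some(500_000_000.0));
    println!("files: {}", r.files);
}
```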
## Reporting

- The JSON reporter writes summaries into a single file: `benchmarks/src/reporting/json_summary.rs` writes `{ "runs": [ ...ScenarioResult... ] }`.
- Register additional reporters in `benchmarks/src/reporting/registry.rs`.
- Planned: HTML. CSV and Markdown output are already available via the `results-table` command (see below).
### CSV Reports

- After producing JSON results (e.g., via `run` or `run-all`), generate CSV reports:

  ```sh
  cargo run -p sd-bench -- results-table --results-dir benchmarks/results --out benchmarks/results/whitepaper_metrics.csv --format csv
  ```

- The CSV format shows all individual benchmark runs with automatic hardware detection:
  - Header: `Phase,Hardware,Files_per_s,GB_per_s,Files,Dirs,GB,Errors,Recipe`
  - Each row represents one benchmark run.
  - Phase names: "Discovery" (indexing-discovery), "Processing" (aggregation), "Content Identification" (content-identification).
  - Hardware labels are automatically detected from the volume where the benchmark was run (e.g., "Internal NVMe SSD", "External HDD (Seagate)").
  - Results are sorted by phase, then hardware, then recipe name.
  - The LaTeX document reads `../benchmarks/results/whitepaper_metrics.csv`.
- Other supported formats:
  - `--format json`: export as JSON (default).
  - `--format markdown`: generate a Markdown table (useful for documentation).
## Core Boot (Isolated)

- The bench boot uses its own data dir, e.g. `~/Library/Application Support/spacedrive-bench/<scenario>`, falling back to the system temp dir.
- Job logging is enabled and sized for benchmarks. Job logs are printed after each run and are included as artifacts in results.
- A dedicated library is created/used for benchmark runs.
## Key Features & Improvements

### Automatic Hardware Detection
- The benchmark suite now automatically detects hardware type from the volume where benchmarks are run
- No need for hardware-specific recipe names or manual tagging
- Detects: Internal/External NVMe SSD, HDD, SSD, Network Attached Storage
- Hardware information is included in output filenames and benchmark results
### Multi-Location, Multi-Scenario Execution

- Run all benchmark combinations with a single command.
- Automatically generates datasets at each location if needed.
- Output files are named systematically: `{recipe}-{scenario}-{hardware}.json` (e.g., `shape_small-indexing-discovery-nvme.json`).
### Smart Path Handling

- The `benchdata/` prefix in recipes is handled intelligently.
- Specify `/tmp/benchdata` as a location, and it creates `/tmp/benchdata/shape_small` (not `/tmp/benchdata/benchdata/shape_small`).
- Works seamlessly with external drives and network volumes.
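The de-duplication rule can be sketched as: if the location root already ends with the recipe's `benchdata` component, that component is stripped from the recipe path before joining. This illustrates the behavior described above; `resolve_location` is a hypothetical helper, not the crate's actual function:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch of the "smart" join: avoid doubling the
/// `benchdata/` prefix when the location root already ends with it.
fn resolve_location(location_root: &Path, recipe_path: &str) -> PathBuf {
    let stripped = recipe_path.strip_prefix("benchdata/").unwrap_or(recipe_path);
    if location_root.ends_with("benchdata") {
        location_root.join(stripped)
    } else {
        location_root.join(recipe_path)
    }
}

fn main() {
    // Root ends with `benchdata`: the recipe prefix is dropped.
    assert_eq!(
        resolve_location(Path::new("/tmp/benchdata"), "benchdata/shape_small"),
        PathBuf::from("/tmp/benchdata/shape_small")
    );
    // Otherwise the recipe path is joined as-is.
    assert_eq!(
        resolve_location(Path::new("/Volumes/HDD"), "benchdata/shape_small"),
        PathBuf::from("/Volumes/HDD/benchdata/shape_small")
    );
    println!("ok");
}
```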
### Enhanced Reporting
- CSV reporter shows all individual runs (not aggregated)
- Results are sorted by phase → hardware → recipe for easy comparison
- Hardware labels are human-readable (e.g., "External HDD (Seagate)")
## Best Practices

- For comprehensive benchmarking across hardware:

  ```sh
  cargo run -p sd-bench -- run-all \
    --locations "/path/to/nvme" "/Volumes/HDD" "/Volumes/SSD" \
    --recipe-filter "^shape_"
  ```

- For fast iteration, use smaller recipes (`shape_small.yaml`) and `content_gen.mode: partial`.
- For realistic content identification, set `magic_headers: true` and `content_gen.mode: partial` or `full` for a subset of files.
- Keep seeds fixed in CI to avoid result variance.
## CI Integration

- Add a job that runs a tiny recipe end-to-end and uploads the JSON summary artifacts (and optionally logs) for inspection.
- Suggested command:

  ```sh
  cargo run -p sd-bench -- run --scenario indexing-discovery --recipe benchmarks/recipes/nvme_tiny.yaml --out-json benchmarks/results/ci-indexing-discovery.json
  ```
## Troubleshooting

- “Files look empty / zeros”: ensure your recipe has `files.content_gen` defined with `mode: partial` or `full`, and consider `magic_headers: true`.
- “Unknown scenario”: run with `--scenario indexing-discovery` or add your scenario to `scenarios/registry.rs`.
- “No recipes found”: check the `--recipes-dir` path and that files end with `.yaml` or `.yml`.
## Extending the Suite

- Add a generator: implement `DatasetGenerator` in `benchmarks/src/generator/` and register it in `generator/registry.rs`.
- Add a reporter: implement `Reporter` in `benchmarks/src/reporting/` and register it in `reporting/registry.rs`.
- Add a scenario: see the Scenarios section above.
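As a rough illustration of the reporter extension point: the real `Reporter` trait lives in `benchmarks/src/reporting/` and its signature may differ, so everything below (the `emit` method, the stub `ScenarioResult`, the `TsvReporter` example) is assumed for the sketch; check `reporting/registry.rs` for the actual shape before implementing:

```rust
// Stub result type for the sketch; the real model is in metrics/.
struct ScenarioResult {
    scenario: String,
    duration_s: f64,
}

// Assumed trait shape for illustration only.
trait Reporter {
    fn emit(&self, runs: &[ScenarioResult]) -> String;
}

struct TsvReporter; // hypothetical tab-separated reporter

impl Reporter for TsvReporter {
    fn emit(&self, runs: &[ScenarioResult]) -> String {
        let mut out = String::from("scenario\tduration_s\n");
        for run in runs {
            out.push_str(&format!("{}\t{}\n", run.scenario, run.duration_s));
        }
        out
    }
}

fn main() {
    let runs = [ScenarioResult {
        scenario: "indexing-discovery".into(),
        duration_s: 1.5,
    }];
    let table = TsvReporter.emit(&runs);
    assert!(table.starts_with("scenario\tduration_s"));
    assert!(table.contains("indexing-discovery\t1.5"));
    println!("{table}");
}
```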
## References

- CLI entrypoint and commands: `benchmarks/src/bin/sd-bench-new.rs`, `benchmarks/src/cli/commands.rs`
- Dataset generation: `benchmarks/src/generator/filesystem.rs`
- Recipe schema: `benchmarks/src/recipe/schema.rs`
- Scenarios: `benchmarks/src/scenarios/`
- Runner: `benchmarks/src/runner/mod.rs`
- Metrics: `benchmarks/src/metrics/mod.rs`
- Reporting: `benchmarks/src/reporting/`
- Isolated core boot: `benchmarks/src/core_boot/mod.rs`
## Future Benchmarks & Roadmap
The suite is designed to grow into a comprehensive performance harness that reflects the whitepaper and system goals.
- Indexing pipeline
  - Content identification (done): measure content-only throughput using phase timings.
  - Deep indexing: include thumbnail generation and metadata extraction; track throughput and error rates.
  - Rescan/change detection: cold vs warm cache; latency from change to consistency.
- File operations
  - Copy throughput: large vs small files, overlap detection, progressive copy correctness; bytes/s and resource usage.
  - Delete/cleanup: large tree deletion, DB cleanup cost, vacuum.
  - Validation/integrity: CAS verification throughput; corruption handling.
- Duplicates & de-duplication
  - Duplicate detection: time to detect N duplicates; content-identity correctness; DB write pressure.
- Search & querying
  - (If applicable) index build time and query latency (P50/P95); warm vs cold cache comparisons.
- Media pipeline
  - Thumbnail generation: per-kind throughput; GPU/CPU offload if available.
  - Metadata extraction: EXIF/FFprobe across formats.
- Networking & transfer
  - Pairing: time-to-pair and success rate under various conditions.
  - Cross-device transfer: LAN/WAN throughput and latency; concurrency sweeps.
- Volume & system
  - Volume detection and tracking: discovery latency; multi-volume scaling.
  - Disk type profiling: HDD vs NVMe vs network FS; impact on indexing and copy.
- Data generation enhancements
  - Media synthesis: small valid PNG/JPG/WebP; short MP4/AAC clips.
  - Rich content sets: archives (ZIP/TAR), PDFs, docs, code, text; symlinks/permissions; nested trees.
  - Change-set support: scripted add/modify/delete to exercise rescan.
  - Ground-truth manifests: emitted metadata (size, hash) to validate correctness.
- Metrics & telemetry
  - Structured metrics export from jobs (avoid log parsing).
  - System snapshot per run: CPU/RAM, disk model/FS, OS; thermal state if available.
  - Resource usage: CPU%, RSS/peak, IO bytes/IOPS.
- Reporting & analysis
  - Markdown/CSV reporters; baseline-diff mode for regression detection.
  - HTML dashboard for trend charts over time/history.
- CLI ergonomics
  - `--list-scenarios`, `--list-reporters`; recipe filters; scenario parameters (mode, scope, concurrency).
  - `--timeout`, `--retries`, `--clean`/`--reuse`; max parallelism; sharding.
- CI integration
  - PR smoke tests: tiny recipes for key scenarios; upload JSON/logs.
  - Nightly heavy runs on tagged hardware; publish time-series metrics.
  - Regression gates: fail PRs on significant metric regressions.