remove stray test snapshots directory

Jamie Pine 2025-12-01 16:54:56 -08:00
parent 94c4b3d11e
commit 79226a835e
18 changed files with 0 additions and 19632 deletions

View File

@ -1,52 +0,0 @@
# Phase Snapshot Analysis
## Event File Structure (from events_Content.json)
```json
{
  "id": "01f866cf-ab4a-474b-800d-6e10f1445aae",
  "name": "TUPLE_VARIANTS_IMPLEMENTED",
  "sd_path": {
    "Content": {
      "content_id": "3e457574-0b4a-51e8-80ae-ad05f78193d7"
    }
  },
  "content_identity": "3e457574-0b4a-51e8-80ae-ad05f78193d7"
}
```
## Database Entry Structure (from db_entries_sample.json)
```json
{
  "entry_id": 8,
  "entry_uuid": "26977785-98eb-4de7-a77b-dc63daef44b3",
  "name": "Screen Recording 2025-11-10 at 2.42.22 AM",
  "content_id_fk": 1
}
```
## Key Findings
**Event Files:**
- `id` = entry UUID (e.g., `"01f866cf-ab4a-474b-800d-6e10f1445aae"`)
- `sd_path` = Content path with `content_id`
- `content_identity.uuid` matches `sd_path.Content.content_id`
**Database Entries:**
- `entry_uuid` = UUID stored in entry table
- `content_id_fk` = Foreign key to content_identities table (integer)
**Directory Query (expected):**
- Should use `entry_uuid` as File `id`
- Should use Physical `sd_path` like `/Users/jamespine/Desktop/file.txt`
## The Problem
Events and queries should have matching IDs (both use `entry_uuid`), but:
1. Event `sd_path` = Content type
2. Query `sd_path` = Physical type
This means path-based filtering can't work, but ID-based matching SHOULD work if both use the same UUID.
Check the actual directory listing query to confirm it uses `entry_uuid` as the File `id`.
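
The mismatch can be made concrete with a small sketch. The `SdPath` and `FileLike` shapes below are assumptions inferred from the snapshots above, not the project's actual type definitions, and `matchesDirectory` / `matchesById` are hypothetical helper names.

```typescript
// Assumed shapes, inferred from the snapshots above (not the real project types).
type SdPath =
  | { Physical: { path: string } }
  | { Content: { content_id: string } };

interface FileLike {
  id: string; // entry UUID
  sd_path: SdPath;
}

// Path-based filter: only a Physical path can be compared against a directory.
function matchesDirectory(file: FileLike, dir: string): boolean {
  const sdPath = file.sd_path;
  if (!("Physical" in sdPath)) return false; // a Content path carries no location
  const prefix = dir.endsWith("/") ? dir : dir + "/";
  return sdPath.Physical.path.startsWith(prefix);
}

// ID-based match: works for either path variant.
function matchesById(file: FileLike, knownIds: ReadonlySet<string>): boolean {
  return knownIds.has(file.id);
}

const queried: FileLike = {
  id: "01f866cf-ab4a-474b-800d-6e10f1445aae",
  sd_path: { Physical: { path: "/Users/jamespine/Desktop/file.txt" } },
};
const fromEvent: FileLike = {
  id: "01f866cf-ab4a-474b-800d-6e10f1445aae",
  sd_path: { Content: { content_id: "3e457574-0b4a-51e8-80ae-ad05f78193d7" } },
};
const knownIds = new Set([queried.id]);
```

Under these assumptions the event file fails the path check but passes the ID check, which is why ID-based matching is the only option when events carry Content paths.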

View File

@ -1,76 +0,0 @@
# FINAL ANALYSIS - Normalized Cache Issue
## Test Results Summary
**Test:** Indexed Desktop (235 files total) with Content mode
### Phase Event Breakdown
| Phase | Files Emitted | Content Identity |
|-------|---------------|------------------|
| Discovery | 100 | All have content_identity |
| Content | 135 | All have content_identity |
### Critical Finding: ZERO OVERLAP
**Discovery files vs Content files:** 0 files appear in both phases
This means:
- Discovery phase emits events for files 1-100
- Content phase emits events for files 101-235 (different subset)
- **They emit for completely different files**
## Why This Breaks Normalized Cache
### Scenario: User viewing /Downloads with 30 files
**Step 1: Initial load**
- Query returns 30 files with IDs: `[A, B, C, ..., Z]`
- Cache populated
**Step 2: Discovery phase events arrive**
- Batch #1: 100 files with IDs `[file001, file002, ..., file100]`
- Check: Do any of these IDs match `[A, B, C, ..., Z]`?
- Answer: **Maybe 5-10 files** if they're from /Downloads
- Result: Update 5 files, ignore 95
**Step 3: Content phase events arrive**
- Batch #1: 100 files with IDs `[file101, file102, ..., file200]`
- These are DIFFERENT files from Discovery
- Check: Do any match `[A, B, C, ..., Z]`?
- Answer: **Maybe 10 files** if from /Downloads
- Result: Update 10 files
### The Problem
Files get emitted in DIFFERENT batches across phases:
- File "document.pdf" emitted in Discovery phase
- File "photo.jpg" emitted in Content phase
- BUT they might be in the SAME directory
When Content phase events arrive, they DON'T include "document.pdf" again.
So if Discovery event had incomplete data, it never gets fixed!
## Data Confirms: Events Have Complete Data
From snapshots:
- Discovery: 100/100 files have content_identity
- Content: 135/135 files have content_identity
**Events are NOT missing content_identity!**
## The Real Bug
The logs show "Updated 0 items" which means:
1. Event batch arrives with 100 files
2. Current directory has 30 files
3. **ZERO IDs match** between event and directory
4. Nothing gets updated
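
The "Updated 0 items" outcome can be reproduced in isolation. This is a minimal sketch, not the cache's actual code: `countUpdatable` is a hypothetical helper and the IDs are made up.

```typescript
// Hypothetical helper: how many cached items does an event batch actually touch?
function countUpdatable(
  cachedIds: readonly string[],
  batchIds: readonly string[],
): number {
  const batch = new Set(batchIds);
  return cachedIds.filter((id) => batch.has(id)).length;
}

// 30 files in the current directory, 100 files in a batch from elsewhere.
const directoryIds = Array.from({ length: 30 }, (_, i) => `dir-${i}`);
const batchIds = Array.from({ length: 100 }, (_, i) => `other-${i}`);

console.log(`Updated ${countUpdatable(directoryIds, batchIds)} items`);
// prints "Updated 0 items"
```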
But you said content_identity disappears, which zero matching IDs cannot explain on their own.
One remaining possibility: the resourceFilter is REJECTING all files, leaving React Query with an empty cache.
## Next Step
Check the resourceFilter logic - it might be rejecting ALL files, leaving an empty array.

View File

@ -1,45 +0,0 @@
# Phase Snapshot Analysis - FINDINGS
## Confirmed: IDs DO MATCH!
Event file with name "3AE4AB29-5EC8-4DF9-80F8-F72AE5C38FBF":
- Event ID: `0303b9df-3b11-49a6-beb6-0fdec577fefb`
- DB entry_uuid: `0303b9df-3b11-49a6-beb6-0fdec577fefb`
**Conclusion:** Events and database entries use the SAME UUID for File.id
## The Real Problem
If IDs match, why doesn't the normalized cache work?
Looking at the frontend logs from earlier:
```
Sample existing ID: "a72dfd9e-1908-4ddd-aa44-37458e8455ae"
Sample incoming ID: "0093bc71-a000-49f3-83f1-d1f72428665e"
WRAPPED: Updated 0 items
```
These IDs are different! This means:
1. The directory listing query returned files with certain UUIDs
2. The events arrived with DIFFERENT UUIDs (different files entirely)
3. No overlap = no matches = "Updated 0 items"
## Root Cause Theory
The events contain files from ALL OVER Desktop (100+ files per batch).
The directory listing only has ~30 files from the CURRENT directory.
Most batches have 0 overlap with the current directory, so:
- Updated 0 items (correct - those files aren't in this directory)
- But without proper filtering, the lag still happens from processing all events
## Solution
We need the `resourceFilter` to work, but it needs to handle Content paths.
The filter should check:
1. Does this file's `content_identity.uuid` match ANY file in my current directory query?
2. If yes → this file belongs here, update it
3. If no → ignore it (it's from a different directory)
This is what the current resourceFilter tries to do - match by content_id.
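
The three checks above could look roughly like this. It is a sketch under assumptions: `CachedFile` is a minimal stand-in for the real file type, `makeContentFilter` is a hypothetical factory, and the IDs are placeholders.

```typescript
// Assumed minimal shape; the real file type has many more fields.
interface CachedFile {
  id: string;
  content_identity: { uuid: string } | null;
}

// Builds a filter from the files currently in the directory query.
function makeContentFilter(
  currentFiles: readonly CachedFile[],
): (file: CachedFile) => boolean {
  const contentIds = new Set<string>();
  for (const f of currentFiles) {
    if (f.content_identity) contentIds.add(f.content_identity.uuid);
  }
  // A file belongs here only if its content UUID is already known.
  return (file) =>
    file.content_identity !== null && contentIds.has(file.content_identity.uuid);
}

const inDirectory: CachedFile[] = [
  { id: "entry-1", content_identity: { uuid: "content-xyz" } },
];
const belongsHere = makeContentFilter(inDirectory);
```

Building the `Set` once per directory keeps each per-file check constant-time, which matters when batches of 100 files arrive repeatedly.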

View File

@ -1,157 +0,0 @@
# Normalized Cache Flow Analysis
## Current Implementation Flow
### 1. Initial Query Load
```typescript
// User navigates to /Downloads
const directoryQuery = useNormalizedCache({
  wireMethod: "query:files.directory_listing",
  input: { path: "/Downloads", ... },
  resourceType: "file",
  resourceFilter: (file) => { /* content_id matching logic */ }
})
```
**Result:** Query returns 30 files with:
- `id` = entry.uuid (e.g., `"a72dfd9e-1908-..."`)
- `sd_path` = Physical path
- `content_identity.uuid` = content UUID
**TanStack Query cache now contains:**
```json
{
  "files": [
    {
      "id": "a72dfd9e-1908-...",
      "name": "document.pdf",
      "content_identity": { "uuid": "xyz123..." }
    },
    // ... 29 more files
  ]
}
```
---
### 2. Indexing Starts - Events Arrive
**Event arrives:**
```json
ResourceChangedBatch {
  resource_type: "file",
  resources: [
    {
      "id": "different-uuid-not-in-downloads",
      "sd_path": { "Content": { "content_id": "abc456..." } },
      "content_identity": { "uuid": "abc456..." }
    },
    // ... 99 more files from various directories
  ]
}
```
---
### 3. Event Processing (lines 194-322)
```typescript
// `queryKey`, `resources` and `resourceFilter` come from the surrounding hook (not shown)
queryClient.setQueryData(queryKey, (oldData) => {
  // oldData = { files: [30 items from /Downloads] }
  const resourceMap = new Map(resources.map(r => [r.id, r]));
  // Map has 100 items with IDs from the event
  const array = [...oldData.files]; // 30 items
  // IDs already in the cache, so STEP B never appends a duplicate
  const seenIds = new Set(array.map(item => item.id));
  // STEP A: Try to update existing items (lines 275-296)
  for (let i = 0; i < array.length; i++) { // 30 iterations
    const item = array[i]; // File from /Downloads
    if (resourceMap.has(item.id)) {
      // Does event batch contain THIS file's ID?
      // Usually NO - event has files from other directories
      // So this rarely executes
    }
  }
  // Result: updateCount = 0 (no matches)
  // STEP B: Try to append new items (lines 309-315)
  if (resourceFilter) {
    for (const resource of resources) { // 100 event files
      if (!seenIds.has(resource.id) && resourceFilter(resource)) {
        // resourceFilter checks: is this file's content_id in oldData?
        // Problem: resourceFilter accesses directoryQuery.data
        // But we're INSIDE the setQueryData callback!
        // directoryQuery.data might be stale!
        array.push(resource);
      }
    }
  }
  return { files: array };
});
```
---
## THE BUG
**Line 116 in context.tsx:**
```typescript
const currentFiles = directoryQuery.data?.files || [];
```
This creates a **closure problem**:
1. resourceFilter is defined with `directoryQuery.data` reference
2. When event arrives, resourceFilter runs INSIDE `setQueryData`
3. But `directoryQuery.data` still has the OLD data at this point
4. React Query hasn't updated the hook's `data` property yet
5. So resourceFilter is comparing against stale data!
**Even worse:** The resourceFilter runs for EVERY batch (100 files × many batches).
Each time it checks if 100 files match the stale `directoryQuery.data`.
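
One way to sidestep the stale reference is to hand the filter the snapshot that the updater receives, instead of letting it close over the hook's `data`. The sketch below assumes simplified `FileItem` / `Listing` shapes and a hypothetical `appendMatching` helper. It is not the code in context.tsx.

```typescript
// Assumed minimal shapes for illustration only.
interface FileItem {
  id: string;
  content_identity: { uuid: string } | null;
}
interface Listing {
  files: FileItem[];
}

// The filter gets the cache snapshot passed to the updater, so it never
// compares against a value captured at render time.
function appendMatching(
  oldData: Listing,
  resources: readonly FileItem[],
  belongs: (resource: FileItem, current: readonly FileItem[]) => boolean,
): Listing {
  const seenIds = new Set(oldData.files.map((f) => f.id));
  const files = [...oldData.files];
  for (const resource of resources) {
    if (!seenIds.has(resource.id) && belongs(resource, oldData.files)) {
      files.push(resource);
      seenIds.add(resource.id); // guard against duplicates within one batch
    }
  }
  return { files };
}
```

A function of this shape could be called from inside the `setQueryData` updater, with `oldData` being the argument the updater is given.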
---
## Why Content Identity Disappears
Looking at your logs:
- "Updated 100 items" sometimes
- "Updated 0 items" other times
When "Updated 100 items":
- Those 100 event files happened to have IDs matching files in /Downloads
- `mergeWithoutNulls()` runs
- BUT: `mergeWithoutNulls()` does a shallow merge!
- Line 14: `const merged = { ...incoming }`
- This starts from the incoming object, so its fields REPLACE the existing ones wholesale
- Only top-level fields that are null in the incoming object get restored from the existing one
**The Problem:**
If incoming file has `content_identity: { uuid: "xyz", content_hash: "abc" }`
And existing has `content_identity: { uuid: "xyz", content_hash: "abc", extra_field: "value" }`
The merge does:
```javascript
merged = { ...incoming } // Start with incoming
// Then fix top-level nulls
if (incoming.content_identity === null && existing.content_identity !== null) {
merged.content_identity = existing.content_identity
}
```
But if BOTH have content_identity (not null), no preservation happens!
The incoming object REPLACES the existing one entirely.
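
For comparison, a recursive variant would keep nested fields that the incoming object lacks. This is a sketch of the general technique, not the project's `mergeWithoutNulls`; `mergeDeepWithoutNulls` and `isPlainObject` are hypothetical names.

```typescript
type PlainObject = Record<string, unknown>;

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Incoming wins, except that null/undefined incoming values and fields
// absent from incoming fall back to the existing value at every depth.
function mergeDeepWithoutNulls(
  existing: PlainObject,
  incoming: PlainObject,
): PlainObject {
  const merged: PlainObject = { ...existing };
  for (const key of Object.keys(incoming)) {
    const value = incoming[key];
    const previous = existing[key];
    if ((value === null || value === undefined) && key in existing) {
      continue; // keep the existing value
    }
    merged[key] =
      isPlainObject(value) && isPlainObject(previous)
        ? mergeDeepWithoutNulls(previous, value)
        : value;
  }
  return merged;
}

const existing = {
  id: "a",
  content_identity: { uuid: "xyz", content_hash: "abc", extra_field: "value" },
};
const incoming = {
  id: "a",
  content_identity: { uuid: "xyz", content_hash: "abc" },
};
const result = mergeDeepWithoutNulls(existing, incoming);
```

With the example objects from above, `result.content_identity` still contains `extra_field`, whereas a shallow spread of `incoming` would drop it.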
---
## Summary
1. **resourceFilter** has stale data closure issue
2. **mergeWithoutNulls** only preserves null→value, not value→different-value
3. **No filtering works** because paths don't match (Content vs Physical)
4. Every batch processes all 100 files, checking stale data
5. When IDs match by chance, merge doesn't preserve nested fields properly

View File

@ -1,150 +0,0 @@
# ROOT CAUSE IDENTIFIED - Missing Icons During Indexing
## The Bug
During indexing, video files (and other media) lose their proper icons and fall back to the generic "Document" icon. The icons are restored when thumbnails are generated, but for files without thumbnails (like videos with thumbnail generation disabled), the icons remain wrong.
## What We Thought It Was
- content_identity disappearing (normalized cache bug)
- React not re-rendering
- TanStack Query not updating
- Event system not working
## What It Actually Is
**The `content_kind` field is hardcoded to "unknown" in event data**
## The Evidence
### From Browser Logs
All 4 videos render with complete data:
```
Screen Recording 2025-11-09 at 7.18.50 PM:
has_content_identity: true
content_identity_uuid: "06358a76-0974-50c9-a939-70b70a910a91"
sidecars_count: 0
```
### From TanStack Query Devtools
Final state shows all videos have `content_identity` populated with all fields present.
### From Event Snapshots (test_snapshots/)
```json
{
"content_identity": {
"uuid": "...",
"content_hash": "...",
"kind": "unknown" // THIS IS THE PROBLEM
}
}
```
## How The Icon System Works
**Thumb.tsx lines 66-67:**
```typescript
const kindCapitalized = file.content_identity?.kind
  ? file.content_identity.kind.charAt(0).toUpperCase() + file.content_identity.kind.slice(1)
  : "Document";
const icon = getIcon(kindCapitalized, true, file.extension, ...);
```
**When kind = "unknown":**
- Capitalizes to "Unknown"
- `getIcon("Unknown", ...)` returns generic document icon
- Videos, images, etc. all show document icon
**When kind = "video":**
- Capitalizes to "Video"
- `getIcon("Video", ...)` returns video icon
- Correct icon shows
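
The capitalization step can be checked on its own. The helper name `iconKey` is hypothetical; the expression mirrors the Thumb.tsx snippet quoted above.

```typescript
// Same expression as in the Thumb.tsx snippet, extracted into a function.
function iconKey(kind: string | null | undefined): string {
  return kind ? kind.charAt(0).toUpperCase() + kind.slice(1) : "Document";
}

console.log(iconKey("unknown")); // prints "Unknown"
console.log(iconKey("video")); // prints "Video"
console.log(iconKey(undefined)); // prints "Document"
```

Note that `"unknown"` is a truthy string, so it never reaches the `"Document"` fallback. The generic icon comes from the lookup for `"Unknown"`.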
## The Root Cause Code
**File:** `core/src/domain/file.rs:354`
```rust
file.content_identity = Some(ContentIdentity {
    uuid: ci_uuid,
    content_hash: ci.content_hash.clone(),
    integrity_hash: ci.integrity_hash.clone(),
    mime_type_id: ci.mime_type_id,
    kind: ContentKind::Unknown, // TODO: Load from content_kinds table
    total_size: ci.total_size,
    entry_count: ci.entry_count,
    first_seen_at: ci.first_seen_at,
    last_verified_at: ci.last_verified_at,
    text_content: ci.text_content.clone(),
});
```
**The TODO comment says it all:** "Load from content_kinds table"
## Why Thumbnails "Fix" It
When thumbnail events arrive, they trigger a re-render and the thumbnail image loads, hiding the icon entirely. So you don't notice the wrong icon anymore. But the underlying data is still wrong - `kind` is still "unknown".
## The Fix
In `File::from_entry_uuids()`, we need to:
1. Load the `content_kind` for each `content_identity`
2. Join with the `content_kinds` table
3. Map the `kind_id` to `ContentKind` enum
4. Set the correct kind instead of `ContentKind::Unknown`
**The database already has this data** - we just need to query it:
```rust
// Load content kinds
let content_kinds = content_kind::Entity::find()
    .filter(content_kind::Column::Id.is_in(
        content_identities.iter().map(|ci| ci.kind_id)
    ))
    .all(db)
    .await?;
let kind_by_id: HashMap<i32, ContentKind> = content_kinds
    .into_iter()
    .map(|ck| (ck.id, ContentKind::from_id(ck.id)))
    .collect();
// Then when building ContentIdentity:
kind: kind_by_id.get(&ci.kind_id).copied().unwrap_or(ContentKind::Unknown),
```
## Why The Normalized Cache Actually Works Perfectly
The entire investigation proved the normalized cache is working correctly:
- Events are emitted with complete data (all 100% have content_identity)
- IDs match between events and queries (entry.uuid)
- Deep merge preserves existing data correctly
- Filter matches files by content_id successfully
- TanStack Query updates atomically
- React re-renders when cache changes
- Components receive updated props
The ONLY bug: `content_kind` is hardcoded to "unknown" in event data, causing wrong icons.
## Test Results Summary
- **phase_Discovery.json**: 100 files, all with `kind: "unknown"`
- **phase_Content.json**: 135 files, all with `kind: "unknown"`
- **db_entries_all.json**: Database has 235 entries with proper content_id foreign keys
The database HAS the correct content_kind data. The directory listing query probably loads it correctly (TODO: verify). But `File::from_entry_uuids()` doesn't load it, so events have incomplete kind information.
## Next Steps
1. Add content_kind join to `File::from_entry_uuids()`
2. Map kind_id to ContentKind enum
3. Test that icons appear correctly during indexing
4. Remove all the debug logging we added
That's it. One small database join fixes everything.

View File

@ -1,104 +0,0 @@
#!/bin/bash
echo "=============================================="
echo "TEMPORAL PHASE ANALYSIS"
echo "=============================================="
echo ""
# Check Discovery phase
if [ -f "phase_Discovery.json" ]; then
echo "DISCOVERY PHASE:"
DISCOVERY_COUNT=$(jq '.total_files' phase_Discovery.json)
echo " Total files: $DISCOVERY_COUNT"
echo " Sample file structure:"
jq '.all_files[0] | {id, name, sd_path_type: (.sd_path | keys[0]), has_content_identity: (.content_identity != null)}' phase_Discovery.json
echo ""
fi
# Check Content phase
if [ -f "phase_Content.json" ]; then
echo "CONTENT PHASE:"
CONTENT_COUNT=$(jq '.total_files' phase_Content.json)
echo " Total files: $CONTENT_COUNT"
echo " Sample file structure:"
jq '.all_files[0] | {id, name, sd_path_type: (.sd_path | keys[0]), has_content_identity: (.content_identity != null)}' phase_Content.json
echo ""
echo " Content identity breakdown:"
WITH_CONTENT=$(jq '[.all_files[] | select(.content_identity != null)] | length' phase_Content.json)
WITHOUT_CONTENT=$(jq '[.all_files[] | select(.content_identity == null)] | length' phase_Content.json)
echo " - With content_identity: $WITH_CONTENT"
echo " - Without content_identity: $WITHOUT_CONTENT"
echo ""
fi
# Compare IDs between phases
echo "=============================================="
echo "ID CONSISTENCY CHECK"
echo "=============================================="
echo ""
if [ -f "phase_Discovery.json" ] && [ -f "phase_Content.json" ]; then
DISC_ID=$(jq -r '.all_files[0].id' phase_Discovery.json)
CONT_ID=$(jq -r '.all_files[0].id' phase_Content.json)
echo "First file ID in Discovery: $DISC_ID"
echo "First file ID in Content: $CONT_ID"
# Check if same file appears in both phases
DISC_NAME=$(jq -r '.all_files[0].name' phase_Discovery.json)
CONT_FILE=$(jq -r --arg name "$DISC_NAME" '.all_files[] | select(.name == $name) | .id' phase_Content.json | head -1)
if [ -n "$CONT_FILE" ]; then
echo ""
echo "File '$DISC_NAME' found in both phases"
echo " Discovery ID: $DISC_ID"
echo " Content ID: $CONT_FILE"
if [ "$DISC_ID" == "$CONT_FILE" ]; then
echo " IDs MATCH"
else
echo " IDs DIFFERENT"
fi
fi
fi
echo ""
echo "=============================================="
echo "DATABASE ENTRY COMPARISON"
echo "=============================================="
echo ""
# Check if event IDs exist in database
if [ -f "phase_Content.json" ] && [ -f "db_entries_all.json" ]; then
EVENT_ID=$(jq -r '.all_files[0].id' phase_Content.json)
EVENT_NAME=$(jq -r '.all_files[0].name' phase_Content.json)
echo "Checking if event file exists in database:"
echo " Event file name: $EVENT_NAME"
echo " Event file ID: $EVENT_ID"
echo ""
DB_ENTRY=$(jq --arg id "$EVENT_ID" '.[] | select(.entry_uuid == $id)' db_entries_all.json)
if [ -n "$DB_ENTRY" ]; then
echo " FOUND in database:"
echo "$DB_ENTRY" | jq '{entry_id, entry_uuid, name}'
else
echo " NOT FOUND in database"
echo " Searching by name instead..."
DB_BY_NAME=$(jq --arg name "$EVENT_NAME" '.[] | select(.name == $name)' db_entries_all.json)
if [ -n "$DB_BY_NAME" ]; then
echo " Found by name:"
echo "$DB_BY_NAME" | jq '{entry_id, entry_uuid, name}'
fi
fi
fi
echo ""
echo "=============================================="
echo "FILES CREATED - Check these for full details:"
echo "=============================================="
ls -lh *.json *.md 2>/dev/null | awk '{print " " $9 " (" $5 ")"}'
echo ""

View File

@ -1,42 +0,0 @@
#!/bin/bash
echo "Checking file overlap between phases..."
echo ""
# Get all IDs from each phase
DISCOVERY_IDS=$(jq -r '.all_files[].id' phase_Discovery.json | sort)
CONTENT_IDS=$(jq -r '.all_files[].id' phase_Content.json | sort)
DISCOVERY_COUNT=$(echo "$DISCOVERY_IDS" | wc -l | tr -d ' ')
CONTENT_COUNT=$(echo "$CONTENT_IDS" | wc -l | tr -d ' ')
echo "Discovery phase: $DISCOVERY_COUNT files"
echo "Content phase: $CONTENT_COUNT files"
echo ""
# Find overlapping IDs
OVERLAP=$(comm -12 <(echo "$DISCOVERY_IDS") <(echo "$CONTENT_IDS") | wc -l | tr -d ' ')
echo "Files appearing in BOTH phases: $OVERLAP"
echo ""
if [ "$OVERLAP" -gt 0 ]; then
echo "Some files appear in both phases"
echo ""
echo "Sample overlapping file:"
OVERLAP_ID=$(comm -12 <(echo "$DISCOVERY_IDS") <(echo "$CONTENT_IDS") | head -1)
echo "ID: $OVERLAP_ID"
echo ""
echo "In Discovery:"
jq --arg id "$OVERLAP_ID" '.all_files[] | select(.id == $id) | {name, has_content_identity: (.content_identity != null)}' phase_Discovery.json
echo ""
echo "In Content:"
jq --arg id "$OVERLAP_ID" '.all_files[] | select(.id == $id) | {name, has_content_identity: (.content_identity != null)}' phase_Content.json
else
echo "NO files appear in both phases"
echo " Discovery emits one set of files"
echo " Content emits a completely different set"
echo " This is expected if they're different batches/directories"
fi

View File

@ -1,38 +0,0 @@
#!/bin/bash
echo "Checking TOTAL UNIQUE files across all phases..."
echo ""
# Get all unique IDs from Discovery
DISCOVERY_IDS=$(jq -r '.all_files[].id' phase_Discovery.json | sort -u)
# Get all unique IDs from Content
CONTENT_IDS=$(jq -r '.all_files[].id' phase_Content.json | sort -u)
# Combine and count unique
ALL_IDS=$(echo -e "$DISCOVERY_IDS\n$CONTENT_IDS" | sort -u)
TOTAL_UNIQUE=$(echo "$ALL_IDS" | wc -l | tr -d ' ')
DISCOVERY_COUNT=$(echo "$DISCOVERY_IDS" | wc -l | tr -d ' ')
CONTENT_COUNT=$(echo "$CONTENT_IDS" | wc -l | tr -d ' ')
echo "Discovery phase: $DISCOVERY_COUNT unique file IDs"
echo "Content phase: $CONTENT_COUNT unique file IDs"
echo "Total unique: $TOTAL_UNIQUE file IDs"
echo ""
echo "Database entries: 235"
echo ""
if [ "$TOTAL_UNIQUE" -eq 235 ]; then
echo "Total unique matches DB count!"
echo " Discovery + Content = exactly 235 files"
echo " They are DIFFERENT files (no duplicates)"
elif [ "$TOTAL_UNIQUE" -lt 235 ]; then
echo "Total unique ($TOTAL_UNIQUE) < 235"
echo " Some files appear in BOTH phases"
OVERLAP=$((DISCOVERY_COUNT + CONTENT_COUNT - TOTAL_UNIQUE))
echo " Overlap: $OVERLAP files"
else
echo " Total unique ($TOTAL_UNIQUE) > 235"
echo " This shouldn't happen - more files in events than DB?"
fi

View File

@ -1,122 +0,0 @@
#!/bin/bash
echo "=============================================="
echo "DETAILED STRUCTURE COMPARISON"
echo "=============================================="
echo ""
# Get first file from each phase
echo "DISCOVERY PHASE - First File:"
echo "================================"
jq '.all_files[0]' phase_Discovery.json > /tmp/discovery_file.json
cat /tmp/discovery_file.json | jq '.'
echo ""
echo "CONTENT PHASE - First File:"
echo "================================"
jq '.all_files[0]' phase_Content.json > /tmp/content_file.json
cat /tmp/content_file.json | jq '.'
echo ""
echo "=============================================="
echo "FIELD-BY-FIELD COMPARISON"
echo "=============================================="
echo ""
# Compare each field
echo "Field: id"
DISC_ID=$(jq -r '.id' /tmp/discovery_file.json)
CONT_ID=$(jq -r '.id' /tmp/content_file.json)
echo " Discovery: $DISC_ID"
echo " Content: $CONT_ID"
echo ""
echo "Field: sd_path"
echo " Discovery:"
jq '.sd_path' /tmp/discovery_file.json
echo " Content:"
jq '.sd_path' /tmp/content_file.json
echo ""
echo "Field: content_identity"
echo " Discovery:"
jq '.content_identity | if . == null then "NULL" else ("Present - uuid: " + .uuid) end' /tmp/discovery_file.json
echo " Content:"
jq '.content_identity | if . == null then "NULL" else ("Present - uuid: " + .uuid) end' /tmp/content_file.json
echo ""
echo "Field: sidecars"
DISC_SIDECARS=$(jq '.sidecars | length' /tmp/discovery_file.json)
CONT_SIDECARS=$(jq '.sidecars | length' /tmp/content_file.json)
echo " Discovery: $DISC_SIDECARS sidecars"
echo " Content: $CONT_SIDECARS sidecars"
echo ""
echo "=============================================="
echo "SCHEMA DIFFERENCES"
echo "=============================================="
echo ""
echo "All top-level fields in Discovery file:"
jq 'keys | sort' /tmp/discovery_file.json
echo ""
echo "All top-level fields in Content file:"
jq 'keys | sort' /tmp/content_file.json
echo ""
echo "Checking if field sets are identical..."
DISC_FIELDS=$(jq -r 'keys | sort | join(",")' /tmp/discovery_file.json)
CONT_FIELDS=$(jq -r 'keys | sort | join(",")' /tmp/content_file.json)
if [ "$DISC_FIELDS" == "$CONT_FIELDS" ]; then
echo "Both phases have IDENTICAL field sets"
else
echo "Different fields between phases!"
echo ""
echo "Fields only in Discovery:"
comm -23 <(jq -r 'keys[]' /tmp/discovery_file.json | sort) <(jq -r 'keys[]' /tmp/content_file.json | sort)
echo ""
echo "Fields only in Content:"
comm -13 <(jq -r 'keys[]' /tmp/discovery_file.json | sort) <(jq -r 'keys[]' /tmp/content_file.json | sort)
fi
echo ""
echo "=============================================="
echo "CONTENT_IDENTITY DEEP COMPARISON"
echo "=============================================="
echo ""
echo "Discovery content_identity fields:"
jq '.content_identity | if . != null then keys | sort else "null" end' /tmp/discovery_file.json
echo ""
echo "Content content_identity fields:"
jq '.content_identity | if . != null then keys | sort else "null" end' /tmp/content_file.json
echo ""
echo "Are content_identity structures identical?"
DISC_CI_FIELDS=$(jq -r '.content_identity | if . != null then (keys | sort | join(",")) else "null" end' /tmp/discovery_file.json)
CONT_CI_FIELDS=$(jq -r '.content_identity | if . != null then (keys | sort | join(",")) else "null" end' /tmp/content_file.json)
if [ "$DISC_CI_FIELDS" == "$CONT_CI_FIELDS" ]; then
echo "YES - identical structure"
else
echo "NO - different structure"
echo "Discovery: $DISC_CI_FIELDS"
echo "Content: $CONT_CI_FIELDS"
fi
echo ""
echo "=============================================="
echo "SAMPLE: Complete File Structure Comparison"
echo "=============================================="
echo ""
# Show side-by-side comparison of a complete file structure
echo "Discovery file saved to: /tmp/discovery_file.json"
echo "Content file saved to: /tmp/content_file.json"
echo ""
echo "Use: diff /tmp/discovery_file.json /tmp/content_file.json"
echo "Or: code --diff /tmp/discovery_file.json /tmp/content_file.json"

View File

@ -1,29 +0,0 @@
#!/bin/bash
echo "Checking if Event IDs match Database entry_uuids..."
echo ""
# Get a few event IDs
EVENT_IDS=$(jq -r '.[0][0:3] | .[] | .id' events_Content.json)
# Get DB UUIDs
DB_UUIDS=$(jq -r '.[] | .entry_uuid' db_entries_sample.json)
echo "Event IDs (first 3):"
echo "$EVENT_IDS" | head -3
echo ""
echo "DB entry_uuids (first 5):"
echo "$DB_UUIDS"
echo ""
# Check for any matches
echo "Checking for matches..."
for event_id in $EVENT_IDS; do
# -x matches the whole line, -F treats the ID as a literal string rather than a regex
if echo "$DB_UUIDS" | grep -qxF -- "$event_id"; then
echo "MATCH FOUND: $event_id"
fi
done
echo ""
echo "If no matches shown above, IDs don't match between events and DB"

File diff suppressed because it is too large Load Diff

View File

@ -1,37 +0,0 @@
[
{
"content_id_fk": 1,
"entry_id": 8,
"entry_uuid": "26977785-98eb-4de7-a77b-dc63daef44b3",
"extension": "mov",
"name": "Screen Recording 2025-11-10 at 2.42.22AM"
},
{
"content_id_fk": 2,
"entry_id": 9,
"entry_uuid": "e504c96b-5d2f-44e9-9300-ebfd1d7bd1c9",
"extension": "png",
"name": "Screenshot 2025-11-10 at 2.30.23AM"
},
{
"content_id_fk": 3,
"entry_id": 10,
"entry_uuid": "bbb07613-fd4d-4243-aaae-f0037c337462",
"extension": "png",
"name": "Screenshot 2025-11-10 at 7.21.30PM"
},
{
"content_id_fk": 4,
"entry_id": 11,
"entry_uuid": "4ba8c643-1a2f-4991-8ced-ae25dc37e403",
"extension": "png",
"name": "Screenshot 2025-11-10 at 7.21.51PM"
},
{
"content_id_fk": 5,
"entry_id": 12,
"entry_uuid": "d7dc2460-2317-4f40-a6b5-12288556bec5",
"extension": "png",
"name": "Screenshot 2025-11-10 at 2.31.03AM"
}
]

File diff suppressed because it is too large Load Diff


View File

@ -1,7 +0,0 @@
{
"all_files": [],
"file_batches": [],
"phase_name": "Initial",
"total_events": 1,
"total_files": 0
}

View File

@ -1,61 +0,0 @@
#!/bin/bash
# Pick the first file from Discovery phase
DISCOVERY_FILE=$(jq '.all_files[0]' phase_Discovery.json)
FILE_ID=$(echo "$DISCOVERY_FILE" | jq -r '.id')
FILE_NAME=$(echo "$DISCOVERY_FILE" | jq -r '.name')
echo "=============================================="
echo "TRACKING FILE: $FILE_NAME"
echo "ID: $FILE_ID"
echo "=============================================="
echo ""
echo "DISCOVERY PHASE:"
echo "$DISCOVERY_FILE" | jq '{
id,
name,
sd_path,
content_identity_present: (.content_identity != null),
content_identity_uuid: .content_identity.uuid,
sidecars_count: (.sidecars | length)
}'
echo ""
echo "CONTENT PHASE (same file):"
CONTENT_FILE=$(jq --arg id "$FILE_ID" '.all_files[] | select(.id == $id)' phase_Content.json)
if [ -n "$CONTENT_FILE" ]; then
echo "$CONTENT_FILE" | jq '{
id,
name,
sd_path,
content_identity_present: (.content_identity != null),
content_identity_uuid: .content_identity.uuid,
sidecars_count: (.sidecars | length)
}'
else
echo " File not found in Content phase events"
fi
echo ""
echo "=============================================="
echo "COMPARISON:"
echo "=============================================="
DISC_HAS_CI=$(echo "$DISCOVERY_FILE" | jq '.content_identity != null')
# Fall back to JSON null when the file was not found, so jq still prints "false"
CONT_HAS_CI=$(echo "${CONTENT_FILE:-null}" | jq '.content_identity != null')
echo "Discovery has content_identity: $DISC_HAS_CI"
echo "Content has content_identity: $CONT_HAS_CI"
echo ""
# Check full objects
echo "FULL DISCOVERY FILE:"
echo "$DISCOVERY_FILE" | jq '.'
echo ""
if [ -n "$CONTENT_FILE" ]; then
echo "FULL CONTENT FILE:"
echo "$CONTENT_FILE" | jq '.'
fi