Mirror of https://github.com/spacedriveapp/spacedrive.git (synced 2025-12-11 20:15:30 +01:00)
remove stray test snapshots directory
This commit is contained in:
parent 94c4b3d11e
commit 79226a835e
@@ -1,52 +0,0 @@
# Phase Snapshot Analysis

## Event File Structure (from events_Content.json)

```json
{
  "id": "01f866cf-ab4a-474b-800d-6e10f1445aae",
  "name": "TUPLE_VARIANTS_IMPLEMENTED",
  "sd_path": {
    "Content": {
      "content_id": "3e457574-0b4a-51e8-80ae-ad05f78193d7"
    }
  },
  "content_identity": "3e457574-0b4a-51e8-80ae-ad05f78193d7"
}
```

## Database Entry Structure (from db_entries_sample.json)

```json
{
  "entry_id": 8,
  "entry_uuid": "26977785-98eb-4de7-a77b-dc63daef44b3",
  "name": "Screen Recording 2025-11-10 at 2.42.22 AM",
  "content_id_fk": 1
}
```

## Key Findings

**Event Files:**
- `id` = entry UUID (e.g., `"01f866cf-ab4a-474b-800d-6e10f1445aae"`)
- `sd_path` = Content path carrying a `content_id`
- `content_identity.uuid` matches `sd_path.Content.content_id`

**Database Entries:**
- `entry_uuid` = UUID stored in the entry table
- `content_id_fk` = integer foreign key into the content_identities table

**Directory Query (expected):**
- Should use `entry_uuid` as the File `id`
- Should use a Physical `sd_path` such as `/Users/jamespine/Desktop/file.txt`

## The Problem

Events and queries should have matching IDs (both use `entry_uuid`), but:
1. Event `sd_path` = Content type
2. Query `sd_path` = Physical type

This means path-based filtering can't work, but ID-based matching SHOULD work as long as both sides use the same UUID.

Check the actual directory listing query to confirm it uses `entry_uuid` as the File `id`.
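
For reference, a small TypeScript sketch of the two `sd_path` shapes described above. The exact variant layout (and especially the `Physical` form) is an assumption reconstructed from these snapshots, not the actual core types; the point is that the two variants are not comparable, so matching has to fall back to the shared entry UUID carried in `id`.

```typescript
// Assumed shapes, reconstructed from the snapshots above.
type SdPath =
  | { Content: { content_id: string } } // what events carry
  | { Physical: string };               // e.g. "/Users/jamespine/Desktop/file.txt" (assumed form)

interface FileResource {
  id: string;                                  // entry_uuid on both sides
  sd_path: SdPath;
  content_identity?: { uuid: string } | null;
}

// Path comparison across variants is meaningless; ID-based matching is not.
const sameFile = (a: FileResource, b: FileResource): boolean => a.id === b.id;
```
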
@@ -1,76 +0,0 @@
# FINAL ANALYSIS - Normalized Cache Issue

## Test Results Summary

**Test:** Indexed Desktop (235 files total) with Content mode

### Phase Event Breakdown

| Phase | Files Emitted | Content Identity |
|-------|---------------|------------------|
| Discovery | 100 | All have content_identity |
| Content | 135 | All have content_identity |

### Critical Finding: ZERO OVERLAP

**Discovery files vs Content files:** 0 files appear in both phases

This means:
- Discovery phase emits events for files 1-100
- Content phase emits events for files 101-235 (a different subset)
- **They emit for completely different files**

## Why This Breaks Normalized Cache

### Scenario: User viewing /Downloads with 30 files

**Step 1: Initial load**
- Query returns 30 files with IDs: `[A, B, C, ..., Z]`
- Cache populated

**Step 2: Discovery phase events arrive**
- Batch #1: 100 files with IDs `[file001, file002, ..., file100]`
- Check: Do any of these IDs match `[A, B, C, ..., Z]`?
- Answer: **Maybe 5-10 files**, if they're from /Downloads
- Result: Update 5 files, ignore 95

**Step 3: Content phase events arrive**
- Batch #1: 100 files with IDs `[file101, file102, ..., file200]`
- These are DIFFERENT files from Discovery
- Check: Do any match `[A, B, C, ..., Z]`?
- Answer: **Maybe 10 files**, if from /Downloads
- Result: Update 10 files

### The Problem

Files get emitted in DIFFERENT batches across phases:
- File "document.pdf" emitted in Discovery phase
- File "photo.jpg" emitted in Content phase
- BUT they might be in the SAME directory

When Content phase events arrive, they DON'T include "document.pdf" again.
So if the Discovery event had incomplete data, it never gets fixed!

## Data Confirms: Events Have Complete Data

From snapshots:
- Discovery: 100/100 files have content_identity
- Content: 135/135 files have content_identity

**Events are NOT missing content_identity!**

## The Real Bug

The logs show "Updated 0 items", which means:
1. An event batch arrives with 100 files
2. The current directory has 30 files
3. **ZERO IDs match** between the event and the directory
4. Nothing gets updated

But you said content_identity disappears...

Let me check whether the resourceFilter is REJECTING all files, causing React Query to end up with an empty cache.

## Next Step

Check the resourceFilter logic - it might be rejecting ALL files, leaving an empty array.
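
A small sketch (hypothetical names) of the intersection check behind "Updated N items": if none of the IDs in an event batch appear in the cached directory listing, the update pass has nothing to touch, which is exactly the zero-overlap case described above.

```typescript
interface FileResource {
  id: string; // entry_uuid
  name?: string;
  content_identity?: { uuid: string } | null;
}

// How many cached files would an incoming batch actually update?
function countUpdatable(cachedFiles: FileResource[], batch: FileResource[]): number {
  const incomingIds = new Set(batch.map((r) => r.id));
  return cachedFiles.filter((f) => incomingIds.has(f.id)).length;
}

// With 30 files cached for /Downloads and a 100-file batch from elsewhere on Desktop,
// countUpdatable(cached, batch) === 0 — the "Updated 0 items" log line.
```
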
@@ -1,45 +0,0 @@
# Phase Snapshot Analysis - FINDINGS

## Confirmed: IDs DO MATCH!

Event file with name "3AE4AB29-5EC8-4DF9-80F8-F72AE5C38FBF":
- Event ID: `0303b9df-3b11-49a6-beb6-0fdec577fefb`
- DB entry_uuid: `0303b9df-3b11-49a6-beb6-0fdec577fefb`

**Conclusion:** Events and database entries use the SAME UUID for `File.id`.

## The Real Problem

If IDs match, why doesn't the normalized cache work?

Looking at the frontend logs from earlier:
```
Sample existing ID: "a72dfd9e-1908-4ddd-aa44-37458e8455ae"
Sample incoming ID: "0093bc71-a000-49f3-83f1-d1f72428665e"
WRAPPED: Updated 0 items
```

These IDs are different! This means:
1. The directory listing query returned files with certain UUIDs
2. The events arrived with DIFFERENT UUIDs (entirely different files)
3. No overlap = no matches = "Updated 0 items"

## Root Cause Theory

The events contain files from ALL OVER Desktop (100+ files per batch).
The directory listing only has ~30 files from the CURRENT directory.

Most batches have 0 overlap with the current directory, so:
- "Updated 0 items" is correct - those files aren't in this directory
- But without proper filtering, the lag still happens because every event gets processed

## Solution

We need the `resourceFilter` to work, but it needs to handle Content paths.

The filter should check:
1. Does this file's `content_identity.uuid` match ANY file in my current directory query?
2. If yes → this file belongs here, update it
3. If no → ignore it (it's from a different directory)

This is what the current resourceFilter tries to do - match by content_id.
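
A minimal sketch of that check, with hypothetical names — not the actual `resourceFilter` implementation: build a set of content UUIDs from the files currently shown, and accept an incoming file only if its `content_identity.uuid` is in that set.

```typescript
interface FileResource {
  id: string;
  content_identity?: { uuid: string } | null;
}

// Returns a filter bound to a snapshot of the current directory listing.
function makeContentIdFilter(currentDirectoryFiles: FileResource[]) {
  const contentIdsInView = new Set(
    currentDirectoryFiles.flatMap((f) => (f.content_identity ? [f.content_identity.uuid] : []))
  );

  return (incoming: FileResource): boolean =>
    incoming.content_identity != null && contentIdsInView.has(incoming.content_identity.uuid);
}
```

Note that this only works if the directory snapshot it closes over is fresh — which is exactly the staleness problem examined in the next analysis.
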
@@ -1,157 +0,0 @@
# Normalized Cache Flow Analysis

## Current Implementation Flow

### 1. Initial Query Load
```typescript
// User navigates to /Downloads
const directoryQuery = useNormalizedCache({
  wireMethod: "query:files.directory_listing",
  input: { path: "/Downloads", ... },
  resourceType: "file",
  resourceFilter: (file) => { /* content_id matching logic */ }
})
```

**Result:** Query returns 30 files with:
- `id` = entry.uuid (e.g., `"a72dfd9e-1908-..."`)
- `sd_path` = Physical path
- `content_identity.uuid` = content UUID

**TanStack Query cache now contains:**
```json
{
  "files": [
    {
      "id": "a72dfd9e-1908-...",
      "name": "document.pdf",
      "content_identity": { "uuid": "xyz123..." }
    },
    // ... 29 more files
  ]
}
```

---

### 2. Indexing Starts - Events Arrive

**Event arrives:**
```json
ResourceChangedBatch {
  resource_type: "file",
  resources: [
    {
      "id": "different-uuid-not-in-downloads",
      "sd_path": { "Content": { "content_id": "abc456..." } },
      "content_identity": { "uuid": "abc456..." }
    },
    // ... 99 more files from various directories
  ]
}
```

---

### 3. Event Processing (lines 194-322)

```typescript
queryClient.setQueryData((oldData) => {
  // oldData = { files: [30 items from /Downloads] }

  const resourceMap = new Map(resources.map(r => [r.id, r]));
  // Map has 100 items with IDs from the event

  const array = [...oldData.files]; // 30 items

  // STEP A: Try to update existing items (lines 275-296)
  for (let i = 0; i < 30; i++) {
    const item = array[i]; // File from /Downloads

    if (resourceMap.has(item.id)) {
      // Does the event batch contain THIS file's ID?
      // Usually NO - the event has files from other directories
      // So this rarely executes
    }
  }

  // Result: updateCount = 0 (no matches)

  // STEP B: Try to append new items (lines 309-315)
  if (resourceFilter) {
    for (const resource of resources) { // 100 event files
      if (!seenIds.has(resource.id) && resourceFilter(resource)) {
        // resourceFilter checks: is this file's content_id in oldData?
        // Problem: resourceFilter accesses directoryQuery.data
        // But we're INSIDE the setQueryData callback!
        // directoryQuery.data might be stale!
        array.push(resource);
      }
    }
  }

  return { files: array };
});
```

---

## THE BUG

**Line 116 in context.tsx:**
```typescript
const currentFiles = directoryQuery.data?.files || [];
```

This creates a **closure problem**:
1. resourceFilter is defined with a `directoryQuery.data` reference
2. When an event arrives, resourceFilter runs INSIDE `setQueryData`
3. But `directoryQuery.data` still holds the OLD data at this point
4. React Query hasn't updated the hook's `data` property yet
5. So resourceFilter is comparing against stale data!

**Even worse:** The resourceFilter runs for EVERY batch (100 files × many batches).
Each time, it checks whether 100 files match the stale `directoryQuery.data`.
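
One way around the stale closure, sketched with the names used in the pseudocode above (an illustration, not the project's actual hook): derive "what is currently in this directory" from the `oldData` that TanStack Query hands to the `setQueryData` updater, instead of from a `directoryQuery.data` reference captured when `resourceFilter` was created.

```typescript
import type { QueryClient, QueryKey } from "@tanstack/react-query";

interface FileResource {
  id: string;
  content_identity?: { uuid: string } | null;
}

// Apply one ResourceChangedBatch against whatever the cache holds *right now*.
function applyBatch(queryClient: QueryClient, queryKey: QueryKey, resources: FileResource[]) {
  queryClient.setQueryData<{ files: FileResource[] }>(queryKey, (oldData) => {
    if (!oldData) return oldData;

    const byId = new Map(resources.map((r) => [r.id, r] as const));
    const knownIds = new Set(oldData.files.map((f) => f.id));
    const knownContentIds = new Set(
      oldData.files.flatMap((f) => (f.content_identity ? [f.content_identity.uuid] : []))
    );

    // STEP A: update items already in this directory listing.
    const next = oldData.files.map((f) => {
      const incoming = byId.get(f.id);
      return incoming ? { ...f, ...incoming } : f;
    });

    // STEP B: append incoming items that belong here, judged against oldData,
    // not against a stale hook value captured in a closure.
    for (const r of resources) {
      if (!knownIds.has(r.id) && r.content_identity && knownContentIds.has(r.content_identity.uuid)) {
        next.push(r);
      }
    }

    return { files: next };
  });
}
```
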
---

## Why Content Identity Disappears

Looking at your logs:
- "Updated 100 items" sometimes
- "Updated 0 items" other times

When "Updated 100 items":
- Those 100 event files happened to have IDs matching files in /Downloads
- `mergeWithoutNulls()` runs
- BUT: `mergeWithoutNulls()` does a shallow merge!
- Line 14: `const merged = { ...incoming }`
- This REPLACES the entire object with the incoming data
- Only top-level null fields get preserved

**The Problem:**
Suppose the incoming file has `content_identity: { uuid: "xyz", content_hash: "abc" }`
and the existing one has `content_identity: { uuid: "xyz", content_hash: "abc", extra_field: "value" }`.

The merge does:
```javascript
merged = { ...incoming } // Start with incoming
// Then fix top-level nulls
if (incoming.content_identity === null && existing.content_identity !== null) {
  merged.content_identity = existing.content_identity
}
```

But if BOTH have content_identity (not null), no preservation happens!
The incoming object REPLACES the existing one entirely.
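
For contrast, a deep-merge sketch that would preserve nested fields the way this analysis expects (illustrative only — not the project's `mergeWithoutNulls`): recurse into nested objects and keep the existing value wherever the incoming one is null or missing.

```typescript
type PlainObject = Record<string, unknown>;

const isPlainObject = (v: unknown): v is PlainObject =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Incoming values win, except that null/undefined never overwrite existing data,
// and nested objects are merged field by field instead of being replaced wholesale.
function deepMergeWithoutNulls<T extends PlainObject>(existing: T, incoming: T): T {
  const merged: PlainObject = { ...incoming };
  for (const key of Object.keys(existing)) {
    const prev = existing[key];
    const next = incoming[key];
    if (next === null || next === undefined) {
      merged[key] = prev;
    } else if (isPlainObject(prev) && isPlainObject(next)) {
      merged[key] = deepMergeWithoutNulls(prev, next);
    }
  }
  return merged as T;
}
```
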
---

## Summary

1. **resourceFilter** has a stale-data closure issue
2. **mergeWithoutNulls** only preserves null→value, not value→different-value
3. **No filtering works** because paths don't match (Content vs Physical)
4. Every batch processes all 100 files, checking against stale data
5. When IDs match by chance, the merge doesn't preserve nested fields properly
@@ -1,150 +0,0 @@
# ROOT CAUSE IDENTIFIED - Missing Icons During Indexing

## The Bug

During indexing, video files (and other media) lose their proper icons and fall back to the generic "Document" icon. The icons are restored when thumbnails are generated, but for files without thumbnails (like videos with thumbnail generation disabled), the icons remain wrong.

## What We Thought It Was

- content_identity disappearing (normalized cache bug)
- React not re-rendering
- TanStack Query not updating
- Event system not working

## What It Actually Is

**The `content_kind` field is hardcoded to "unknown" in event data**

## The Evidence

### From Browser Logs

All 4 videos render with complete data:
```
Screen Recording 2025-11-09 at 7.18.50 PM:
  has_content_identity: true
  content_identity_uuid: "06358a76-0974-50c9-a939-70b70a910a91"
  sidecars_count: 0
```

### From TanStack Query Devtools

The final state shows all videos have `content_identity` populated, with all fields present.

### From Event Snapshots (test_snapshots/)

```json
{
  "content_identity": {
    "uuid": "...",
    "content_hash": "...",
    "kind": "unknown"  // THIS IS THE PROBLEM
  }
}
```

## How The Icon System Works

**Thumb.tsx lines 66-67:**
```typescript
const kindCapitalized = file.content_identity?.kind
  ? file.content_identity.kind.charAt(0).toUpperCase() + file.content_identity.kind.slice(1)
  : "Document";

const icon = getIcon(kindCapitalized, true, file.extension, ...);
```

**When kind = "unknown":**
- Capitalizes to "Unknown"
- `getIcon("Unknown", ...)` returns the generic document icon
- Videos, images, etc. all show the document icon

**When kind = "video":**
- Capitalizes to "Video"
- `getIcon("Video", ...)` returns the video icon
- The correct icon shows

## The Root Cause Code

**File:** `core/src/domain/file.rs:354`

```rust
file.content_identity = Some(ContentIdentity {
    uuid: ci_uuid,
    content_hash: ci.content_hash.clone(),
    integrity_hash: ci.integrity_hash.clone(),
    mime_type_id: ci.mime_type_id,
    kind: ContentKind::Unknown, // TODO: Load from content_kinds table
    total_size: ci.total_size,
    entry_count: ci.entry_count,
    first_seen_at: ci.first_seen_at,
    last_verified_at: ci.last_verified_at,
    text_content: ci.text_content.clone(),
});
```

**The TODO comment says it all:** "Load from content_kinds table"

## Why Thumbnails "Fix" It

When thumbnail events arrive, they trigger a re-render and the thumbnail image loads, hiding the icon entirely. So you don't notice the wrong icon anymore. But the underlying data is still wrong - `kind` is still "unknown".

## The Fix

In `File::from_entry_uuids()`, we need to:

1. Load the `content_kind` for each `content_identity`
2. Join with the `content_kinds` table
3. Map the `kind_id` to the `ContentKind` enum
4. Set the correct kind instead of `ContentKind::Unknown`

**The database already has this data** - we just need to query it:

```rust
// Load content kinds
let content_kinds = content_kind::Entity::find()
    .filter(content_kind::Column::Id.is_in(
        content_identities.iter().map(|ci| ci.kind_id)
    ))
    .all(db)
    .await?;

let kind_by_id: HashMap<i32, ContentKind> = content_kinds
    .into_iter()
    .map(|ck| (ck.id, ContentKind::from_id(ck.id)))
    .collect();

// Then when building ContentIdentity:
kind: kind_by_id.get(&ci.kind_id).copied().unwrap_or(ContentKind::Unknown),
```

## Why The Normalized Cache Actually Works Perfectly

The entire investigation proved the normalized cache is working correctly:

- Events are emitted with complete data (100% have content_identity)
- IDs match between events and queries (entry.uuid)
- Deep merge preserves existing data correctly
- The filter matches files by content_id successfully
- TanStack Query updates atomically
- React re-renders when the cache changes
- Components receive updated props

The ONLY bug: `content_kind` is hardcoded to "unknown" in event data, causing wrong icons.

## Test Results Summary

- **phase_Discovery.json**: 100 files, all with `kind: "unknown"`
- **phase_Content.json**: 135 files, all with `kind: "unknown"`
- **db_entries_all.json**: The database has 235 entries with proper content_id foreign keys

The database HAS the correct content_kind data. The directory listing query probably loads it correctly (TODO: verify). But `File::from_entry_uuids()` doesn't load it, so events have incomplete kind information.

## Next Steps

1. Add the content_kind join to `File::from_entry_uuids()`
2. Map kind_id to the ContentKind enum
3. Test that icons appear correctly during indexing
4. Remove all the debug logging we added

That's it. One small database join fixes everything.
@@ -1,104 +0,0 @@
#!/bin/bash

echo "=============================================="
echo "TEMPORAL PHASE ANALYSIS"
echo "=============================================="
echo ""

# Check Discovery phase
if [ -f "phase_Discovery.json" ]; then
  echo "DISCOVERY PHASE:"
  DISCOVERY_COUNT=$(jq '.total_files' phase_Discovery.json)
  echo " Total files: $DISCOVERY_COUNT"

  echo " Sample file structure:"
  jq '.all_files[0] | {id, name, sd_path_type: (.sd_path | keys[0]), has_content_identity: (.content_identity != null)}' phase_Discovery.json
  echo ""
fi

# Check Content phase
if [ -f "phase_Content.json" ]; then
  echo "CONTENT PHASE:"
  CONTENT_COUNT=$(jq '.total_files' phase_Content.json)
  echo " Total files: $CONTENT_COUNT"

  echo " Sample file structure:"
  jq '.all_files[0] | {id, name, sd_path_type: (.sd_path | keys[0]), has_content_identity: (.content_identity != null)}' phase_Content.json

  echo ""
  echo " Content identity breakdown:"
  WITH_CONTENT=$(jq '[.all_files[] | select(.content_identity != null)] | length' phase_Content.json)
  WITHOUT_CONTENT=$(jq '[.all_files[] | select(.content_identity == null)] | length' phase_Content.json)
  echo " - With content_identity: $WITH_CONTENT"
  echo " - Without content_identity: $WITHOUT_CONTENT"
  echo ""
fi

# Compare IDs between phases
echo "=============================================="
echo "ID CONSISTENCY CHECK"
echo "=============================================="
echo ""

if [ -f "phase_Discovery.json" ] && [ -f "phase_Content.json" ]; then
  DISC_ID=$(jq -r '.all_files[0].id' phase_Discovery.json)
  CONT_ID=$(jq -r '.all_files[0].id' phase_Content.json)

  echo "First file ID in Discovery: $DISC_ID"
  echo "First file ID in Content: $CONT_ID"

  # Check if the same file appears in both phases
  DISC_NAME=$(jq -r '.all_files[0].name' phase_Discovery.json)
  CONT_FILE=$(jq -r --arg name "$DISC_NAME" '.all_files[] | select(.name == $name) | .id' phase_Content.json | head -1)

  if [ -n "$CONT_FILE" ]; then
    echo ""
    echo "File '$DISC_NAME' found in both phases"
    echo " Discovery ID: $DISC_ID"
    echo " Content ID: $CONT_FILE"
    if [ "$DISC_ID" == "$CONT_FILE" ]; then
      echo " IDs MATCH"
    else
      echo " IDs DIFFERENT"
    fi
  fi
fi

echo ""
echo "=============================================="
echo "DATABASE ENTRY COMPARISON"
echo "=============================================="
echo ""

# Check if event IDs exist in the database
if [ -f "phase_Content.json" ] && [ -f "db_entries_all.json" ]; then
  EVENT_ID=$(jq -r '.all_files[0].id' phase_Content.json)
  EVENT_NAME=$(jq -r '.all_files[0].name' phase_Content.json)

  echo "Checking if event file exists in database:"
  echo " Event file name: $EVENT_NAME"
  echo " Event file ID: $EVENT_ID"
  echo ""

  DB_ENTRY=$(jq --arg id "$EVENT_ID" '.[] | select(.entry_uuid == $id)' db_entries_all.json)

  if [ -n "$DB_ENTRY" ]; then
    echo " FOUND in database:"
    echo "$DB_ENTRY" | jq '{entry_id, entry_uuid, name}'
  else
    echo " NOT FOUND in database"
    echo " Searching by name instead..."
    DB_BY_NAME=$(jq --arg name "$EVENT_NAME" '.[] | select(.name == $name)' db_entries_all.json)
    if [ -n "$DB_BY_NAME" ]; then
      echo " Found by name:"
      echo "$DB_BY_NAME" | jq '{entry_id, entry_uuid, name}'
    fi
  fi
fi

echo ""
echo "=============================================="
echo "FILES CREATED - Check these for full details:"
echo "=============================================="
ls -lh *.json *.md 2>/dev/null | awk '{print " " $9 " (" $5 ")"}'
echo ""
@@ -1,42 +0,0 @@
#!/bin/bash

echo "Checking file overlap between phases..."
echo ""

# Get all IDs from each phase
DISCOVERY_IDS=$(jq -r '.all_files[].id' phase_Discovery.json | sort)
CONTENT_IDS=$(jq -r '.all_files[].id' phase_Content.json | sort)

DISCOVERY_COUNT=$(echo "$DISCOVERY_IDS" | wc -l | tr -d ' ')
CONTENT_COUNT=$(echo "$CONTENT_IDS" | wc -l | tr -d ' ')

echo "Discovery phase: $DISCOVERY_COUNT files"
echo "Content phase: $CONTENT_COUNT files"
echo ""

# Find overlapping IDs
OVERLAP=$(comm -12 <(echo "$DISCOVERY_IDS") <(echo "$CONTENT_IDS") | wc -l | tr -d ' ')

echo "Files appearing in BOTH phases: $OVERLAP"
echo ""

if [ "$OVERLAP" -gt 0 ]; then
  echo "Some files appear in both phases"
  echo ""
  echo "Sample overlapping file:"
  OVERLAP_ID=$(comm -12 <(echo "$DISCOVERY_IDS") <(echo "$CONTENT_IDS") | head -1)
  echo "ID: $OVERLAP_ID"

  echo ""
  echo "In Discovery:"
  jq --arg id "$OVERLAP_ID" '.all_files[] | select(.id == $id) | {name, has_content_identity: (.content_identity != null)}' phase_Discovery.json

  echo ""
  echo "In Content:"
  jq --arg id "$OVERLAP_ID" '.all_files[] | select(.id == $id) | {name, has_content_identity: (.content_identity != null)}' phase_Content.json
else
  echo "NO files appear in both phases"
  echo " Discovery emits one set of files"
  echo " Content emits a completely different set"
  echo " This is expected if they're different batches/directories"
fi
@@ -1,38 +0,0 @@
#!/bin/bash

echo "Checking TOTAL UNIQUE files across all phases..."
echo ""

# Get all unique IDs from Discovery
DISCOVERY_IDS=$(jq -r '.all_files[].id' phase_Discovery.json | sort -u)

# Get all unique IDs from Content
CONTENT_IDS=$(jq -r '.all_files[].id' phase_Content.json | sort -u)

# Combine and count unique
ALL_IDS=$(echo -e "$DISCOVERY_IDS\n$CONTENT_IDS" | sort -u)
TOTAL_UNIQUE=$(echo "$ALL_IDS" | wc -l | tr -d ' ')

DISCOVERY_COUNT=$(echo "$DISCOVERY_IDS" | wc -l | tr -d ' ')
CONTENT_COUNT=$(echo "$CONTENT_IDS" | wc -l | tr -d ' ')

echo "Discovery phase: $DISCOVERY_COUNT unique file IDs"
echo "Content phase: $CONTENT_COUNT unique file IDs"
echo "Total unique: $TOTAL_UNIQUE file IDs"
echo ""
echo "Database entries: 235"
echo ""

if [ "$TOTAL_UNIQUE" -eq 235 ]; then
  echo "Total unique matches DB count!"
  echo " Discovery + Content = exactly 235 files"
  echo " They are DIFFERENT files (no duplicates)"
elif [ "$TOTAL_UNIQUE" -lt 235 ]; then
  echo "Total unique ($TOTAL_UNIQUE) < 235"
  echo " Some files appear in BOTH phases"
  OVERLAP=$((DISCOVERY_COUNT + CONTENT_COUNT - TOTAL_UNIQUE))
  echo " Overlap: $OVERLAP files"
else
  echo "Total unique ($TOTAL_UNIQUE) > 235"
  echo " This shouldn't happen - more files in events than DB?"
fi
@@ -1,122 +0,0 @@
#!/bin/bash

echo "=============================================="
echo "DETAILED STRUCTURE COMPARISON"
echo "=============================================="
echo ""

# Get the first file from each phase
echo "DISCOVERY PHASE - First File:"
echo "================================"
jq '.all_files[0]' phase_Discovery.json > /tmp/discovery_file.json
jq '.' /tmp/discovery_file.json
echo ""

echo "CONTENT PHASE - First File:"
echo "================================"
jq '.all_files[0]' phase_Content.json > /tmp/content_file.json
jq '.' /tmp/content_file.json
echo ""

echo "=============================================="
echo "FIELD-BY-FIELD COMPARISON"
echo "=============================================="
echo ""

# Compare each field
echo "Field: id"
DISC_ID=$(jq -r '.id' /tmp/discovery_file.json)
CONT_ID=$(jq -r '.id' /tmp/content_file.json)
echo " Discovery: $DISC_ID"
echo " Content: $CONT_ID"
echo ""

echo "Field: sd_path"
echo " Discovery:"
jq '.sd_path' /tmp/discovery_file.json
echo " Content:"
jq '.sd_path' /tmp/content_file.json
echo ""

echo "Field: content_identity"
echo " Discovery:"
jq '.content_identity | if . == null then "NULL" else ("Present - uuid: " + .uuid) end' /tmp/discovery_file.json
echo " Content:"
jq '.content_identity | if . == null then "NULL" else ("Present - uuid: " + .uuid) end' /tmp/content_file.json
echo ""

echo "Field: sidecars"
DISC_SIDECARS=$(jq '.sidecars | length' /tmp/discovery_file.json)
CONT_SIDECARS=$(jq '.sidecars | length' /tmp/content_file.json)
echo " Discovery: $DISC_SIDECARS sidecars"
echo " Content: $CONT_SIDECARS sidecars"
echo ""

echo "=============================================="
echo "SCHEMA DIFFERENCES"
echo "=============================================="
echo ""

echo "All top-level fields in Discovery file:"
jq 'keys | sort' /tmp/discovery_file.json
echo ""

echo "All top-level fields in Content file:"
jq 'keys | sort' /tmp/content_file.json
echo ""

echo "Checking if field sets are identical..."
DISC_FIELDS=$(jq -r 'keys | sort | join(",")' /tmp/discovery_file.json)
CONT_FIELDS=$(jq -r 'keys | sort | join(",")' /tmp/content_file.json)

if [ "$DISC_FIELDS" == "$CONT_FIELDS" ]; then
  echo "Both phases have IDENTICAL field sets"
else
  echo "Different fields between phases!"
  echo ""
  echo "Fields only in Discovery:"
  comm -23 <(jq -r 'keys[]' /tmp/discovery_file.json | sort) <(jq -r 'keys[]' /tmp/content_file.json | sort)
  echo ""
  echo "Fields only in Content:"
  comm -13 <(jq -r 'keys[]' /tmp/discovery_file.json | sort) <(jq -r 'keys[]' /tmp/content_file.json | sort)
fi

echo ""
echo "=============================================="
echo "CONTENT_IDENTITY DEEP COMPARISON"
echo "=============================================="
echo ""

echo "Discovery content_identity fields:"
jq '.content_identity | if . != null then keys | sort else "null" end' /tmp/discovery_file.json
echo ""

echo "Content content_identity fields:"
jq '.content_identity | if . != null then keys | sort else "null" end' /tmp/content_file.json
echo ""

echo "Are content_identity structures identical?"
DISC_CI_FIELDS=$(jq -r '.content_identity | if . != null then (keys | sort | join(",")) else "null" end' /tmp/discovery_file.json)
CONT_CI_FIELDS=$(jq -r '.content_identity | if . != null then (keys | sort | join(",")) else "null" end' /tmp/content_file.json)

if [ "$DISC_CI_FIELDS" == "$CONT_CI_FIELDS" ]; then
  echo "YES - identical structure"
else
  echo "NO - different structure"
  echo "Discovery: $DISC_CI_FIELDS"
  echo "Content: $CONT_CI_FIELDS"
fi

echo ""
echo "=============================================="
echo "SAMPLE: Complete File Structure Comparison"
echo "=============================================="
echo ""

# Show a side-by-side comparison of the complete file structures
echo "Discovery file saved to: /tmp/discovery_file.json"
echo "Content file saved to: /tmp/content_file.json"
echo ""
echo "Use: diff /tmp/discovery_file.json /tmp/content_file.json"
echo "Or: code --diff /tmp/discovery_file.json /tmp/content_file.json"
@@ -1,29 +0,0 @@
#!/bin/bash

echo "Checking if Event IDs match Database entry_uuids..."
echo ""

# Get a few event IDs
EVENT_IDS=$(jq -r '.[0][0:3] | .[] | .id' events_Content.json)

# Get DB UUIDs
DB_UUIDS=$(jq -r '.[] | .entry_uuid' db_entries_sample.json)

echo "Event IDs (first 3):"
echo "$EVENT_IDS" | head -3
echo ""

echo "DB entry_uuids (first 5):"
echo "$DB_UUIDS"
echo ""

# Check for any matches
echo "Checking for matches..."
for event_id in $EVENT_IDS; do
  if echo "$DB_UUIDS" | grep -q "$event_id"; then
    echo "MATCH FOUND: $event_id"
  fi
done

echo ""
echo "If no matches shown above, IDs don't match between events and DB"
File diff suppressed because it is too large
@@ -1,37 +0,0 @@
[
  {
    "content_id_fk": 1,
    "entry_id": 8,
    "entry_uuid": "26977785-98eb-4de7-a77b-dc63daef44b3",
    "extension": "mov",
    "name": "Screen Recording 2025-11-10 at 2.42.22 AM"
  },
  {
    "content_id_fk": 2,
    "entry_id": 9,
    "entry_uuid": "e504c96b-5d2f-44e9-9300-ebfd1d7bd1c9",
    "extension": "png",
    "name": "Screenshot 2025-11-10 at 2.30.23 AM"
  },
  {
    "content_id_fk": 3,
    "entry_id": 10,
    "entry_uuid": "bbb07613-fd4d-4243-aaae-f0037c337462",
    "extension": "png",
    "name": "Screenshot 2025-11-10 at 7.21.30 PM"
  },
  {
    "content_id_fk": 4,
    "entry_id": 11,
    "entry_uuid": "4ba8c643-1a2f-4991-8ced-ae25dc37e403",
    "extension": "png",
    "name": "Screenshot 2025-11-10 at 7.21.51 PM"
  },
  {
    "content_id_fk": 5,
    "entry_id": 12,
    "entry_uuid": "d7dc2460-2317-4f40-a6b5-12288556bec5",
    "extension": "png",
    "name": "Screenshot 2025-11-10 at 2.31.03 AM"
  }
]
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,7 +0,0 @@
{
  "all_files": [],
  "file_batches": [],
  "phase_name": "Initial",
  "total_events": 1,
  "total_files": 0
}
@@ -1,61 +0,0 @@
#!/bin/bash

# Pick the first file from the Discovery phase
DISCOVERY_FILE=$(jq '.all_files[0]' phase_Discovery.json)
FILE_ID=$(echo "$DISCOVERY_FILE" | jq -r '.id')
FILE_NAME=$(echo "$DISCOVERY_FILE" | jq -r '.name')

echo "=============================================="
echo "TRACKING FILE: $FILE_NAME"
echo "ID: $FILE_ID"
echo "=============================================="
echo ""

echo "DISCOVERY PHASE:"
echo "$DISCOVERY_FILE" | jq '{
  id,
  name,
  sd_path,
  content_identity_present: (.content_identity != null),
  content_identity_uuid: .content_identity.uuid,
  sidecars_count: (.sidecars | length)
}'
echo ""

echo "CONTENT PHASE (same file):"
CONTENT_FILE=$(jq --arg id "$FILE_ID" '.all_files[] | select(.id == $id)' phase_Content.json)

if [ -n "$CONTENT_FILE" ]; then
  echo "$CONTENT_FILE" | jq '{
    id,
    name,
    sd_path,
    content_identity_present: (.content_identity != null),
    content_identity_uuid: .content_identity.uuid,
    sidecars_count: (.sidecars | length)
  }'
else
  echo " File not found in Content phase events"
fi
echo ""

echo "=============================================="
echo "COMPARISON:"
echo "=============================================="

DISC_HAS_CI=$(echo "$DISCOVERY_FILE" | jq '.content_identity != null')
CONT_HAS_CI=$(echo "$CONTENT_FILE" | jq '.content_identity != null')

echo "Discovery has content_identity: $DISC_HAS_CI"
echo "Content has content_identity: $CONT_HAS_CI"
echo ""

# Check full objects
echo "FULL DISCOVERY FILE:"
echo "$DISCOVERY_FILE" | jq '.'
echo ""

if [ -n "$CONTENT_FILE" ]; then
  echo "FULL CONTENT FILE:"
  echo "$CONTENT_FILE" | jq '.'
fi