Projects

archiveOS

Desktop app · Personal · 2026

Replaced ad-hoc file storage with a deterministic retrieval system for long-term, high-stakes document access.

Designed and built a local-first archive system that separates storage from structure, enabling consistent ingestion, secure storage, and reliable retrieval of documents years after they were created.

Documents from multiple sources indexed into a system that enables structured retrieval workflows

> Key decisions

01/ Decouple storage from the product layer

User-selected storage providers handle sync, not structure or retrieval.

Impact

  • > Compatible with any provider that syncs to disk, such as iCloud or Google Drive
  • > No dependency on external APIs or vendor lock-in
  • > Ensures portability over long retention periods

02/ Separate index from file system

Files are inconsistently named and organised, so structure must live outside the file system.

Impact

  • > SQLite-backed metadata index
  • > Files remain unchanged and accessible outside the app
  • > Structure can evolve without migrating files
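
A minimal sketch of what a metadata index kept apart from the files could look like. It is written in Python for brevity rather than in the app's own Tauri stack, and the table, column, and function names (`documents`, `rel_path`, `index_file`) are assumptions for illustration, not the actual archiveOS schema.

```python
import hashlib
import sqlite3
from pathlib import Path

# Hypothetical schema: the index stores facts *about* files, never the files.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id        INTEGER PRIMARY KEY,
    rel_path  TEXT NOT NULL UNIQUE,  -- path relative to the user-selected root
    sha256    TEXT NOT NULL,         -- content hash of the file on disk
    doc_type  TEXT,                  -- nullable: structure can be added later
    issuer    TEXT
);
"""


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the index database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def index_file(conn: sqlite3.Connection, root: Path, file: Path) -> int:
    """Record a file in the index. The file is only read, never modified."""
    digest = hashlib.sha256(file.read_bytes()).hexdigest()
    rel = file.relative_to(root).as_posix()
    # Re-indexing the same path refreshes the hash instead of adding a row.
    conn.execute(
        "INSERT INTO documents (rel_path, sha256) VALUES (?, ?) "
        "ON CONFLICT(rel_path) DO UPDATE SET sha256 = excluded.sha256",
        (rel, digest),
    )
    conn.commit()
    row = conn.execute(
        "SELECT id FROM documents WHERE rel_path = ?", (rel,)
    ).fetchone()
    return row[0]
```

Because the index holds only a relative path and a hash, adding or renaming columns later is a database change and leaves the files on disk as they were.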

03/ Introduce an ingestion inbox

Documents arrive incomplete and inconsistently formatted.

Impact

  • > All files enter a staging layer before classification
  • > Ingestion does not block on missing information
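
The staging idea can be sketched as a status column with nullable metadata, so a file is accepted first and classified later. This is an illustrative Python sketch under assumed names (`status`, `stage`, `classify`, `pending`), not the app's real pipeline.

```python
import sqlite3
from typing import List, Optional


def open_inbox_db() -> sqlite3.Connection:
    """Create an in-memory index where every new row starts in the inbox."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE documents (
            id       INTEGER PRIMARY KEY,
            rel_path TEXT NOT NULL UNIQUE,
            status   TEXT NOT NULL DEFAULT 'inbox'
                     CHECK (status IN ('inbox', 'classified')),
            doc_type TEXT,   -- all metadata is optional at ingestion time
            issuer   TEXT,
            doc_date TEXT
        )
        """
    )
    return conn


def stage(conn: sqlite3.Connection, rel_path: str) -> int:
    """Accept a file with no metadata at all; it lands in the inbox."""
    cur = conn.execute("INSERT INTO documents (rel_path) VALUES (?)", (rel_path,))
    return cur.lastrowid


def classify(
    conn: sqlite3.Connection,
    doc_id: int,
    doc_type: str,
    issuer: Optional[str] = None,
    doc_date: Optional[str] = None,
) -> None:
    """Attach whatever metadata is known and move the row out of the inbox."""
    conn.execute(
        "UPDATE documents SET status = 'classified', doc_type = ?, "
        "issuer = ?, doc_date = ? WHERE id = ?",
        (doc_type, issuer, doc_date, doc_id),
    )


def pending(conn: sqlite3.Connection) -> List[str]:
    """List files still waiting for classification, oldest first."""
    rows = conn.execute(
        "SELECT rel_path FROM documents WHERE status = 'inbox' ORDER BY id"
    )
    return [r[0] for r in rows]
```

Staging needs only a path, so a scan with no known issuer or date can still be ingested and picked up again from `pending` later.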

04/ Design for incomplete and uncertain data

Real-world inputs are messy and often lack key fields.

Impact

  • > Defaults and inferred metadata reduce input burden
  • > The system remains queryable even with partial information
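
One way defaults and inferred metadata might work is to guess fields from the file name and fall back to explicit "unknown" values when nothing matches. The keyword table and the `infer_metadata` helper below are hypothetical, a Python sketch rather than the rules archiveOS actually applies.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    doc_type: str = "unknown"    # default keeps the record queryable
    year: Optional[int] = None   # None means "not yet known", not an error


# Hypothetical keyword hints; a real rule set would be larger and user-editable.
TYPE_HINTS = {
    "steuerbescheid": "tax_assessment",
    "rechnung": "invoice",
    "invoice": "invoice",
    "versicherung": "insurance",
}

# A standalone four-digit year starting with 19 or 20, e.g. "_2023."
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def infer_metadata(filename: str, fallback_year: Optional[int] = None) -> Metadata:
    """Guess document type and year from a file name, with safe defaults."""
    name = filename.lower()
    doc_type = next(
        (value for keyword, value in TYPE_HINTS.items() if keyword in name),
        "unknown",
    )
    match = YEAR_PATTERN.search(name)
    year = int(match.group(0)) if match else fallback_year
    return Metadata(doc_type=doc_type, year=year)
```

A file such as `scan0001.pdf` yields `doc_type="unknown"` and no year, which still produces a valid record that can be found and enriched later.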

05/ Prioritise deterministic retrieval paths

Users do not navigate hierarchies under pressure.

Impact

  • > Query-based retrieval across entity, timeframe, and document type
  • > Defined retrieval paths for recurring scenarios such as tax year, insurer, and account
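
Defined retrieval paths could be modelled as a small set of named, parameterised queries with a fixed sort order, so the same question always returns the same result in the same order. The sketch below is in Python with an assumed schema and made-up sample rows; the path names `tax_year` and `insurer` mirror the scenarios above, and everything else is hypothetical.

```python
import sqlite3
from typing import List


def open_demo_index() -> sqlite3.Connection:
    """Build a tiny in-memory index with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, rel_path TEXT, "
        "doc_type TEXT, issuer TEXT, doc_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (rel_path, doc_type, issuer, doc_date) "
        "VALUES (?, ?, ?, ?)",
        [
            ("tax/assessment.pdf", "tax_assessment", "Finanzamt", "2023-06-01"),
            ("work/payslip-01.pdf", "payslip", "Example GmbH", "2023-01-31"),
            ("insurance/liability.pdf", "insurance", "Example Versicherung", "2021-03-15"),
        ],
    )
    return conn


# Each retrieval path is a fixed query with named parameters and an explicit
# ORDER BY, so results do not depend on insertion order or folder layout.
RETRIEVAL_PATHS = {
    "tax_year": (
        "SELECT rel_path FROM documents "
        "WHERE CAST(substr(doc_date, 1, 4) AS INTEGER) = :year "
        "ORDER BY doc_date, id"
    ),
    "insurer": (
        "SELECT rel_path FROM documents "
        "WHERE doc_type = 'insurance' AND issuer = :issuer "
        "ORDER BY doc_date, id"
    ),
}


def retrieve(conn: sqlite3.Connection, path_name: str, **params) -> List[str]:
    """Run one of the predefined retrieval paths and return matching file paths."""
    sql = RETRIEVAL_PATHS[path_name]
    return [row[0] for row in conn.execute(sql, params)]
```

For example, `retrieve(conn, "tax_year", year=2023)` on the sample data returns the payslip and the tax assessment, in date order.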

06/ Add optional file-level encryption

Provider-level security is insufficient for sensitive documents.

Impact

  • > Decryption and viewing are restricted to the application
  • > Adds a second security layer while maintaining user ownership
  • > Introduces dependency on the app for access to encrypted files

> Trade-offs

01/ Structure vs ingestion friction

  • > More structure improves retrieval reliability but adds friction at ingestion
  • > Friction is mitigated through staged enrichment and metadata defaults

02/ Portability vs deep integration

  • > Provider-agnostic architecture avoids lock-in
  • > Limits use of cloud-native features

03/ Encryption vs accessibility

  • > Stronger security guarantees for sensitive files
  • > Reduced usability outside the system

04/ Automation vs correctness

  • > Full automation risks incorrect classification
  • > Manual-first ingestion ensures reliability

> Situation

Product
> Personal document operating system (desktop, local-first)
Users
> Individuals (myself and anyone else living in Germany)
Environment
> Germany, with 10-year document retention across personal, tax, and financial records
Storage model
> User-selected folders synced via providers such as iCloud or Google Drive
Constraints
  • > Documents originate from fragmented sources such as paper, PDFs, email, and portals
  • > There is no consistent metadata or naming standard at the point of ingestion
  • > Retrieval failure carries real cost in audits, claims, and missed deadlines
  • > Long-term durability is required across 10 or more years
  • > Data ownership and portability must be preserved
  • > Sensitive documents require stronger guarantees than provider-level security

> Problem

Document storage is solved. Retrieval is not.

Files accumulate across folders and cloud providers without a system that guarantees they can be found again. Over time, naming conventions drift, context is lost, and documents become effectively invisible.

The failure mode is not missing data. It is the inability to retrieve the correct document with confidence under time pressure, often years after ingestion.

> Approach

Treated archiving as a system design problem under real-world constraints, not a file organisation task.

Principles

  • > Separate structure from storage
  • > Optimise for retrieval, not organisation
  • > Accept incomplete inputs and evolve structure over time
  • > Maintain user control over files and storage

> Execution

  • > Built a macOS desktop application using Tauri and SQLite with a local-first architecture
  • > Implemented folder-based ingestion where the user selects a root directory, local or synced via iCloud or Google Drive
  • > Defined a document schema across type, issuer, counterparty, and relevant dates
  • > Built an ingestion pipeline from file to inbox to metadata enrichment to indexed record
  • > Added an optional file-level encryption layer with controlled decryption and viewing inside the app

> Results

  • > Retrieval shifted from multi-source search to a single, deterministic lookup
  • > File storage became provider-independent and portable
  • > Archiving became a continuous workflow instead of a periodic clean-up task
  • > Sensitive documents gained an additional application-level security layer