ARCHITECTURE SPECIFICATION & FORMAL SYSTEM REPORT: k501-AIONARC
Document ID: k501-AIONARC-SPEC-2026-05-23
Time Anchor (System Clock): Unix Epoch 1779502114 | Sat May 23 02:08:34 2026 UTC / 04:08:34 CEST
System Architect: iinkognit0
Deployment State: STABLE / CANONICAL / VERIFIED
1. The Core Paradigm of k501-AIONARC - The Information Space
The k501-AIONARC - The Information Space represents a complete architectural departure from mutable, path-dependent hierarchical file systems. It establishes a deterministic, math-driven, append-only informational continuum. The system's foundational design is governed by the absolute physical decoupling of Identity (the topostructural manifest) and Substance (the underlying content payload).
Axiomatic Pillars:
- Content-Addressable Topology: Data primitives within the space possess no arbitrary human-readable names or volatile folder paths. A file or block is addressed strictly by what it inherently is (its cryptographic digest), not where it resides.
- Structural Immutability: Once an information package is committed to the space, it becomes unalterable. Any alteration, down to a single bit, changes the block's cryptographic digest, so the content no longer matches its address and is immediately isolated by the system's structural auditor layers.
- Global Deduplication Invariant: The space normalizes incoming streams. Identical information units collapse into a single physical entity within the object layer, regardless of their ingestion frequency, temporal origin, or logical context.
2. The Six-Phase Ingestion Pipeline Architecture
The monolithic control flow implemented in main.c orchestrates the conversion of raw, unaligned source files into the immutable state space. It processes data using four core memory-mapped object sets: K501_DocumentSet docs, K501_NormalizedSet norm, K501_State state, and K501_State final.
[ Phase 1 & 2: Ingestion & Deep Read ] ──> Recursively map directory files to RAM
│
▼
[ Phase 3: Batch Parsing ] ──> Flatten structures to normalized byte streams
│
▼
[ Phase 4: Frame Structuring ] ──> Apply 4KB chunking, extract QH256, execute CAS write
│
▼
[ Phase 5: Fixpoint Iteration ] ──> Resolve topostructural refs (max 10 cycles)
│
▼
[ Phase 6: Manifest Emission ] ──> Serialize identity matrix to output.ndjson
Technical Breakdown of Ingestion Phases:
Phase 1 & 2: Ingestion and Deep Read
The entry point evaluates command-line constraints (argc < 2). The kernel then invokes k501_ingest_directory_recursive, scanning the source target with a hardcoded maximum recursion depth of exactly 2. Every targeted payload is mapped into volatile memory inside the docs container.
Phase 3: Batch Parsing
The engine transitions to k501_parse_batch, iterating through the raw paths. The helper routine read_file executes binary reads, allocates heap segments via malloc, and flattens the contents into sequential, structured sequences inside K501_NormalizedSet out.
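A binary read into a heap buffer, as described for read_file, might look roughly like the following. This is a minimal sketch: the name read_file_sketch, its signature, and the growth strategy are assumptions for illustration, not the actual helper in main.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a malloc'd buffer.
 * On success returns the buffer and stores its length in *out_len.
 * Returns NULL on any failure. The caller owns the buffer and must free() it. */
unsigned char *read_file_sketch(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");            /* binary mode: no newline translation */
    if (!f) return NULL;

    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    if (!buf) { fclose(f); return NULL; }

    size_t n;
    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) {                   /* buffer full: double the capacity */
            unsigned char *grown = realloc(buf, cap * 2);
            if (!grown) { free(buf); fclose(f); return NULL; }
            buf = grown;
            cap *= 2;
        }
    }
    if (ferror(f)) { free(buf); fclose(f); return NULL; }

    fclose(f);
    *out_len = len;
    return buf;
}
```

Reading in a loop rather than sizing the buffer from ftell keeps the sketch valid for streams whose size is not known up front.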
Phase 4: Structuring & Frame Generation
The execution context enters k501_frame_build. The engine steps through a sliding block window to slice the normalized byte array into distinct tiles. At this precise junction, the cryptographic binding occurs: as soon as a frame's identity is computed, its raw payload is immediately written to the persistent storage tier.
Phase 5: Fixpoint Iteration
The topostructural configuration undergoes mathematical consolidation via k501_iterate_fixpoint. The system executes a fixpoint search algorithm to reconcile structural references across the generated frame boundaries. The loop terminates deterministically when the system stabilizes, or after a maximum of 10 execution cycles, whichever comes first.
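The control flow of a capped fixpoint loop can be sketched as follows. Everything here is an illustrative assumption: the step_fn callback type, the iterate_fixpoint signature, and the toy countdown_step function stand in for whatever k501_iterate_fixpoint actually does to resolve references.

```c
#include <stddef.h>

#define MAX_CYCLES 10   /* cycle cap stated in the text */

/* Hypothetical callback: performs one pass and returns how many changes it made. */
typedef size_t (*step_fn)(void *state);

/* Runs `step` until a pass makes no changes or MAX_CYCLES passes have run.
 * Returns the number of passes executed; *converged is 1 if a pass was stable. */
int iterate_fixpoint(void *state, step_fn step, int *converged)
{
    int cycles = 0;
    *converged = 0;
    while (cycles < MAX_CYCLES) {
        cycles++;
        if (step(state) == 0) {   /* nothing changed: fixpoint reached */
            *converged = 1;
            break;
        }
    }
    return cycles;
}

/* Toy step for illustration only: decrements a counter until it reaches zero. */
size_t countdown_step(void *state)
{
    int *remaining = state;
    if (*remaining == 0) return 0;
    (*remaining)--;
    return 1;
}
```

Starting the toy counter at 3 takes four passes (three that change it, one that confirms stability). Starting it at 100 hits the cap after 10 passes without converging, which is the case a caller would need to detect.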
Phase 6: Manifest Serialization
The consolidated state space is serialized through k501_write_frames_ndjson. The payload attributes are stripped entirely from the object structures. The engine retains only the id and hash fields, emitting a compact sequential index map into the file output.ndjson.
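One manifest line per frame could be formatted as in the sketch below. The exact field order, spacing, and numeric id type are assumptions, and format_frame_line is a hypothetical name. Only the presence of the id and hash fields comes from the text.

```c
#include <stdio.h>

/* Hypothetical formatter: writes one NDJSON record for a frame into `out`.
 * Returns the number of bytes written (excluding the terminator),
 * or -1 if the buffer is too small. */
int format_frame_line(char *out, size_t out_size,
                      unsigned long id, const char *hash_hex)
{
    int n = snprintf(out, out_size,
                     "{\"id\":%lu,\"hash\":\"%s\"}\n", id, hash_hex);
    if (n < 0 || (size_t)n >= out_size) return -1;   /* error or truncated */
    return n;
}
```

With a 64-character hash, each line in this layout occupies 82 bytes plus the decimal digits of the id. For 10,464 frames with zero-based ids that totals 899,258 bytes, which matches the manifest size reported in Section 5, although the real layout may still differ in detail.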
3. The QH256 Cryptographic Identity Layer
Cryptographic integrity validation and address derivation inside the k501-AIONARC space are managed by the payload-dependent hashing algorithms defined in src/qh_core.c.
Mathematical Window Splitting
Within the frame engine, raw binary files are discretized using a fixed system window slice constant:
$$\text{CHUNK\_SIZE} = 4096 \text{ Bytes}$$
For any given block boundary, the exact chunk length is calculated deterministically via the following invariant equation:
$$\text{chunk\_len} = \min(\text{CHUNK\_SIZE}, \text{len} - \text{offset})$$
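The chunk-length invariant translates directly into code. The sketch below is illustrative: the function names chunk_len_at and chunk_count are assumptions, and only the 4096-byte window and the min() rule come from the text.

```c
#include <stddef.h>

#define CHUNK_SIZE 4096u   /* fixed window size stated in the text */

/* Length of the chunk that starts at `offset` in a buffer of `len` bytes:
 * min(CHUNK_SIZE, len - offset), or 0 once offset has reached the end. */
size_t chunk_len_at(size_t len, size_t offset)
{
    if (offset >= len) return 0;
    size_t remaining = len - offset;
    return remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE;
}

/* Number of chunks needed to cover `len` bytes (ceiling division). */
size_t chunk_count(size_t len)
{
    return (len + CHUNK_SIZE - 1) / CHUNK_SIZE;
}
```

For a 10,000-byte buffer this yields three chunks of 4096, 4096, and 1808 bytes. Only the final chunk of a stream can be shorter than the window.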
State Space Mapping
The raw bytes of each isolated tile are passed into k501_hash_compute(). This routine maps the data array into a 32-byte cryptographic vector, which is subsequently expanded into a 64-character hexadecimal string. The hashing layer is designed so that any single-bit delta in the content payload produces a radically different output vector (the avalanche effect). Provided QH256 is collision-resistant, two distinct blocks are overwhelmingly unlikely to share a digest, and silent content tampering is detectable because an altered payload no longer matches its address.
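The expansion from a 32-byte vector to a 64-character string is plain hex encoding, sketched below. The name digest_to_hex is an assumption, and lowercase output is chosen to match the example digest shown in the storage section. The QH256 computation itself is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define DIGEST_LEN 32
#define HEX_LEN    (2 * DIGEST_LEN)

/* Hypothetical helper: encodes a 32-byte digest as 64 lowercase hex
 * characters plus a terminating NUL (65 bytes written in total). */
void digest_to_hex(const uint8_t digest[DIGEST_LEN], char out[HEX_LEN + 1])
{
    static const char lut[] = "0123456789abcdef";
    for (size_t i = 0; i < DIGEST_LEN; i++) {
        out[2 * i]     = lut[digest[i] >> 4];     /* high nibble */
        out[2 * i + 1] = lut[digest[i] & 0x0f];   /* low nibble  */
    }
    out[HEX_LEN] = '\0';
}
```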
4. Content-Addressable Storage (CAS) & Directory Layout
The physical persistence layer implemented in src/cas_store.c handles long-term artifact conservation. It eliminates traditional naming schemes, relying solely on the 64-character hex-encoded QH256 hash string to construct storage paths.
Two-Tier Fan-Out Tree Structure
To bypass underlying operating system performance drops caused by directory inode saturation (holding too many files in a flat folder), the storage engine divides the hash string:
- Prefix (Directory Node): The first 2 characters of the hex string establish the subdirectory name. This yields exactly $16^2 = 256$ possible structural directory buckets (store/00/ through store/ff/).
- Suffix (Leaf Artifact): The remaining 62 characters of the digest serve as the physical filename on disk.
Example Digest: e6931ec796c1283467521428b407b972f380bf4b7133e4487e6de5d01fa7184f
Physical Path: store/e6/931ec796c1283467521428b407b972f380bf4b7133e4487e6de5d01fa7184f
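The prefix/suffix split shown in the example can be sketched as a small path builder. The function name cas_path_from_hex and its error convention are assumptions. The store/ root and the 2 + 62 split come from the text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "store/<first 2 hex chars>/<remaining 62>"
 * from a 64-character hex digest. Returns 0 on success, or -1 if the digest
 * has the wrong length or the output buffer is too small. */
int cas_path_from_hex(const char *hex, char *out, size_t out_size)
{
    if (strlen(hex) != 64) return -1;
    /* "%.2s" prints only the two-character bucket prefix. */
    int n = snprintf(out, out_size, "store/%.2s/%s", hex, hex + 2);
    if (n < 0 || (size_t)n >= out_size) return -1;
    return 0;
}
```

The resulting path is 71 characters long, so a buffer of 72 bytes or more is sufficient for the store/ root used here.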
Atomic Deduplication Mechanism
Prior to issuing an active disk write operation, k501_cas_write checks the path using the POSIX stat() system call. If the target hash exists in the tree, the write sequence aborts immediately, returning code 0 (Success). Duplicate blocks are discarded, ensuring optimal storage utilization.
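The check-before-write behavior can be sketched with stat() as follows. The name cas_write_if_absent and its signature are assumptions rather than the real k501_cas_write interface, and the sketch assumes the bucket directory already exists.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: writes `len` bytes to `path` unless an object is
 * already stored there. Returns 0 if the object is present after the call
 * (newly written or already existing), -1 on an I/O error. */
int cas_write_if_absent(const char *path, const void *data, size_t len)
{
    struct stat st;
    if (stat(path, &st) == 0)
        return 0;                        /* duplicate: skip the write */

    FILE *f = fopen(path, "wb");
    if (!f) return -1;

    size_t written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len) {
        remove(path);                    /* do not leave a partial object behind */
        return -1;
    }
    return 0;
}
```

A stat() check followed by a separate write is not atomic when several writers run concurrently. Opening the file with open() and the O_CREAT | O_EXCL flags is one common way to close that window, though the text does not say whether k501_cas_write does so.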
5. Empirical Validation & Performance Metrics
A live pipeline validation run was conducted utilizing the raw source archive MD_2026-05-22. The execution metrics confirm the performance profile of the architecture:
System Performance Matrix
| Metric Parameter | Measured Physical Value | Structural Interpretation |
|---|---|---|
| Raw Source Input Volume | 41 MB | Unstructured Markdown documents across disk boundaries |
| Logical Manifest Frames | 10,464 Lines | Total sequenced states committed to output.ndjson |
| Physical CAS Object Leaf Nodes | 10,359 Files | Discrete block items written to the store/ tree |
| Deduplication Delta ($\Delta$) | 105 Chunks | Redundant write streams blocked by active identity collisions |
| Hard-Index Manifest Weight | 899,258 Bytes | Compressed structural footprint of output.ndjson (~878 KiB) |
| Topostructural Net Density | ~85.94 Bytes/Frame | Mean memory footprint required per active index line |
| Reconstructed Output Stream | 40,272,111 Bytes | Bit-perfect, lossless recovery of net input data |
| Structural Reduction Factor | ~46.6 : 1 | Scale ratio between the manifest layer and source space |
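The derived rows can be checked against the measured counts. As a short worked calculation using only values from the table above:

$$\Delta = 10{,}464 - 10{,}359 = 105 \text{ Chunks}$$

$$\frac{899{,}258 \text{ Bytes}}{10{,}464 \text{ Frames}} \approx 85.94 \text{ Bytes/Frame}$$

$$\frac{105}{10{,}464} \approx 1.0\% \text{ of frames deduplicated}$$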
Analysis of Storage Metrics:
- Manifest Efficiency: The structural manifest (output.ndjson) represents merely 2.14% of the original input data volume while maintaining complete topostructural representation.
- Slack Space Elimination: The variance between the raw folder footprint (41 MB) and the net recovered bytes (40.27 MB) demonstrates the removal of filesystem sector padding. By compressing individual streams into a single contiguous sequence, k501-AIONARC strips away storage fragmentation overhead.
6. Bidirectional Reversibility & Semantic Graph Evolution
The restoration utility src/k501_restore.c establishes the absolute, zero-loss mathematical reversibility of the transformation cycle.
Reconstruction Mechanics
The restoration tool opens output.ndjson and parses it sequentially. It isolates each 64-character hex hash, parses it back into a raw binary byte array, and hands it over to k501_cas_read. The storage controller targets the exact two-tier path within the 256-bucket fan-out layout, pulls the raw payload, and streams it into the target output file. Because the index preserves the chronological sequence of the ingestion cycle, the resulting output matches the source byte stream with bit-perfect fidelity.
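The step that parses a 64-character hex hash back into raw bytes can be sketched as below. The names hex_nibble and hex_to_digest are assumptions, and the sketch accepts both lowercase and uppercase digits even though the manifest example uses lowercase.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes exactly 64 hex characters into 32 bytes.
 * Returns 0 on success, -1 if the string is too short, too long,
 * or contains a non-hex character. */
int hex_to_digest(const char *hex, uint8_t out[32])
{
    for (size_t i = 0; i < 32; i++) {
        int hi = hex_nibble(hex[2 * i]);
        if (hi < 0) return -1;            /* also catches an early NUL */
        int lo = hex_nibble(hex[2 * i + 1]);
        if (lo < 0) return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return hex[64] == '\0' ? 0 : -1;      /* reject trailing characters */
}
```

Rejecting malformed input at this point keeps a damaged manifest line from being turned into a lookup for the wrong object.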
Architectural Outlook
The current implementation completes the Payload-Persistence milestone, validating the core mechanics of content-addressable storage. With the stable state space confirmed, the framework is positioned for its next evolutionary phase: Semantic Graph Interlinking. Future updates will transition the space from a linear frame sequence into a non-linear topological graph. Frames will embed QH256 hashes of related nodes directly within their metadata layers, creating a self-organizing, tamper-proof, and multidimensional knowledge network.
References and contact
- Patrick R. Miller (Iinkognit0) — K501 / AIONARC Core Architecture
- ORCID: https://orcid.org/0009-0005-5125-9711
- Website: https://iinkognit0.de/
- GitHub: https://github.com/Iinkognit0
- GitHub: https://github.com/k501-Information-Space/eArc
- Publications: https://dev.to/k501is
- Mastodon: https://mastodon.social/@K501
- Email: contact.k501@proton.me
As i State Iinkognit0 Declare : THE INFORMATION SPACE
























