# NIP: Formats and Core Concepts
**Version:** 1.2
**Date:** 2025-07-16
This document specifies the core data formats, storage architecture, and fundamental concepts for the Nexus Installation Program (`nip`). It merges the initial project plan with a content-addressed, Merkle-based architecture for maximum efficiency, verifiability, and reproducibility.
---
## 1. Core Principles
- **Content-Addressable:** All data is stored based on its content hash, providing automatic deduplication.
- **Cryptographically Verifiable:** The entire system state can be verified with a single cryptographic hash.
- **Immutable & Atomic:** Installations and updates are atomic operations, ensuring system consistency.
- **Declarative:** The system state is defined by declarative manifest files.
### 1.1. Trust and Authenticity
To ensure not just integrity but also authenticity, `nip` incorporates a trust layer based on Ed25519 signatures.
- **Manifest Signatures:** Each `.npk` manifest can be signed using Ed25519 keys (e.g., OpenSSH keys). Signatures can be detached or inline within the manifest. This allows verification of the package's origin. Multiple signatures (e.g., personal, CI, Foundation) are supported.
- **Root-of-Trust for `nip.lock`:** The `nip.lock` file, representing a complete system generation, can be signed. A single signature over the lockfile transforms the Merkle root into a tamper-evident release artifact.
- **Key Management:** Support for `keyid`, `created`, and `expires` metadata for keys facilitates revocation and rotation without requiring every package to be rebuilt. This lays the groundwork for future TUF-style metadata integration.
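The key metadata above can be illustrated with a small validity check. This is only a sketch: the `KeyInfo` type, the `key_usable` function, the revocation set, and the example key are hypothetical and not part of `nip`. It assumes a key may vouch for a signature only inside its `created`/`expires` window.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class KeyInfo:
    """Hypothetical container for the per-key metadata named in the text."""
    keyid: str
    created: datetime
    expires: Optional[datetime] = None  # None means no expiry was recorded


def key_usable(key: KeyInfo, at: datetime, revoked: frozenset = frozenset()) -> bool:
    """True if `key` may vouch for a signature made at time `at`."""
    if key.keyid in revoked:
        return False
    if at < key.created:
        return False
    return key.expires is None or at < key.expires


# Illustrative values only.
ci_key = KeyInfo(
    keyid="ci-2025",
    created=datetime(2025, 1, 1, tzinfo=timezone.utc),
    expires=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
signed_at = datetime(2025, 7, 16, 20, 0, tzinfo=timezone.utc)
```

Because the check depends only on key metadata, rotating or revoking a key changes which signatures are accepted without touching any manifest.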
## 2. Hashing Algorithms
- **Cryptographic Hashing:** The default hash algorithm is **BLAKE2b-512**, until BLAKE3 becomes available in Nimble. Digests are encoded as **Multihash** (`<varint code><varint len><digest>`), so the CAS can later move to another algorithm such as BLAKE3, SHA-512, or KangarooTwelve without a redesign.
- **Non-Cryptographic Hashing:** **SipHash** is recommended for internal data structures.
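As a rough illustration of the Multihash layout, the following Python sketch encodes a BLAKE2b-512 digest. The function names are hypothetical. The code `0xb240` is the BLAKE2b-512 entry in the public multicodec table, and the varints are unsigned LEB128.

```python
import hashlib

# Multicodec code for BLAKE2b-512 (from the public multiformats table).
BLAKE2B_512 = 0xB240


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def multihash_blake2b512(data: bytes) -> bytes:
    """Return <varint code><varint length><digest> for `data`."""
    digest = hashlib.blake2b(data, digest_size=64).digest()
    return encode_varint(BLAKE2B_512) + encode_varint(len(digest)) + digest


def object_id(data: bytes) -> str:
    """Hex-encode the multihash (hypothetical helper for naming objects)."""
    return multihash_blake2b512(data).hex()
```

Every ID produced this way starts with the same four prefix bytes (`c0e40240`), which is what lets a reader identify the algorithm without out-of-band information.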
## 3. Storage Architecture
### 3.1. The Content-Addressable Store (CAS)
The CAS is the canonical source of all file data.
- **Locations:** `~/.nip/cas/` (user) and `/var/lib/nip/cas/` (system).
- **Compression:** To conserve disk space, objects are stored compressed by default using `zstd`. However, the canonical hash of an object is **always the hash of its uncompressed content** using the configured algorithm (BLAKE2b-512 by default). Integrity is always verified against the true data. This behavior can be configured in `nip.conf` (e.g., `cas.compress = true`, `cas.compression_level = 19`).
- **Structure:** Objects are stored under their hex-encoded multihash and sharded into subdirectories by two hex characters (e.g., `cas/ab/cdef1234...`). Because the multihash prefix is identical for every object hashed with the same algorithm, the shard characters should be taken from the digest portion rather than the prefix. For large fleets, sharding can extend to a second level (e.g., `cas/ab/cd/ef12...`, four hex characters of fan-out) once a store passes roughly 16k objects.
- **Garbage Collection:** A **mark-and-sweep garbage collector** (`nip gc`) reclaims space. It reads every manifest hash referenced by a live `nip.lock` file (system and user cells), marks the CAS objects those manifests reference, and deletes all unmarked blobs. Optionally, **"pin sets"** (named live roots, à la Docker) keep specific objects from being collected.
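The store, verification, and collection behaviour described above can be sketched in a few lines of Python. This is a simplified model under several assumptions: the `Cas` class and its method names are hypothetical, `zlib` stands in for `zstd` (which is not in the Python standard library), the multihash prefix is omitted so objects are sharded on the raw digest hex, and a manifest is reduced to a list of object IDs.

```python
import hashlib
import tempfile
import zlib
from collections.abc import Iterable
from pathlib import Path


def digest_hex(data: bytes) -> str:
    """BLAKE2b-512 over the UNCOMPRESSED content (multihash prefix omitted)."""
    return hashlib.blake2b(data, digest_size=64).hexdigest()


class Cas:
    def __init__(self, root: Path) -> None:
        self.root = root

    def path_for(self, hex_id: str) -> Path:
        # Shard by the first two hex characters: <root>/ab/cdef...
        return self.root / hex_id[:2] / hex_id[2:]

    def put(self, data: bytes) -> str:
        hex_id = digest_hex(data)
        target = self.path_for(hex_id)
        if not target.exists():  # identical content is stored only once
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zlib.compress(data))  # stand-in for zstd
        return hex_id

    def get(self, hex_id: str) -> bytes:
        data = zlib.decompress(self.path_for(hex_id).read_bytes())
        if digest_hex(data) != hex_id:  # verify against the true data
            raise ValueError(f"corrupt object {hex_id}")
        return data

    def gc(self, live_manifests: Iterable[Iterable[str]],
           pins: Iterable[str] = ()) -> list[str]:
        """Mark every object named by a live manifest or pin, sweep the rest."""
        marked = set(pins)
        for object_ids in live_manifests:
            marked.update(object_ids)
        removed = []
        for blob in sorted(self.root.glob("*/*")):
            hex_id = blob.parent.name + blob.name
            if hex_id not in marked:
                blob.unlink()
                removed.append(hex_id)
        return removed


# Usage: one object referenced by a live manifest, one orphaned.
cas = Cas(Path(tempfile.mkdtemp()))
kept = cas.put(b"htop binary")
orphan = cas.put(b"stale file")
removed = cas.gc(live_manifests=[[kept]])
```

A real implementation would also write objects atomically (write to a temporary name, then rename) so that an interrupted `put` never leaves a truncated blob under a valid name.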
### 3.2. The Manifest Store
- **Locations:** `~/.nip/manifests/` (user) and `/var/lib/nip/manifests/` (system).
- **Structure:** Stores `.npk` manifest files, whose own content hashes (computed with the configured algorithm, BLAKE2b-512 by default) serve as their unique IDs.
## 4. The `.npk` Manifest Format
Each `.npk` manifest is a single, self-contained KDL document. Its content hash (computed with the configured algorithm, BLAKE2b-512 by default) is the package's unique identifier.
### 4.1. KDL Schema for `.npk`
```kdl
package "htop" {
    version "3.3.0"
    description "Interactive process viewer"
    // Lets one manifest live in multiple Streams without duplication.
    channels "stable" "testing"
    source "pacman" { /* ... */ }
    dependencies { /* ... */ }
    build {
        system "x86_64-linux"
        compiler "nim-2.2.4"
        // Deterministic build fingerprint, needed for exact rebuilds and `nip verify --rebuild`.
        env_hash "blake3-d34db33f..."
    }
    snapshots {
        created "2025-07-16T20:00:00Z" // ISO 8601 timestamp, for easy human audit.
    }
    files {
        file "/Programs/Htop/3.3.0/bin/htop" "blake3-f4e5d6..." "755"
        file "/Programs/Htop/3.3.0/share/man/man1/htop.1.gz" "blake3-a9b8c7..." "644"
    }
    artifacts { /* ... */ }
    services {
        systemd "htop.service" "blake3-unit..." // For packages that ship systemd units.
    }
    signatures {
        // Ed25519 signatures on each .npk manifest (detached, or inline `signature "ed25519" "<base64>"`).
        // Supports multiple keys (personal, CI, Foundation).
        // Record `keyid`, `created`, `expires`.
    }
}
```
## 5. The System Lockfile: `nip.lock`
The `nip.lock` file is the **System Generation Manifest**: it defines the complete state of installed packages.
### 5.1. KDL Schema for `nip.lock`
```kdl
lockfile_version 1.2
generation {
    id "blake3-d34db33f..." // The hash of this file.
    created "2025-07-16T20:05:17Z" // ISO 8601 timestamp.
    previous "blake3-abcdef..." // Hash of the previous generation's lockfile, forming a hash-chained log.
}
packages {
    package "htop-3.3.0.npk" "blake3-htophash..."
    package "ncurses-6.4.npk" "blake3-ncurseshash..."
}
// Root-of-trust for `nip.lock` (system generation).
// Created with: nip sign lock --key ~/.ssh/nip_ed25519
signature "ed25519" "<base64>"
```
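The `previous` field turns the sequence of generations into a hash chain that can be walked and checked. The sketch below shows the idea under loud assumptions: the serialization is a made-up line format rather than KDL, the ID is computed over a body that does not contain the ID itself, and all function names are hypothetical.

```python
import hashlib
from typing import Optional


def lock_id(data: bytes) -> str:
    """Hypothetical ID scheme: BLAKE2b-512 over the lockfile body."""
    return "blake2b-" + hashlib.blake2b(data, digest_size=64).hexdigest()


def make_generation(previous: Optional[str], packages: list[str]) -> tuple[str, bytes]:
    """Build a toy lockfile body whose first line names the previous generation."""
    lines = [f"previous {previous or '-'}"] + sorted(packages)
    data = ("\n".join(lines) + "\n").encode()
    return lock_id(data), data


def previous_of(data: bytes) -> Optional[str]:
    """Read the `previous` link back out of a toy lockfile body."""
    value = data.decode().splitlines()[0].split(" ", 1)[1]
    return None if value == "-" else value


def chain_is_valid(head: str, store: dict[str, bytes]) -> bool:
    """Walk from `head` to the first generation, re-hashing every link."""
    seen: set[str] = set()
    current: Optional[str] = head
    while current is not None:
        data = store.get(current)
        if data is None or current in seen or lock_id(data) != current:
            return False
        seen.add(current)
        current = previous_of(data)
    return True


# Usage: two generations, the second pointing back at the first.
store: dict[str, bytes] = {}
gen1, body1 = make_generation(None, ["ncurses-6.4.npk"])
store[gen1] = body1
gen2, body2 = make_generation(gen1, ["htop-3.3.0.npk", "ncurses-6.4.npk"])
store[gen2] = body2
```

Because each ID covers the `previous` link, signing only the newest generation also commits to every older generation in the chain.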
## 6. The Installation Filesystem
### 6.1. GoboLinux-style Hierarchy (`/Programs`)
A human-readable hierarchy of symlinks pointing to the CAS, providing a view of an immutable backend.
### 6.2. `PATH` Management via Active Index
To expose executables to the user's shell, `nip` uses an "Active Index" directory. This is a single, stable location the user adds to their `PATH`.
- **System-wide:** `/System/Index/bin`
- **User-specific:** `~/.nip/profile/bin`
When a new generation is activated via `nip switch`, `nip` atomically repopulates this directory with symlinks to the executables of the new generation. This provides fast shell startup and race-free activation.
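One common way to get race-free activation on POSIX systems is to make the index path itself a symlink to a per-generation directory and swap it with a single `rename`. The sketch below assumes that design. The `activate` function and the directory names are hypothetical, and `nip` may implement the switch differently.

```python
import os
import tempfile
from pathlib import Path


def activate(index_link: Path, generation_bin: Path) -> None:
    """Point `index_link` at `generation_bin` with one atomic rename."""
    tmp = index_link.with_name(index_link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()  # leftover from an interrupted switch
    os.symlink(generation_bin, tmp, target_is_directory=True)
    # rename(2) replaces the old link in one step on POSIX, so a shell
    # resolving PATH sees either the old generation or the new one.
    os.replace(tmp, index_link)


# Usage: build two generations, then switch between them.
root = Path(tempfile.mkdtemp())
for gen in ("gen-1", "gen-2"):
    (root / gen / "bin").mkdir(parents=True)
    (root / gen / "bin" / "htop").write_text(gen)

index = root / "index-bin"
activate(index, root / "gen-1" / "bin")
first = (index / "htop").read_text()
activate(index, root / "gen-2" / "bin")
second = (index / "htop").read_text()
```

The per-generation directory is fully populated before the swap, so no reader ever observes a half-written index.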
## 7. Cross-Platform Compatibility & Security
### 7.1. Path Separators
- **Manifests must use POSIX forward slashes (`/`) for all paths.** This is the canonical format.
- The `nip` client is responsible for translating paths to the native format (e.g., `\` on Windows) at runtime.
- Manifests containing backslashes will be rejected by `nip verify`.
### 7.2. Symlink Security & Hardening
- Only **relative symlinks** are created, and each target is verified before writing to prevent filesystem escapes.
- Manifests attempting path traversals (e.g., `../../etc/passwd`) are rejected during verification.
- Optionally, `/Programs` can be mounted as `noexec,nodev`, relying on a "programs overlay" bind mount that flips execute bits only for whitelisted directories, for additional hardening.
- On older Windows versions, `nip` will fall back to using junctions or hard-links if developer-mode symlinks are unavailable.
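The path rules from 7.1 and 7.2 can be expressed as two small checks. This is a sketch with hypothetical function names. It rejects backslashes and any `..` component, and computes the relative target a symlink in `/Programs` would use to reach a CAS object.

```python
import posixpath
from pathlib import PurePosixPath


def is_safe_manifest_path(path: str) -> bool:
    """Accept only POSIX-style paths with no backslashes and no `..` parts."""
    if "\\" in path:
        return False
    return ".." not in PurePosixPath(path).parts


def relative_link_target(link_path: str, object_path: str) -> str:
    """Relative target for a symlink at `link_path` pointing to `object_path`."""
    return posixpath.relpath(object_path, start=posixpath.dirname(link_path))
```

For example, a link at `/Programs/Htop/3.3.0/bin/htop` pointing to `/var/lib/nip/cas/ab/cdef1234` gets the target `../../../../var/lib/nip/cas/ab/cdef1234`. A stricter verifier would also resolve the final target and confirm it stays inside the CAS root.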
## 8. Remote Operations & Caching
`nip` supports fetching missing objects from remote binary caches (e.g., a static HTTP server or S3 bucket). Since objects are content-addressed, a remote cache is a simple key-value store, mirroring Nix's binary cache feature.
- **`nip remote add <name> <url>`:** Adds a remote cache (e.g., `nip remote add origin https://cache.nexushub.io`). Missing objects/manifests can then be fetched via HTTP range GET.
- **`nip remote push <remote> <package.npk>`:** Uploads missing CAS blobs + manifest to the remote cache, returning content URIs.
- **`nip remote serve --path /var/lib/nip`:** Starts a read-only cache server, trivial for air-gapped labs.
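Because objects are named by their hash, a client does not need to trust the cache: it can re-hash whatever it downloads. The sketch below illustrates this. The URL layout, the function names, and the assumption that the cache serves uncompressed bytes named by a bare BLAKE2b-512 hex digest are all hypothetical. The transport is passed in as a function so the logic can be exercised without a network.

```python
import hashlib
import urllib.request
from typing import Callable


def object_url(base_url: str, hex_id: str) -> str:
    """Hypothetical cache layout mirroring the local CAS sharding."""
    return f"{base_url.rstrip('/')}/cas/{hex_id[:2]}/{hex_id[2:]}"


def http_get(url: str) -> bytes:
    """Plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_verified(base_url: str, hex_id: str,
                   get: Callable[[str], bytes] = http_get) -> bytes:
    """Download an object and refuse it unless its hash matches `hex_id`."""
    data = get(object_url(base_url, hex_id))
    actual = hashlib.blake2b(data, digest_size=64).hexdigest()
    if actual != hex_id:
        raise ValueError("remote object does not match the requested hash")
    return data


# Usage with an in-memory stand-in for the remote cache.
base = "https://cache.nexushub.io"
payload = b"htop binary"
wanted = hashlib.blake2b(payload, digest_size=64).hexdigest()
fake_cache = {object_url(base, wanted): payload}
fetched = fetch_verified(base, wanted, get=fake_cache.__getitem__)
```

The same check makes a plain static file server or bucket sufficient as a cache, since a corrupted or malicious response is rejected on the client side.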
## 9. Advanced CAS Concepts: Delta & Chunk-level Deduplication (Phase 2)
To further optimize bandwidth and storage, `nip` is designed to support a future phase of delta and chunk-level deduplication.
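- **Fixed-size "Merkle-chunk" layer:** Whole-file hashing stores a large binary again whenever any byte changes, and compression does not recover that redundancy across versions. Hashing fixed-size chunks instead can deduplicate a significant portion (e.g., ~90% of same-version-family kernels).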
- **Implementation:** This would involve a `files` node in the `.npk` manifest referencing a list of chunk hashes instead of a single file hash, allowing for efficient storage and transfer of only changed chunks.
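The chunk-level idea can be sketched as follows. The chunk size and function names are hypothetical (the document does not fix a chunk size), and the example uses a tiny 4-byte chunk so the effect is visible.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # hypothetical default; not specified by the document


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Split `data` into fixed-size chunks and hash each one."""
    return [
        hashlib.blake2b(data[i:i + size], digest_size=64).hexdigest()
        for i in range(0, len(data), size)
    ]


def missing_chunks(new: list[str], have: set[str]) -> list[str]:
    """Chunks of the new version that still need to be stored or transferred."""
    seen = set(have)
    out = []
    for chunk_id in new:
        if chunk_id not in seen:
            seen.add(chunk_id)
            out.append(chunk_id)
    return out


# Usage: two versions that differ in one 4-byte chunk.
old_version = b"aaaabbbbccccdddd"
new_version = b"aaaabbbbXXXXdddd"
to_send = missing_chunks(chunk_hashes(new_version, 4),
                         set(chunk_hashes(old_version, 4)))
```

Only one of the four chunks has to be sent here. Note that fixed-size chunking loses alignment when bytes are inserted or removed, so it works best for files whose layout is stable between versions.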
## 10. Proposed CLI Tooling
- **`nip cat <hash>`:** Dumps a CAS blob to stdout. Great for debugging. Use `--raw` flag for uncompressed stream.
- **`nip fsck`:** Verifies that every symlink in `/Programs` targets a valid CAS object referenced in *some* manifest; repairs stray links.
- **`nip doctor`:** Runs `fsck`, `gc --dry-run`, and prints actionable suggestions for system health.
- **`nip diff <genA> <genB>`:** Compares two lockfiles; outputs added/removed/changed manifests (with semantic version bump hints).
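The core of a generation diff is a comparison of two name-to-hash maps. The sketch below assumes each lockfile has already been parsed into a dictionary keyed by package name without its version. That keying, the function name, and the placeholder hashes are all hypothetical.

```python
def diff_generations(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {package name: manifest hash} maps."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


# Usage with placeholder hashes.
gen_a = {"htop": "hash-h1", "ncurses": "hash-n1", "vim": "hash-v1"}
gen_b = {"htop": "hash-h2", "ncurses": "hash-n1", "zsh": "hash-z1"}
changes = diff_generations(gen_a, gen_b)
```

Version bump hints would require reading the `version` field from each changed manifest, which this sketch leaves out.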