rfc: Add L4 Feed architecture spec (DuckDB + LanceDB)
This commit is contained in:
parent
282eecab24
commit
875c9b7957
|
|
@ -0,0 +1,202 @@
|
||||||
|
# RFC-0130: L4 Feed — Temporal Event Store
|
||||||
|
|
||||||
|
**Status:** Draft
|
||||||
|
**Author:** Frankie (Silicon Architect)
|
||||||
|
**Date:** 2026-02-03
|
||||||
|
**Target:** Janus SDK v0.2.0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
L4 Feed ist das temporale Event-Storage-Layer für Libertaria. Es speichert soziale Primitive (Posts, Reactions, Follows) mit hybridem Ansatz:
|
||||||
|
|
||||||
|
- **DuckDB:** Strukturierte Queries (Zeitreihen, Aggregations)
|
||||||
|
- **LanceDB:** Vektor-Search für semantische Ähnlichkeit
|
||||||
|
|
||||||
|
## Kenya Compliance
|
||||||
|
|
||||||
|
| Constraint | Status | Implementation |
|
||||||
|
|------------|--------|----------------|
|
||||||
|
| RAM <10MB | ✅ Planned | DuckDB in-memory mode, LanceDB mmap |
|
||||||
|
| No cloud | ✅ | Embedded storage only |
|
||||||
|
| <1MB binary | ⚠️ TBD | Stripped DuckDB + custom LanceDB bindings |
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────┐
|
||||||
|
│ L4 Feed Layer │
|
||||||
|
├─────────────────────────────────────────────────────────────┤
|
||||||
|
│ ┌──────────────┐ ┌──────────────┐ │
|
||||||
|
│ │ DuckDB │ │ LanceDB │ │
|
||||||
|
│ │ (events) │ │ (embeddings) │ │
|
||||||
|
│ ├──────────────┤ ├──────────────┤ │
|
||||||
|
│ │ - Timeline │ │ - ANN search │ │
|
||||||
|
│ │ - Counts │ │ - Similarity │ │
|
||||||
|
│ │ - Replies │ │ - Clustering │ │
|
||||||
|
│ └──────────────┘ └──────────────┘ │
|
||||||
|
│ │ │ │
|
||||||
|
│ └───────────┬───────────┘ │
|
||||||
|
│ │ │
|
||||||
|
│ ┌───────▼───────┐ │
|
||||||
|
│ │ FeedStore │ │
|
||||||
|
│ └───────────────┘ │
|
||||||
|
└─────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
## Data Model
|
||||||
|
|
||||||
|
### Event Types
|
||||||
|
|
||||||
|
```zig
|
||||||
|
pub const EventType = enum {
|
||||||
|
post, // Original content
|
||||||
|
reaction, // like, boost, bookmark
|
||||||
|
follow, // Social graph edge (directed)
|
||||||
|
mention, // @username reference
|
||||||
|
hashtag, // #topic tag
|
||||||
|
edit, // Content modification
|
||||||
|
delete, // Tombstone (soft delete)
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
### FeedEvent Structure
|
||||||
|
|
||||||
|
| Field | Type | Description |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| id | u64 | Snowflake ID (time-sortable, 64-bit) |
|
||||||
|
| event_type | EventType | Enum discriminator |
|
||||||
|
| author | [32]u8 | DID (Decentralized Identifier) |
|
||||||
|
| timestamp | i64 | Unix nanoseconds |
|
||||||
|
| content_hash | [32]u8 | Blake3 hash of canonical content |
|
||||||
|
| parent_id | ?u64 | For replies/threading |
|
||||||
|
| embedding | ?[384]f32 | 384-dim vector (LanceDB) |
|
||||||
|
| tags | []string | Hashtags |
|
||||||
|
| mentions | [][32]u8 | Referenced DIDs |
|
||||||
|
|
||||||
|
## DuckDB Schema
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Events table (structured data)
|
||||||
|
CREATE TABLE events (
|
||||||
|
id UBIGINT PRIMARY KEY,
|
||||||
|
event_type TINYINT,
|
||||||
|
author BLOB(32),
|
||||||
|
timestamp BIGINT,
|
||||||
|
content_hash BLOB(32),
|
||||||
|
parent_id UBIGINT,
|
||||||
|
tags VARCHAR[],
|
||||||
|
embedding_ref INTEGER -- Index into LanceDB
|
||||||
|
);
|
||||||
|
|
||||||
|
-- Indexes for common queries
|
||||||
|
CREATE INDEX idx_author_time ON events(author, timestamp DESC);
|
||||||
|
CREATE INDEX idx_parent ON events(parent_id);
|
||||||
|
CREATE INDEX idx_time ON events(timestamp DESC);
|
||||||
|
|
||||||
|
-- FTS for content search (optional)
|
||||||
|
CREATE TABLE event_content (
|
||||||
|
id UBIGINT PRIMARY KEY REFERENCES events(id),
|
||||||
|
text_content VARCHAR
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
## LanceDB Schema
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Python pseudocode for schema
|
||||||
|
import lancedb
|
||||||
|
from lancedb.pydantic import LanceModel, Vector
|
||||||
|
|
||||||
|
class Embedding(LanceModel):
|
||||||
|
id: int # Matches events.id
|
||||||
|
vector: Vector(384) # 384-dim embedding
|
||||||
|
|
||||||
|
# Metadata for filtering
|
||||||
|
event_type: int
|
||||||
|
author: bytes # 32 bytes DID
|
||||||
|
timestamp: int
|
||||||
|
```
|
||||||
|
|
||||||
|
## Query Patterns
|
||||||
|
|
||||||
|
### 1. Timeline (Home Feed)
|
||||||
|
```sql
|
||||||
|
SELECT * FROM events
|
||||||
|
WHERE author IN (SELECT following FROM follows WHERE follower = ?)
|
||||||
|
ORDER BY timestamp DESC
|
||||||
|
LIMIT 50;
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Thread (Conversation)
|
||||||
|
```sql
|
||||||
|
WITH RECURSIVE thread AS (
|
||||||
|
SELECT * FROM events WHERE id = ?
|
||||||
|
UNION ALL
|
||||||
|
SELECT e.* FROM events e
|
||||||
|
JOIN thread t ON e.parent_id = t.id
|
||||||
|
)
|
||||||
|
SELECT * FROM thread ORDER BY timestamp;
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Semantic Search (LanceDB)
|
||||||
|
```python
|
||||||
|
# Find similar posts
|
||||||
|
table.search(query_embedding) \
|
||||||
|
.where("event_type = 0") \ # Only posts
|
||||||
|
.limit(20) \
|
||||||
|
.to_pandas()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Synchronization Strategy
|
||||||
|
|
||||||
|
1. **Write Path:**
|
||||||
|
- Insert into DuckDB (ACID transaction)
|
||||||
|
- Generate embedding (local model, ONNX Runtime)
|
||||||
|
- Insert into LanceDB (async, eventual consistency)
|
||||||
|
|
||||||
|
2. **Read Path:**
|
||||||
|
- DuckDB: Structured queries, counts, timelines
|
||||||
|
- LanceDB: Vector similarity, clustering
|
||||||
|
- Hybrid: Vector + time filter (LanceDB filter API)
|
||||||
|
|
||||||
|
## Implementation Phases
|
||||||
|
|
||||||
|
### Phase 1: DuckDB Core (Sprint 4)
|
||||||
|
- [ ] DuckDB Zig bindings (C API wrapper)
|
||||||
|
- [ ] Event storage/retrieval
|
||||||
|
- [ ] Timeline queries
|
||||||
|
- [ ] Thread reconstruction
|
||||||
|
|
||||||
|
### Phase 2: LanceDB Integration (Sprint 5)
|
||||||
|
- [ ] LanceDB Rust bindings (via C FFI)
|
||||||
|
- [ ] Embedding storage
|
||||||
|
- [ ] ANN search
|
||||||
|
- [ ] Hybrid queries
|
||||||
|
|
||||||
|
### Phase 3: Optimization (Sprint 6)
|
||||||
|
- [ ] WAL for durability
|
||||||
|
- [ ] Compression (zstd for content)
|
||||||
|
- [ ] Incremental backups
|
||||||
|
- [ ] RAM usage optimization
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
| Library | Version | Purpose | Size |
|
||||||
|
|---------|---------|---------|------|
|
||||||
|
| DuckDB | 0.9.2 | Structured storage | ~15MB → 5MB stripped |
|
||||||
|
| LanceDB | 0.9.x | Vector storage | ~20MB → 8MB stripped |
|
||||||
|
| ONNX Runtime | 1.16 | Embeddings | Optional, ~50MB |
|
||||||
|
|
||||||
|
**Total binary impact:** ~13MB (DuckDB + LanceDB stripped, ohne ONNX)
|
||||||
|
|
||||||
|
## Open Questions
|
||||||
|
|
||||||
|
1. **Embedding Model:** All-MiniLM-L6-v2 (22MB) oder kleiner?
|
||||||
|
2. **Sync Strategy:** LanceDB als optionaler Index (graceful degradation)?
|
||||||
|
3. **Replication:** Event sourcing für Node-to-Node sync?
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*Sovereign; Kinetic; Anti-Fragile.* ⚡️
|
||||||
Loading…
Reference in New Issue