
# Phase 8: The Summoning (ELF Loader)
**Status:** 95% Complete (Debugging command ring communication)

**Date:** 2025-12-31

**Architect:** Markus Maiwald | Voxis Forge (AI)

---

## Objective

Implement dynamic ELF64 binary loading from the VFS, so that the kernel can execute arbitrary programs from disk without being recompiled. This makes it possible to run independent tools and to swap the current Subject (userland process) at runtime.

---
## Implementation

### 1. ELF Parser (`core/loader/elf.nim`)

Defined the ELF64 header and program header structures:

```nim
type
  Elf64_Ehdr* {.packed.} = object
    e_ident*: array[16, uint8]
    e_type*: uint16
    e_machine*: uint16
    e_version*: uint32
    e_entry*: uint64
    e_phoff*: uint64
    # ... (remaining ELF header fields)

  Elf64_Phdr* {.packed.} = object
    p_type*: uint32
    p_flags*: uint32
    p_offset*: uint64
    p_vaddr*: uint64
    p_paddr*: uint64   # required here so that p_filesz lands at offset 32
    p_filesz*: uint64
    p_memsz*: uint64
    p_align*: uint64

const PT_LOAD* = 1
```
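
As a cross-check of the Nim definitions, the same two records can be written in C with compile-time size checks. Under natural alignment the ELF64 layout contains no padding, so the file header must be 64 bytes and each program header 56 bytes; a missing field (for example `p_paddr`, which sits between `p_vaddr` and `p_filesz`) trips the assertions. This is an illustrative sketch, not code from the Rumpk tree; field names follow the ELF specification, and the C structs need no `packed` attribute because no padding is inserted.

```c
#include <stddef.h>
#include <stdint.h>

/* ELF64 file header, in the field order defined by the ELF specification. */
typedef struct {
    uint8_t  e_ident[16];  /* 0x7F 'E' 'L' 'F', class, data encoding, ... */
    uint16_t e_type;
    uint16_t e_machine;    /* 243 = EM_RISCV */
    uint32_t e_version;
    uint64_t e_entry;      /* virtual address of the entry point */
    uint64_t e_phoff;      /* file offset of the program header table */
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;  /* size of one program header entry */
    uint16_t e_phnum;      /* number of program header entries */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Elf64_Ehdr;

/* ELF64 program header; p_paddr sits between p_vaddr and p_filesz. */
typedef struct {
    uint32_t p_type;       /* PT_LOAD = 1 */
    uint32_t p_flags;
    uint64_t p_offset;     /* where the segment's bytes start in the file */
    uint64_t p_vaddr;      /* where they must be placed in memory */
    uint64_t p_paddr;
    uint64_t p_filesz;     /* bytes present in the file */
    uint64_t p_memsz;      /* bytes occupied in memory (>= p_filesz) */
    uint64_t p_align;
} Elf64_Phdr;

/* Catch layout mistakes such as a missing field at compile time. */
_Static_assert(sizeof(Elf64_Ehdr) == 64, "Elf64_Ehdr must be 64 bytes");
_Static_assert(sizeof(Elf64_Phdr) == 56, "Elf64_Phdr must be 56 bytes");
_Static_assert(offsetof(Elf64_Phdr, p_filesz) == 32, "p_filesz must be at offset 32");
```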
### 2. ELF Loader (`core/loader.nim`)

Implemented `kexec(path: string)`:

1. **Read ELF from VFS** - Uses `vfs_read_file()` to load the binary into memory
2. **Verify Magic** - Checks the ELF magic bytes (`0x7F 'E' 'L' 'F'`)
3. **Validate Architecture** - Ensures the binary targets RISC-V (`e_machine == 243`, i.e. `EM_RISCV`)
4. **Map PT_LOAD Segments** - Iterates over the program headers and, for each loadable segment:
   - Clears the BSS portion (present when `p_memsz > p_filesz`)
   - Copies `p_filesz` bytes from file offset `p_offset` to the target virtual address `p_vaddr`
5. **Transfer Control** - Calls `rumpk_enter_userland(e_entry)` to jump to the entry point
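
The loader steps above can be sketched as a host-testable C function. It assumes Step 1 has already produced the file bytes in a buffer, and it maps segments into a byte array that stands in for RAM starting at `mem_base`. The names `load_elf`, `rd_le` and the `LOAD_*` error codes are hypothetical, and the bounds checks are additions for illustration, not a description of what `kexec` currently checks; the numeric offsets are the standard little-endian ELF64 field offsets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EM_RISCV 243
#define PT_LOAD  1

/* Hypothetical error codes for this sketch. */
enum { LOAD_OK = 0, LOAD_EBADMAGIC = -1, LOAD_EBADARCH = -2, LOAD_ERANGE = -3 };

/* Read an n-byte little-endian integer from a possibly unaligned pointer. */
static uint64_t rd_le(const uint8_t *p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* img/len: file bytes (Step 1 already done).
 * mem/mem_base/mem_size: byte array standing in for RAM at mem_base. */
static int load_elf(const uint8_t *img, size_t len,
                    uint8_t *mem, uint64_t mem_base, size_t mem_size,
                    uint64_t *entry_out) {
    /* Step 2: magic 0x7F 'E' 'L' 'F' (split literal avoids the "\x7fE" escape) */
    if (len < 64 || memcmp(img, "\x7f" "ELF", 4) != 0) return LOAD_EBADMAGIC;
    /* Step 3: e_machine (offset 18) must be EM_RISCV */
    if (rd_le(img + 18, 2) != EM_RISCV) return LOAD_EBADARCH;

    uint64_t phoff = rd_le(img + 32, 8);      /* e_phoff */
    uint64_t phentsize = rd_le(img + 54, 2);  /* e_phentsize */
    uint64_t phnum = rd_le(img + 56, 2);      /* e_phnum */
    if (phentsize < 56 || phoff > len) return LOAD_ERANGE;

    /* Step 4: copy each PT_LOAD segment and zero its BSS tail */
    for (uint64_t i = 0; i < phnum; i++) {
        uint64_t off = phoff + i * phentsize;
        if (off > len || len - off < 56) return LOAD_ERANGE;
        const uint8_t *ph = img + off;
        if (rd_le(ph, 4) != PT_LOAD) continue;
        uint64_t p_offset = rd_le(ph + 8, 8);
        uint64_t p_vaddr = rd_le(ph + 16, 8);
        uint64_t p_filesz = rd_le(ph + 32, 8);
        uint64_t p_memsz = rd_le(ph + 40, 8);

        /* Source range in the file and target range in "RAM" must be in bounds. */
        if (p_filesz > p_memsz || p_offset > len || p_filesz > len - p_offset)
            return LOAD_ERANGE;
        if (p_vaddr < mem_base || p_vaddr - mem_base > mem_size ||
            p_memsz > mem_size - (p_vaddr - mem_base))
            return LOAD_ERANGE;

        uint8_t *dst = mem + (p_vaddr - mem_base);
        memcpy(dst, img + p_offset, p_filesz);          /* file-backed bytes */
        memset(dst + p_filesz, 0, p_memsz - p_filesz);  /* BSS */
    }

    /* Step 5 would jump to e_entry; the sketch only reports it. */
    *entry_out = rd_le(img + 24, 8);
    return LOAD_OK;
}
```

Copying the file-backed bytes and then zero-filling up to `p_memsz` produces the same memory image as clearing the BSS first; only the final contents of the segment matter.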
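### 3. Assembly Trampoline (`hal/arch/riscv64/switch.S`)

Added `rumpk_enter_userland`:

```asm
    .global rumpk_enter_userland
rumpk_enter_userland:
    # a0 = entry point (e_entry), passed by kexec
    # Disable all supervisor interrupt sources
    csrw sie, zero
    # Jump to the Summoned Body. This is a plain jump: the hart stays
    # in S-mode (no sret), consistent with the single address space.
    jr a0
```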
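
On the C side, the trampoline's contract is simply "treat the 64-bit `e_entry` value as a code address and jump to it". The hosted analogue below shows that conversion; `enter_entry`, `entry_fn` and `demo_entry` are hypothetical names. Unlike `rumpk_enter_userland`, it does not touch `sie` and it returns to its caller. Converting an integer to a function pointer is implementation-defined in ISO C, but it is exactly what `jr a0` does with the address on RISC-V.

```c
#include <stdint.h>

/* Hypothetical signature for a loaded program's entry point. */
typedef void (*entry_fn)(void);

/* Hosted analogue of rumpk_enter_userland(entry): reinterpret the 64-bit
 * e_entry value as a code address and transfer control to it. The real
 * trampoline also clears sie and never returns; this one returns. */
static void enter_entry(uint64_t entry) {
    entry_fn fn = (entry_fn)(uintptr_t)entry;
    fn();
}

/* Demo target standing in for a loaded program. */
static int entered = 0;
static void demo_entry(void) { entered = 1; }
```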
### 4. Syscall Integration

**Command Type** (`core/ion.nim`):

```nim
CMD_SYS_EXEC = 0x400 # Swap Consciousness (ELF Loading)
```

**Kernel Handler** (`core/kernel.nim`):

```nim
of uint32(CmdType.CMD_SYS_EXEC):
  kprintln("[Kernel] CMD_SYS_EXEC received!")
  let path_ptr = cast[cstring](cmd.arg)
  kexec($path_ptr)
```
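
The kernel-side dispatch can be modelled in C as a `switch` on the packet kind. The packet layout (`kind`, `arg`, 16-byte `id`) is inferred from the Zig initializer in `libc_shim.zig`, and its field order is an assumption; `kexec_stub` and `dispatch` are hypothetical stand-ins. As in the Nim handler, the log line is printed before the path pointer is used, so with a single address space `arg` is simply a userland address the kernel reads directly.

```c
#include <stdint.h>
#include <stdio.h>

#define CMD_SYS_EXEC 0x400u  /* value from core/ion.nim */

/* Assumed packet layout, inferred from the Zig initializer (kind, arg, id). */
typedef struct {
    uint32_t kind;
    uint64_t arg;     /* for CMD_SYS_EXEC: address of a NUL-terminated path */
    uint8_t  id[16];  /* provenance ID */
} CmdPacket;

/* Hypothetical stand-in for kexec(): remembers the path it was given. */
static const char *last_exec_path = NULL;
static void kexec_stub(const char *path) { last_exec_path = path; }

/* Returns 0 if the command kind is handled, -1 otherwise. */
static int dispatch(const CmdPacket *cmd) {
    switch (cmd->kind) {
    case CMD_SYS_EXEC:
        /* Log first, then hand the path pointer on (same order as the Nim handler). */
        printf("[Kernel] CMD_SYS_EXEC received!\n");
        kexec_stub((const char *)(uintptr_t)cmd->arg);
        return 0;
    default:
        return -1;
    }
}
```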
**Userland Syscall** (`libs/membrane/libc_shim.zig`):

```zig
export fn nexus_syscall(cmd_id: u32, arg: u64) c_int {
    var pkt = ion.CmdPacket{ .kind = cmd_id, .arg = arg, .id = [_]u8{0} ** 16 };
    // ... compute provenance hash ...
    if (!ion.sys_cmd_push(pkt)) {
        return -1;
    }
    // Returns as soon as the packet is queued; it does not wait for the kernel.
    return 0;
}
```
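
The path a packet takes from `sys_cmd_push()` to the ION fiber can be modelled as a single-producer/single-consumer ring guarded by a magic check, which makes the first three suspected causes concrete: a bad magic makes the push fail, while a consumer that never pops (or polls a different ring) leaves the push succeeding with nothing received. Everything here is hypothetical: `SYSTABLE_MAGIC`, `CmdRing`, the slot count and the field layout are stand-ins, not Rumpk's actual SysTable or `chan_cmd`. Head and tail use C11 release/acquire atomics so the sketch stays correct if producer and consumer run concurrently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: not Rumpk's real values. */
#define SYSTABLE_MAGIC 0x52554D50u  /* arbitrary placeholder */
#define RING_SLOTS     8u           /* power of two, so head % RING_SLOTS wraps cleanly */

typedef struct {
    uint32_t kind;
    uint64_t arg;
    uint8_t  id[16];
} CmdPacket;

/* SPSC ring: userland is the only producer, the ION fiber the only consumer. */
typedef struct {
    _Atomic uint32_t head;          /* next slot to write; advanced by the producer */
    _Atomic uint32_t tail;          /* next slot to read; advanced by the consumer */
    CmdPacket slots[RING_SLOTS];
} CmdRing;

typedef struct {
    uint32_t magic;
    CmdRing  cmd;                   /* stands in for chan_cmd */
} SysTable;

/* Producer (cf. sys_cmd_push): false on bad magic or a full ring. */
static bool cmd_push(SysTable *st, CmdPacket pkt) {
    if (st->magic != SYSTABLE_MAGIC) return false;  /* suspected cause 1 */
    CmdRing *r = &st->cmd;
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS) return false;    /* full */
    r->slots[head % RING_SLOTS] = pkt;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer (cf. the ION fiber polling chan_cmd): false when empty. */
static bool cmd_pop(SysTable *st, CmdPacket *out) {
    CmdRing *r = &st->cmd;
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head) return false;                  /* empty */
    *out = r->slots[tail % RING_SLOTS];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

In this model, the observed symptom (NipBox reports success, the kernel logs nothing) corresponds to `cmd_push` succeeding on a ring that `cmd_pop` is never called on, or on a different ring from the one the kernel polls.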
### 5. Shell Command (`npl/nipbox/nipbox.nim`)

Implemented the `exec` command:

```nim
proc do_exec(filename: string) =
  if filename.len == 0:
    print("Usage: exec <path>")
    return
  print("[NipBox] Summoning " & filename & "...")
  # The kernel reads this pointer later, when the ION fiber dequeues the
  # packet; the buffer behind `filename` must still be alive at that point.
  let result = nexus_syscall(CMD_SYS_EXEC, cast[uint64](cstring(filename)))
  if result != 0:
    print("[NipBox] Syscall failed!")
  else:
    print("[NipBox] Syscall sent successfully")
```
### 6. Test Binary (`rootfs/src/hello.c`)

Created a minimal C program to test dynamic loading:

```c
#include "libnexus.h"

int main(void) {
    print("Hello from a dynamically loaded ELF!\n");
    print("Consciousness transferred successfully.\n");
    return 0;
}
```
### 7. Build Integration (`build.sh`)

Added Step 5.7 to compile and link `hello.c`:

```bash
zig cc -target riscv64-freestanding-none \
    -ffreestanding -fno-stack-protector \
    -c rootfs/src/hello.c -o build/hello.o
zig cc -target riscv64-freestanding-none -T apps/linker_user.ld \
    build/subject_entry.o build/hello.o build/libc_shim.o \
    -L build -lnexus \
    -o rootfs/bin/hello
```
---
## Current Status
### ✅ Completed
1. ELF parser with full header validation
2. Segment mapping logic (PT_LOAD, BSS handling)
3. Assembly trampoline for userland entry
4. Syscall infrastructure (CMD_SYS_EXEC)
5. Shell integration (`exec` command)
6. Test binary compilation and VFS inclusion
7. Build system automation
### 🔧 In Progress

**Command Ring Communication Issue:**

The `exec bin/hello` command is sent by NipBox, but the kernel's ION fiber never receives it. The debug output `[Kernel] CMD_SYS_EXEC received!` does not appear.

**Possible Causes:**

1. SysTable magic check failing in `sys_cmd_push()`
2. Command ring not properly wired to `chan_cmd`
3. ION fiber not polling `chan_cmd` correctly
4. Pointer invalidation (the `cstring` temporary is freed before the kernel reads it)

---
## Testing
### VFS Verification
```
[VFS] Mounting TarFS InitRD... Start=0x000000008020EBC0
Found: ./bin/hello Type: 0
Mounted: bin/hello ( bytes)
```
✅ Binary is correctly embedded in initrd and indexed by VFS.
### Shell Execution
```
root@nexus:# exec bin/hello
[NexShell] Forwarding to Subject...
[NipBox] Summoning bin/hello...
[NipBox] Syscall sent successfully
root@nexus:#
```
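✅ The syscall returns success (0), but the kernel never processes it. Since `nexus_syscall()` returns `-1` whenever `sys_cmd_push()` returns false, a 0 means the push itself reported success (unless `sys_cmd_push()` reports success even when its SysTable check fails). The kernel handler also prints its debug line before dereferencing `cmd.arg`, so a dangling path pointer (cause 4) cannot by itself explain the missing `[Kernel] CMD_SYS_EXEC received!` line; this points toward causes 2 and 3, where the packet is queued but never consumed by the ION fiber.

---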
## Next Steps
1. **Debug SysTable** - Verify magic value at `0x83000000`
2. **Trace Command Ring** - Add debug output to `sys_cmd_push()` in Zig
3. **Verify ION Polling** - Confirm `chan_cmd.recv()` is being called
4. **Fix Pointer Lifetime** - Ensure path string survives until kernel reads it
5. **Test Full Flow** - Once syscall reaches kernel, verify ELF loads and executes
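
For step 4, one option is to copy the path into storage that outlives the caller before its address goes into `CmdPacket.arg`. The sketch below, with the hypothetical names `exec_path_buf`, `EXEC_PATH_MAX` and `stable_exec_arg`, uses a buffer with static storage duration; it assumes at most one `exec` is in flight, because a second call would overwrite the path before the kernel has read the first. Whether such a copy belongs in NipBox or in the Zig shim is left open here.

```c
#include <stdint.h>
#include <string.h>

#define EXEC_PATH_MAX 128  /* hypothetical limit */

/* Static storage duration: stays valid after the caller's string is freed.
 * Assumes only one exec request is in flight at a time. */
static char exec_path_buf[EXEC_PATH_MAX];

/* Copies path into exec_path_buf and returns its address for CmdPacket.arg,
 * or 0 if the path (plus its NUL terminator) does not fit. */
static uint64_t stable_exec_arg(const char *path) {
    size_t n = strlen(path);
    if (n >= sizeof exec_path_buf) return 0;
    memcpy(exec_path_buf, path, n + 1);  /* include the terminating NUL */
    return (uint64_t)(uintptr_t)exec_path_buf;
}
```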
---
## Design Notes
### Single Address Space
Rumpk currently uses a single address space (no MMU/paging). The ELF loader relies on the following layout:

- The kernel resides at `0x80000000+`
- Userland binaries load at `0x84000000+`
- There is no memory protection between kernel and userland

This is acceptable for Phase 8 (proof of concept). Future phases will add proper memory isolation.
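
Because nothing enforces this layout in hardware, a loader running without an MMU can at least refuse `PT_LOAD` segments that fall outside the userland window. The base `0x84000000` comes from the layout above; the upper bound `USER_END` and the function name are hypothetical, since this document does not define where the userland region ends or whether `kexec` performs such a check.

```c
#include <stdbool.h>
#include <stdint.h>

#define USER_BASE 0x84000000ull  /* userland load base (from the layout above) */
#define USER_END  0x88000000ull  /* hypothetical upper bound, not specified here */

/* Without an MMU nothing stops a segment from overwriting the kernel,
 * so the loader itself must refuse ranges outside [USER_BASE, USER_END). */
static bool segment_in_user_window(uint64_t vaddr, uint64_t memsz) {
    if (vaddr < USER_BASE || vaddr > USER_END) return false;
    return memsz <= USER_END - vaddr;  /* end check written to avoid overflow */
}
```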
### Zero-Copy Loading
The VFS returns a direct pointer to the file data inside the initrd, so the file itself is never buffered: the only copy the ELF loader makes is from the initrd image to each segment's target address.
### Provenance Tracking
Each syscall packet includes a SipHash-based provenance ID for audit logging and replay protection (currently stubbed).
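
For reference, standard SipHash-2-4 (Aumasson and Bernstein; 128-bit key, 64-bit output) fits in a few dozen lines of C. The document does not say which SipHash variant the provenance ID uses, which packet fields are hashed, or how the output maps onto the 16-byte `id`, so this is only a possible building block; the names `siphash24`, `sipround` and `load_le64` are this sketch's own.

```c
#include <stddef.h>
#include <stdint.h>

#define ROTL(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))

static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* One SipRound over the four-word internal state. */
static void sipround(uint64_t v[4]) {
    v[0] += v[1]; v[1] = ROTL(v[1], 13); v[1] ^= v[0]; v[0] = ROTL(v[0], 32);
    v[2] += v[3]; v[3] = ROTL(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = ROTL(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = ROTL(v[1], 17); v[1] ^= v[2]; v[2] = ROTL(v[2], 32);
}

/* SipHash-2-4 of m[0..len) under the 16-byte key k. */
static uint64_t siphash24(const uint8_t *m, size_t len, const uint8_t k[16]) {
    uint64_t k0 = load_le64(k), k1 = load_le64(k + 8);
    uint64_t v[4] = {
        k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
        k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL,
    };
    size_t end = len - (len % 8);
    for (size_t i = 0; i < end; i += 8) {       /* full 8-byte blocks */
        uint64_t mi = load_le64(m + i);
        v[3] ^= mi;
        sipround(v); sipround(v);                /* c = 2 compression rounds */
        v[0] ^= mi;
    }
    /* Final block: leftover bytes, with len mod 256 in the top byte. */
    uint64_t b = (uint64_t)len << 56;
    for (size_t i = 0; i < len % 8; i++) b |= (uint64_t)m[end + i] << (8 * i);
    v[3] ^= b;
    sipround(v); sipround(v);
    v[0] ^= b;
    v[2] ^= 0xff;                                /* finalization: d = 4 rounds */
    sipround(v); sipround(v); sipround(v); sipround(v);
    return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```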

---
## References
- ELF Specification: https://refspecs.linuxfoundation.org/elf/elf.pdf
- RISC-V ABI: https://github.com/riscv-non-isa/riscv-elf-psabi-doc
- Rumpk Architecture: `docs/ARCHITECTURE.md`
- ION Subsystem: `docs/ION-PROTOCOL.md`