How the knowledge is stored
Astrolabe: Data Model
How the whole project is stored: not as files and folders, but as a single content-addressed hypergraph — the Astrolabe data model. Every statement, every formal declaration, every relationship is one immutable, hash-identified entry.
The Astrolabe model
An Astrolabe store is a flat set of entries. Each entry is a triple
(hash, ref, record):
ref— an ordered list of the hashes this entry points to.record— an opaque payload, read by a domain plugin (here: a mathematical statement, or a Lean declaration).hash—SHA-256(ref₁ ‖ 0x00 ‖ … ‖ record), truncated. Identity is content: the same content always yields the same hash, everywhere it appears.
The length of ref is what an entry is. Two kinds are primitive:
- an atom (
|ref| = 1, a self-reference) — a leaf of content: a statement, a proof. - an edge (
|ref| = 2) — a binary relation between two atoms.
Atoms and edges are already a traditional graph: each edge renders as a line between two nodes. That is the layer the project runs on today.
Because identity is the hash of content, a card can be re-derived, re-referenced, deduplicated, and merged without ambiguity — the property the whole loop relies on.
On disk this is just one file per node, named by its hash; documents reference them:
.astrolabe/├─ atoms/│ ├─ 38c99016b279.md ← one file per node│ └─ 097d60abf481.md├─ edges/│ └─ 014c1e2a49fe.md ← a (lean, tex) bridge└─ docs/├─ 00-index.mdx└─ 03-geodesics.mdx composes nodes by hash
As an entry, each is hash, ref, and a record. An entry's sort is just the source
of its refs — its own for an atom, the pair for an edge — while the edge's rel says how
the two ends relate. A real atom (the geodesic sphere definition) and a real (lean, tex)
edge whose rel is formalizes:
Two layers, one graph
The same store holds both sides of the mathematics:
- tex atoms — the natural-language statements of do Carmo (a definition, a lemma).
- lean atoms — the formal declarations that prove them.
They are joined by cross-source edges typed (lean, tex): a Lean node that
formalizes a tex concept. An edge's sort is not its own — it is the pair of its
endpoints' sources, (ref[0].source, ref[1].source), so a Lean↔tex bridge is typed
(lean, tex) automatically. Such an edge is a first-class, hash-identified entry,
exactly like any other.
Stable identity
So re-running the pipeline never churns the graph, identity is pinned to what a node is, not its current text:
- a tex node hashes over
{source, src, dcref}— which do Carmo statement; - a lean node hashes over
{source, name}— which declaration.
A proof going from sorry to proven, or a line number moving, updates the
record in place; the hash — and every reference to it — is unchanged.
Typed relations
Edges are not merely "depends on". Each carries a relation label: references (an
explicit citation), uses (a concept the statement invokes), example-of,
remark-on, corollary-of, formalizes (the cross-source bridge), and more. The
graph is a typed network, not a bare dependency tree.
Derived numbering
A card's identity is its hash, not a number — but a reader still wants to see
"Theorem 7.2.8". So numbers are derived, never stored. Within a project, the
documents are scanned in order and each hash is assigned a number the first time it
appears: chapter.section.item, where the chapter comes from the document's
position and the rest from the order of entries inside it.
Because the number is a function of where a hash first occurs in that project, the same content-addressed card can carry different numbers in different projects, yet stays perfectly consistent within one. Reorder a section, move a card, or drop in a new one, and everything renumbers itself at once — there is nothing to keep in sync by hand, and no number to ever go stale.