How the knowledge is stored

Astrolabe: Data Model

How the whole project is stored: not as files and folders, but as a single content-addressed hypergraph — the Astrolabe data model. Every statement, every formal declaration, every relationship is one immutable, hash-identified entry.

The Astrolabe model

An Astrolabe store is a flat set of entries. Each entry is a triple (hash, ref, record):

  • ref — an ordered list of the hashes this entry points to.
  • record — an opaque payload, read by a domain plugin (here: a mathematical statement, or a Lean declaration).
  • hashSHA-256(ref₁ ‖ 0x00 ‖ … ‖ record), truncated. Identity is content: the same content always yields the same hash, everywhere it appears.

The length of ref is what an entry is. Two kinds are primitive:

  • an atom (|ref| = 1, a self-reference) — a leaf of content: a statement, a proof.
  • an edge (|ref| = 2) — a binary relation between two atoms.

Atoms and edges are already a traditional graph: each edge renders as a line between two nodes. That is the layer the project runs on today.

Because identity is the hash of content, a card can be re-derived, re-referenced, deduplicated, and merged without ambiguity — the property the whole loop relies on.

On disk this is just one file per node, named by its hash; documents reference them:

.astrolabe/
├─ atoms/
│ ├─ 38c99016b279.md ← one file per node
│ └─ 097d60abf481.md
├─ edges/
│ └─ 014c1e2a49fe.md ← a (lean, tex) bridge
└─ docs/
├─ 00-index.mdx
└─ 03-geodesics.mdx composes nodes by hash
Every node is a single file named by its hash; documents only reference them.

As an entry, each is hash, ref, and a record. An entry's sort is just the source of its refs — its own for an atom, the pair for an edge — while the edge's rel says how the two ends relate. A real atom (the geodesic sphere definition) and a real (lean, tex) edge whose rel is formalizes:

hash38c99016b279
ref[ 38c99016b279 ]self-reference → atom
record
sort(ref[0].source) = (tex)
sourcetex
content
titleGeodesic sphere
notes$S_\delta = \exp_p(\{v : \lVert v\rVert = \delta\})$
hash014c1e2a49fe
ref[ 264fbf8cb406, 6e6c552589c3 ]two atoms → edge
record
sort(ref[0].source, ref[1].source) = (lean, tex)
relformalizes

Two layers, one graph

The same store holds both sides of the mathematics:

  • tex atoms — the natural-language statements of do Carmo (a definition, a lemma).
  • lean atoms — the formal declarations that prove them.

They are joined by cross-source edges typed (lean, tex): a Lean node that formalizes a tex concept. An edge's sort is not its own — it is the pair of its endpoints' sources, (ref[0].source, ref[1].source), so a Lean↔tex bridge is typed (lean, tex) automatically. Such an edge is a first-class, hash-identified entry, exactly like any other.

Stable identity

So re-running the pipeline never churns the graph, identity is pinned to what a node is, not its current text:

  • a tex node hashes over {source, src, dcref} — which do Carmo statement;
  • a lean node hashes over {source, name} — which declaration.

A proof going from sorry to proven, or a line number moving, updates the record in place; the hash — and every reference to it — is unchanged.

Typed relations

Edges are not merely "depends on". Each carries a relation label: references (an explicit citation), uses (a concept the statement invokes), example-of, remark-on, corollary-of, formalizes (the cross-source bridge), and more. The graph is a typed network, not a bare dependency tree.

Derived numbering

A card's identity is its hash, not a number — but a reader still wants to see "Theorem 7.2.8". So numbers are derived, never stored. Within a project, the documents are scanned in order and each hash is assigned a number the first time it appears: chapter.section.item, where the chapter comes from the document's position and the rest from the order of entries inside it.

03-geodesics.mdx → chapter 3
a1f3c9e2Definition · first appears3.2.1
9c2e0f81Theorem · first appears3.2.2
4e7d22b0Corollary · first appears3.2.3
a1f3c9e2…appears again3.2.1(same hash, same number)
The number is assigned on first occurrence and follows the hash — never stored, always consistent within the project.

Because the number is a function of where a hash first occurs in that project, the same content-addressed card can carry different numbers in different projects, yet stays perfectly consistent within one. Reorder a section, move a card, or drop in a new one, and everything renumbers itself at once — there is nothing to keep in sync by hand, and no number to ever go stale.