Theory¶
This section explains the ideas that make ProllyTree work — not just how to call the API, but why the data structure is shaped the way it is and what invariants you can rely on.
If you've used a B-tree, a Merkle tree, or Git, each of the pieces below will already feel familiar. Prolly trees combine them in a specific way so that you get:
- O(log n) operations (from the B-tree side),
- Content-addressed, verifiable structure (from the Merkle side),
- A shape that depends only on the data, not on insertion order (the "prolly" / probabilistic balancing rule), and
- Cheap three-way merges over real key-value sets (from the Git-backed versioning layer).
Read in this order¶
- Prolly Trees — the core data structure. What a prolly tree is, why it exists, and how it compares to classical B-trees and Merkle trees.
- Probabilistic Balancing — how node boundaries are chosen by a rolling-hash predicate, and why this gives you history-independent shape with O(log n) depth.
- Merkle Properties & Proofs — how inclusion proofs are constructed, what the root hash tells you, and why two trees with the same root are the same tree.
- Versioning & Merge — how the versioned KV store layers commits, branches, and three-way merges on top of the tree, and what conflict resolvers you can plug in.
Why this matters in practice¶
| Design property | Where it shows up |
|---|---|
| Content-defined shape | Replicas converge. Two peers that independently inserted the same key/value set reach the same root hash without any coordination. |
| O(log n) operations | Point lookups, range scans, and proofs are all cheap on arbitrarily large stores. |
| Merkle addressing | Any subtree can be proven correct given only its hash. You can sync peers by exchanging differing subtrees (similar to a Merkle sync), not by replaying all writes. |
| Git-like versioning | Commits, branches, and git log-style history come for free on top of the tree. git-prolly uses real Git for the DAG. |
| KV-level three-way merge | Merges work on logical key-value changes, not on diffs of serialised bytes, so you never see spurious conflicts from reordered leaves. |
Related reading¶
- The academic and engineering roots: Dolt's blog post on prolly trees is a readable intro from a different implementation. The construction here is the same family with a few practical differences noted in Probabilistic Balancing.
- Merkle trees in general: the Wikipedia entry is the most concise reference.
- Prolly trees used as an AI-agent memory substrate: see Memoir, which builds semantic memory on top of this library.