Skip to content

Theory

This section explains the ideas that make ProllyTree work — not just how to call the API, but why the data structure is shaped the way it is and what invariants you can rely on.

If you've used a B-tree, a Merkle tree, or Git, each of the pieces below will already feel familiar. Prolly trees combine them in a specific way so that you get:

  • O(log n) operations (from the B-tree side),
  • Content-addressed, verifiable structure (from the Merkle side),
  • A shape that depends only on the data, not on insertion order (the "prolly" / probabilistic balancing rule), and
  • Cheap three-way merges over real key-value sets (from the Git-backed versioning layer).

Read in this order

  1. Prolly Trees — the core data structure. What a prolly tree is, why it exists, and how it compares to classical B-trees and Merkle trees.
  2. Probabilistic Balancing — how node boundaries are chosen by a rolling-hash predicate, and why this gives you history-independent shape with O(log n) depth.
  3. Merkle Properties & Proofs — how inclusion proofs are constructed, what the root hash tells you, and why two trees with the same root are the same tree.
  4. Versioning & Merge — how the versioned KV store layers commits, branches, and three-way merges on top of the tree, and what conflict resolvers you can plug in.

Why this matters in practice

Design property Where it shows up
Content-defined shape Replicas converge. Two peers that independently inserted the same key/value set reach the same root hash without any coordination.
O(log n) operations Point lookups, range scans, and proofs are all cheap on arbitrarily large stores.
Merkle addressing Any subtree can be proven correct given only its hash. You can sync peers by exchanging differing subtrees (similar to a Merkle sync), not by replaying all writes.
Git-like versioning Commits, branches, and git log-style history come for free on top of the tree. git-prolly uses real Git for the DAG.
KV-level three-way merge Merges work on logical key-value changes, not on diffs of serialised bytes, so you never see spurious conflicts from reordered leaves.
  • The academic and engineering roots: Dolt's blog post on prolly trees is a readable intro from a different implementation. The construction here is the same family with a few practical differences noted in Probabilistic Balancing.
  • Merkle trees in general: the Wikipedia entry is the most concise reference.
  • Prolly trees used as an AI-agent memory substrate: see Memoir, which builds semantic memory on top of this library.