ProllyTree · text search demo

Vector search,
versioned like code.

ProllyTree lets you do top-k semantic search over text — and then rewind it, branch it, diff it, and merge it the way Git lets you do those things to code. Eight slides; takes about a minute.

pip install prollytree · Rust + Python · Merkle under the hood · Built with agents in mind
Concept 1 · 2 of 8

Words become coordinates.

An embedder turns each piece of text into a list of numbers: its vector. Texts that mean similar things land at nearby coordinates. Search is then just "what's nearest?" The real index uses 384 dimensions; we're showing 2D so it's legible.
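Here's a toy that makes "text becomes a list of numbers" concrete. It is not MiniLM or any learned model. `toy_embed` is a hypothetical feature-hashing stand-in that drops each word into one of 16 buckets, so two texts score as similar only when they share words (or collide in a bucket), not when they share meaning. It uses 16 dimensions rather than 384.

```python
import hashlib
import math
from typing import List

DIM = 16  # toy size; the real index uses 384 dimensions


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical feature-hashing embedder: each word adds 1.0 to one bucket."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # sha256 rather than hash(), because str hashing is salted per process
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    # L2-normalize so a plain dot product equals cosine similarity
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # inputs are unit length (or all zeros)


print(cosine(toy_embed("the cat sat"), toy_embed("sat the cat")))
print(cosine(toy_embed("the cat sat"), toy_embed("stock prices fell")))
```

Word order is thrown away here, which is one reason real embedders are learned models rather than word counters.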


Concept 2 · 3 of 8

Search is "what's nearest?"

Your query becomes a point in the same space. Top-k ranks every document by its distance to the query and returns the k closest.
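The ranking rule fits in a few lines. This is a brute-force sketch over hand-placed 2D points (the `docs` coordinates are made up for illustration) using Euclidean distance. It is not ProllyTree's search code, and a production index may use a different metric or avoid scoring every document.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Score every document by Euclidean distance to `query`; keep the k closest."""
    scored = ((doc_id, math.dist(query, vec)) for doc_id, vec in docs.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hand-placed 2D points standing in for embeddings (made up for illustration)
docs = {
    "cats purr": (0.9, 0.8),
    "kittens nap": (1.0, 0.7),
    "stock prices fell": (-0.8, -0.6),
    "bond yields rose": (-0.9, -0.4),
}
print(top_k((0.9, 0.75), docs, k=2))
```

`heapq.nsmallest` avoids fully sorting when k is small; the result matches sorting by distance and taking the first k.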

Concept 3 · 4 of 8

Now version it.

Every write, to the primary tree and to the index, lands in a real Git commit. Check out any earlier commit to time-travel back; the primary tree and the index change together because they're stored together.

Panes: primary (docs namespace) · index (by_body)
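A toy model of "every write lands in a commit, and both panes time-travel together." `ToyVersionedStore` is hypothetical and deliberately naive. It deep-copies both maps on every commit, whereas a prolly tree can share unchanged nodes between versions, and it has nothing to do with ProllyTree's actual Git integration.

```python
import copy
import hashlib
import json


class ToyVersionedStore:
    """Hypothetical sketch: each commit snapshots the primary docs AND the index."""

    def __init__(self):
        self.primary = {}   # doc_id -> text
        self.index = {}     # doc_id -> vector (list of floats)
        self.commits = {}   # commit_id -> (parent_id, primary snapshot, index snapshot)
        self.head = None

    def commit(self, message: str) -> str:
        # Content-address the commit: same parent + message + data gives the same id
        payload = json.dumps(
            {"parent": self.head, "message": message,
             "primary": self.primary, "index": self.index},
            sort_keys=True,
        )
        commit_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self.commits[commit_id] = (
            self.head, copy.deepcopy(self.primary), copy.deepcopy(self.index))
        self.head = commit_id
        return commit_id

    def checkout(self, commit_id: str) -> None:
        # Time travel: the primary docs and the index are restored as one unit
        _, primary, index = self.commits[commit_id]
        self.primary = copy.deepcopy(primary)
        self.index = copy.deepcopy(index)
        self.head = commit_id


store = ToyVersionedStore()
store.primary["obs:1"], store.index["obs:1"] = "the sky is blue", [0.1, 0.9]
c1 = store.commit("add obs:1")
store.primary["obs:2"], store.index["obs:2"] = "grass is green", [0.8, 0.2]
c2 = store.commit("add obs:2")
store.checkout(c1)  # rewind: obs:2 disappears from both maps
print(sorted(store.primary), sorted(store.index))
```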

Concept 4 · 5 of 8

Two trees, one commit.

What makes versioned vector search actually useful: the source text and the index are committed atomically — never out of sync, never drifting. With a separate vector DB you babysit a sync job. Here you don't.

Without ProllyTree: two systems, eventual consistency.

Your DB (documents) → sync job → vector DB (embeddings). Failure modes: drift, stale vectors, race conditions.

With ProllyTree: one transaction.

ns_insert("docs", id, text) with cascade enabled → primary tree + index → same Git commit ✓
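The guarantee on this slide is that a text and its vector become visible in the same step or not at all. `ToyAtomicStore` is a hypothetical sketch of that guarantee, not `ns_insert` or ProllyTree's cascade. It computes the embedding first, builds new copies of both maps, and then swaps them in with one assignment, so a failing embedder leaves neither tree changed. Keys carry a namespace, so one store can serve several agents.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (namespace, doc_id)
Snapshot = Tuple[Dict[Key, str], Dict[Key, List[float]]]  # (primary, index)


class ToyAtomicStore:
    """Hypothetical sketch: primary text and index vector are committed together."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.state: Snapshot = ({}, {})
        self.commits: List[Snapshot] = []

    def insert(self, namespace: str, doc_id: str, text: str) -> None:
        primary, index = self.state
        vector = self.embed(text)  # if this raises, nothing below runs
        key = (namespace, doc_id)
        # Build new versions of BOTH maps, then publish them with one assignment
        self.state = ({**primary, key: text}, {**index, key: vector})
        self.commits.append(self.state)


def flaky_embed(text: str) -> List[float]:
    if "boom" in text:
        raise RuntimeError("embedder unavailable")
    return [float(len(text))]  # stand-in vector, not a real embedding


store = ToyAtomicStore(flaky_embed)
store.insert("docs", "a", "hello")
store.insert("agent-2", "a", "hi")  # a second namespace in the same store
try:
    store.insert("docs", "b", "boom")
except RuntimeError:
    pass  # neither the primary map nor the index gained ("docs", "b")
print(sorted(store.state[0]) == sorted(store.state[1]))
```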
Concept 5 · 6 of 8

Branch your knowledge.

Fork the store, try something — re-embed with a new model, re-chunk for tighter recall, index a speculative document an agent might be hallucinating. Discard if it doesn't help, merge back if it does. Just store.branch("experiment").

Diagram: main carries seed → obs:1 → obs:2 → obs:3; experiment branches off to try a new chunker → re-embed → test recall, then merges back into main at M.
main: the live store. The agent reads and writes here, and search returns whatever was last committed on main.
experiment: an isolated copy with the same data but a different embedder, chunker, or scratch documents. Three-way merge when ready.
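A three-way merge compares each side against the common ancestor (the commit where experiment forked). Below is the textbook key-level rule as a hypothetical sketch: a key changed on only one side takes that side's value, matching changes are fine, and differing changes are reported as conflicts. ProllyTree's real merge operates on its tree structure and may resolve conflicts differently; the example values are made up.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks "key absent on this side" (deleted or never written)


def three_way_merge(base: Dict[str, Any], ours: Dict[str, Any],
                    theirs: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Key-level three-way merge against the common ancestor `base`.

    Returns (merged, conflicting_keys). Deleting a key counts as a change.
    """
    merged: Dict[str, Any] = {}
    conflicts: List[str] = []
    for key in base.keys() | ours.keys() | theirs.keys():
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (including both deleted)
            result = o
        elif o == b:      # only theirs changed
            result = t
        elif t == b:      # only ours changed
            result = o
        else:             # both changed, differently
            conflicts.append(key)
            continue
        if result is not _MISSING:
            merged[key] = result
    return merged, sorted(conflicts)


# Made-up values: which model produced each stored vector
base = {"obs:1": "minilm", "obs:2": "minilm"}              # where experiment forked
main = {**base, "obs:3": "minilm"}                         # agent kept writing on main
experiment = {"obs:1": "new-model", "obs:2": "new-model"}  # re-embedded on the branch
merged, conflicts = three_way_merge(base, main, experiment)
print(merged, conflicts)
```

Notice that obs:3, written on main after the fork, survives the merge with its old-model value. Whether to re-embed such keys afterwards is a workflow decision the merge rule itself doesn't make.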
Why this matters · 7 of 8

What versioned vector search unlocks.

Most vector databases give you "the latest state, whatever it happens to be." ProllyTree treats your embeddings the way you'd treat your code: a typed, reviewable, recoverable artifact.

Rewind a poisoned corpus

An agent ingested a bad batch. Roll back to the last good commit; the search index travels with the data. No reindex.

A/B test embedders

Branch, swap MiniLM for an OpenAI model via CallableEmbedder, re-embed, compare recall. Discard the loser.
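"Compare recall" can be as simple as recall@k: of the documents judged relevant for a query, what fraction appear in the top k results? A sketch, assuming you've already run the same queries on main and on the experiment branch. The query ids, doc ids, and judgments below are invented inputs, not measurements.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant doc ids that appear among the first k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(results: Dict[str, List[str]], judgments: Dict[str, Set[str]],
                k: int) -> float:
    """Average recall@k over every judged query."""
    if not judgments:
        return 0.0
    return sum(recall_at_k(results[q], judgments[q], k) for q in judgments) / len(judgments)


# Invented inputs: relevant docs per query, and each branch's ranked results
judgments = {"q1": {"a", "b"}, "q2": {"c"}}
main_results = {"q1": ["a", "x", "y"], "q2": ["z", "c", "y"]}
experiment_results = {"q1": ["b", "a", "x"], "q2": ["c", "x", "y"]}

for name, results in [("main", main_results), ("experiment", experiment_results)]:
    print(name, mean_recall(results, judgments, k=2))
```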


Audit what an agent learned

Diff two commits and see exactly which memories were added, removed, or rewritten between yesterday and today.
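At the key level, "added, removed, or rewritten" is set arithmetic over two snapshots. This hypothetical `diff_snapshots` compares flat dicts. A Merkle-structured store can reach the same answer faster by skipping subtrees whose hashes match, but that optimization isn't shown here, and the memory ids and values are made up.

```python
from typing import Dict, List


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify keys as added, removed, or rewritten between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "rewritten": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Made-up agent memories at two commits
yesterday = {"mem:1": "user prefers tea", "mem:2": "user lives in Oslo", "mem:4": "meeting at 10"}
today = {"mem:1": "user prefers coffee", "mem:3": "user has a dog", "mem:4": "meeting at 10"}
print(diff_snapshots(yesterday, today))
```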


Isolate multiple agents

One store, many namespaces. Each agent gets its own primary tree + index. One commit covers all of them.


Cryptographic proofs

Inherited from the Merkle tree underneath. Every value carries an inclusion proof you can hand to a verifier.
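The idea behind an inclusion proof, shown on a plain binary Merkle tree: the verifier holds only the root hash and checks a value plus the sibling hashes along its path. ProllyTree's tree has variable fan-out, so its proofs are shaped differently. This sketch, including the `leaf:`/`node:` prefixes and the rule of pairing an odd node with itself, is an illustration and not ProllyTree's proof format.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling is on the right)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(values: List[bytes]) -> List[List[bytes]]:
    """Hash values into leaves, then pair upward until one root remains."""
    level = [h(b"leaf:" + v) for v in values]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # odd node is paired with itself
        level = [h(b"node:" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: List[List[bytes]], index: int) -> Proof:
    """Collect sibling hashes from the leaf at `index` up to (not including) the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # no right neighbour: it was paired with itself
        proof.append((level[sibling], sibling >= index))
        index //= 2
    return proof


def verify(root: bytes, value: bytes, proof: Proof) -> bool:
    """Recompute the root from the value and its proof; only `root` must be trusted."""
    node = h(b"leaf:" + value)
    for sibling, on_right in proof:
        node = h(b"node:" + node + sibling) if on_right else h(b"node:" + sibling + node)
    return node == root


values = [b"obs:1=sky is blue", b"obs:2=grass is green", b"obs:3=snow is white"]
levels = build_levels(values)
root = levels[-1][0]
proof = prove(levels, 2)
print(verify(root, values[2], proof), verify(root, b"obs:3=snow is red", proof))
```

The proof grows with the tree's height, not its size, which is what makes it cheap to hand to a verifier.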


No separate vector DB

Embeddings and source text share one transaction. No sync job, no consistency window, no drift.

8 of 8

Three lines from a working text index.

pip install prollytree
from prollytree import NamespacedKvStore, MiniLmEmbedder
store.text_index_open("docs", "by_body", MiniLmEmbedder())