Skip to content

Python API

The Python package prollytree exposes the full Rust surface via PyO3. This page is a hand-written reference — for cross-linked source-level docs, see the .pyi stubs shipped with the wheel (they also power IDE autocomplete).

Install with pip install prollytree. See Installation → Python bindings for build-from-source details.

ProllyTree

The low-level tree. Stores bytes → bytes with Merkle properties.

from prollytree import ProllyTree, TreeConfig

tree = ProllyTree()                              # in-memory, default config
tree = ProllyTree(config=TreeConfig(modulus=64)) # tuned config
tree = ProllyTree(storage_type="file",
                  path="/path/to/data")          # persistent

Core operations

Method Notes
insert(key: bytes, value: bytes) Create or overwrite.
update(key: bytes, value: bytes) Requires key to exist.
delete(key: bytes) -> bool True if a key was removed.
find(key: bytes) -> bytes \| None Point lookup.
insert_batch(items: list[tuple[bytes, bytes]]) Amortised rebalancing.
root_hash() -> bytes Stable fingerprint of the KV set.
generate_proof(key: bytes) Returns a proof object.
verify(proof, key: bytes, value: bytes \| None) -> bool Validate inclusion / absence.

See Theory → Merkle Properties & Proofs for what proofs contain.

TreeConfig

from prollytree import TreeConfig

cfg = TreeConfig(base=4, modulus=64)
  • base — internal-node fanout hint.
  • modulus — target average leaf size. Larger ⇒ shallower trees, bigger leaves.

See Probabilistic Balancing for tuning guidance.

VersionedKvStore

The Git-backed versioned key-value store. Exposes commits, branches, diffs, and merges.

from prollytree import VersionedKvStore, ConflictResolution

store = VersionedKvStore("/path/to/store")

store.insert(b"user:alice", b"Alice")
store.commit("seed")

store.create_branch("feature")
store.update(b"user:alice", b"Alice Smith")
store.commit("rename")

store.checkout("main")
store.merge("feature", ConflictResolution.TakeSource)

Core operations

Method Notes
insert(key, value) / update(key, value) / delete(key) Staged on the current branch.
get(key) -> bytes \| None Read the current branch.
commit(message: str) -> str Returns the commit hash.
log() -> list[dict] Commit history for the current branch.
status() Staged changes not yet committed.
create_branch(name) / checkout(name) Branch management.
merge(branch, resolver=ConflictResolution.IgnoreAll) -> str Three-way merge; returns merge-commit hash.
try_merge(branch) -> (bool, list[MergeConflict]) Probe without applying.
diff(from_ref, to_ref) List of (key, op, old, new) tuples.
history(key) Every commit that touched key.
keys_at(ref) Keys that existed at a given commit/branch.

ConflictResolution

Strategy enum for merge().

Value Meaning
ConflictResolution.IgnoreAll Keep destination value.
ConflictResolution.TakeSource Prefer incoming value.
ConflictResolution.TakeDestination Prefer current value.

MergeConflict

Value class returned by try_merge. Fields: key, base_value, source_value, destination_value.

See Theory → Versioning & Merge for the algorithm.

NamespacedKvStore

A multi-tree counterpart of VersionedKvStore. Each namespace owns its own prolly tree (and optionally one or more text sub-indexes); all namespaces share one git history, so commit, branch, checkout, merge move every namespace atomically.

from prollytree import NamespacedKvStore

store = NamespacedKvStore("/path/to/store")     # init or open

store.ns_insert("users",    b"u:alice", b"Alice")
store.ns_insert("settings", b"theme",   b"dark")
store.commit("seed users + settings")

store.branch("experiment")                      # create + switch
store.ns_insert("settings", b"theme", b"light")
store.commit("flip theme")
store.checkout("main")                          # both namespaces snap back

Core operations

Method Notes
ns_insert(ns, key, value) / ns_get(ns, key) / ns_delete(ns, key) Per-namespace primary KV.
ns_list_keys(ns) -> list[bytes] All keys in a namespace.
list_namespaces() -> list[str] Every namespace known to the store.
delete_namespace(prefix) -> bool Drop a namespace wholesale.
get_namespace_root_hash(prefix) Per-namespace fingerprint for change detection.
commit(message: str) -> str One commit covering every dirty namespace + sub-index atomically.
branch(name) / checkout(name) Create-and-switch / switch existing branch.
merge(source_branch, ...) -> str Per-namespace 3-way merge.
current_branch (property) Current branch name (not a method).

Text indexing

Each namespace can host text sub-indexes. The primary KV tree is the source of truth; the index stores (id, vector) pairs only. See Text Search for the full design.

Method Notes
text_index_open(ns, idx, embedder, chunker=None) Create or re-open. Persists the embedder identity on first open and validates it on every reopen. chunker is "identity" (default) or "line".
text_index_insert(ns, idx, id: bytes, text: str) Embed + chunk + insert. Same id upserts.
text_index_delete(ns, idx, id: bytes) -> bool Prefix-scans + removes every chunk for the doc.
text_index_search(ns, idx, query: str, k: int) -> list[tuple[bytes, float]] Top-k documents (deduped across chunks) ordered by ascending distance.
text_index_len(ns, idx) / text_index_chunk_count(ns, idx) Distinct documents vs raw chunks.
text_index_drop(ns, idx) -> bool Drop in-memory cache + Python-side embedder/chunker registration.

Cascade

ns_insert and ns_delete can auto-mirror into registered text indexes — no dual-write needed.

Method Notes
set_cascade(ns, [idx_name, ...]) Opt in. Runtime-only (not persisted).
clear_cascade(ns) Opt out.
cascade_for_namespace(ns) -> list[str] \| None Inspect current cascade list.

Drift management

Method Notes
audit_text_index(ns, idx) -> dict {"orphans_in_index", "missing_from_index", "is_in_sync"}.
purge_text_index_orphans(ns, idx) -> int Remove index entries that have no primary row.

Externalisation + blob GC

Method Notes
set_externalize_threshold(bytes: int \| None) Values larger than bytes are stored as content-addressed blobs (only a 44-byte envelope inline). None disables.
externalize_threshold() -> int \| None Current threshold.
gc_blobs() -> dict {"total", "referenced", "removed", "errors"}. File / RocksDB backends only.

Embedders

Three embedder classes are exposed. All three plug into text_index_open(...) identically.

HashEmbedder

Deterministic SHA-256-based, no extra deps. Not semantic — useful for tests and exact-match lookup.

from prollytree import HashEmbedder
emb = HashEmbedder(dim=384, seed=0)
emb.id          # 'prollytree:hash-embedder/v1'
emb.dim         # 384
emb.embed("text")

MiniLmEmbedder

Bundled Candle + sentence-transformers/all-MiniLM-L6-v2 (384-d). Real semantic search. First call downloads ~90 MB of weights into $PROLLYTREE_EMBEDDER_CACHE. Requires a wheel built with the proximity_text feature (default on PyPI).

from prollytree import MiniLmEmbedder
emb = MiniLmEmbedder()                                       # defaults
emb = MiniLmEmbedder(model_id="...", revision="main")        # override either field

CallableEmbedder

Wrap any Python embedding function — OpenAI, Cohere, sentence-transformers, your own pipeline.

from prollytree import CallableEmbedder
emb = CallableEmbedder(
    id="openai:text-embedding-3-small",       # persisted with the index
    version="2024-01",                        # change when distribution changes
    dim=1536,
    embed_fn=lambda text: ...,                # returns list[float] of length `dim`
)

The wrapped callable runs under the GIL. Dim mismatches surface as a clear ValueError.

Feature-availability flags

The package exposes booleans that mirror the wheel's compiled features. Useful for fallback in libraries that want to remain importable on slim wheels:

import prollytree as p
p.sql_available              # ProllySQLStore present
p.git_available              # WorktreeManager / WorktreeVersionedKvStore present
p.namespaced_available       # NamespacedKvStore present
p.proximity_available        # HashEmbedder / CallableEmbedder + text-index methods present
p.proximity_text_available   # MiniLmEmbedder present

ProllySQLStore

GlueSQL adapter — treat the store as relational tables.

from prollytree import ProllySQLStore

sql = ProllySQLStore("/path/to/store")
sql.execute("CREATE TABLE users (id INTEGER, name TEXT)")
sql.execute("INSERT INTO users VALUES (1, 'Alice')")
rows = sql.execute("SELECT * FROM users WHERE id = 1")

.execute(query: str, params: tuple | None = None) returns a list of rows for SELECT and an affected-row count for DML. See the SQL Interface for supported SQL features.

Exceptions

The bindings raise a ProllyTreeError hierarchy:

  • ProllyTreeError — base class.
  • StorageError — I/O / backend problems.
  • MergeError — merge failures (when you refuse to plug in a resolver).
  • SqlError — GlueSQL failures.

Catch selectively where it matters:

from prollytree import ProllyTreeError, StorageError

try:
    store = VersionedKvStore("/some/path")
    store.insert(b"k", b"v")
except StorageError as e:
    print("storage failed:", e)
except ProllyTreeError as e:
    print("generic tree failure:", e)

Pointers