SQL Interface¶
The git-prolly sql command enables SQL query capabilities on top of ProllyTree's versioned key-value store. This integration combines the power of relational databases with Git-like version control, letting you query your data using standard SQL while maintaining full version history.
For the conceptual model see Architecture → SQL layer. For Python usage see the SQL example.
Getting Started¶
Prerequisites¶
-
Ensure you have initialized a git-prolly repository:
-
Verify the SQL feature is enabled (it's enabled by default):
Quick Start¶
# Create a table
git-prolly sql "CREATE TABLE users (id INTEGER, name TEXT, email TEXT)"
# Insert data
git-prolly sql "INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')"
# Query data
git-prolly sql "SELECT * FROM users"
# Commit your changes
git-prolly commit -m "Added users table with initial data"
Command Options¶
git-prolly sql [OPTIONS] [QUERY]
Arguments:
[QUERY] SQL query to execute
Options:
-f, --file <FILE> Execute query from file
-o, --format <FORMAT> Output format (table, json, csv)
-i, --interactive Start interactive SQL shell
--verbose Show detailed error messages
-b, --branch <BRANCH> Execute against specific branch or commit (SELECT queries only, requires clean status)
-h, --help Print help
Basic Usage¶
1. Direct Query Execution¶
Execute a single SQL query directly from the command line:
2. File-based Execution¶
Execute SQL commands from a file:
# Create a SQL file
cat > schema.sql << EOF
CREATE TABLE products (
id INTEGER,
name TEXT,
price INTEGER,
category TEXT
);
INSERT INTO products VALUES
(1, 'Laptop', 1200, 'Electronics'),
(2, 'Book', 25, 'Education');
EOF
# Execute the file
git-prolly sql -f schema.sql
3. Interactive Mode¶
Start an interactive SQL shell:
In interactive mode:
- Type SQL queries and press Enter to execute
- Type help for available commands
- Type exit or quit to leave the shell
4. Historical Data Querying¶
Query data from specific branches or commits using the -b parameter:
# Query data from main branch
git-prolly sql -b main "SELECT * FROM users"
# Query data from a specific commit
git-prolly sql -b a1b2c3d4 "SELECT COUNT(*) FROM products"
# Query data from a feature branch
git-prolly sql -b feature/new-schema "SELECT * FROM categories"
Important Requirements:
- Only SELECT statements are allowed when using -b parameter
- Your working directory must have clean status (no uncommitted staging changes)
- The branch/commit will be temporarily checked out and restored after execution
Example with staging changes:
# This will be blocked if you have uncommitted changes
git-prolly set user:123 "John Doe" # Creates staging changes
git-prolly sql -b main "SELECT * FROM users"
# Error: Cannot use -b/--branch parameter with uncommitted staging changes
# Commit your changes first
git-prolly commit -m "Add new user"
git-prolly sql -b main "SELECT * FROM users" # Now works
SQL Operations¶
Supported SQL Features¶
DDL (Data Definition Language)¶
CREATE TABLE- Create new tablesDROP TABLE- Remove tables (coming soon)
DML (Data Manipulation Language)¶
INSERT- Add new rowsSELECT- Query dataUPDATE- Modify existing rowsDELETE- Remove rows
Query Features¶
WHEREclausesORDER BYsortingJOINoperations (INNER, LEFT, RIGHT)GROUP BYaggregation- Aggregate functions:
COUNT(),AVG(),MAX(),MIN() LIMITfor result limiting
Examples¶
Creating Tables¶
-- Simple table
CREATE TABLE users (
id INTEGER,
name TEXT,
email TEXT,
created_at TEXT
);
-- Table with more data types
CREATE TABLE products (
id INTEGER,
name TEXT,
price INTEGER,
in_stock BOOLEAN,
description TEXT
);
Inserting Data¶
-- Single row insert
INSERT INTO users (id, name, email)
VALUES (1, 'Alice Johnson', 'alice@example.com');
-- Multiple row insert
INSERT INTO users (id, name, email) VALUES
(2, 'Bob Smith', 'bob@example.com'),
(3, 'Charlie Brown', 'charlie@example.com');
-- Insert without specifying columns (must match table structure)
INSERT INTO products VALUES
(1, 'Laptop', 1200, true, 'High-performance laptop');
Querying Data¶
-- Select all columns
SELECT * FROM users;
-- Select specific columns
SELECT name, email FROM users;
-- With WHERE clause
SELECT * FROM users WHERE id > 1;
SELECT * FROM products WHERE price < 100 AND in_stock = true;
-- With ORDER BY
SELECT * FROM users ORDER BY name ASC;
SELECT * FROM products ORDER BY price DESC;
-- With LIMIT
SELECT * FROM products ORDER BY price DESC LIMIT 5;
Joins¶
-- Create related tables
CREATE TABLE orders (
id INTEGER,
user_id INTEGER,
product_id INTEGER,
quantity INTEGER,
order_date TEXT
);
-- Inner join
SELECT u.name, p.name as product, o.quantity
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id;
-- Left join (shows all users, even without orders)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
Aggregation¶
-- Count rows
SELECT COUNT(*) FROM users;
-- Group by with aggregation
SELECT category, COUNT(*) as product_count, AVG(price) as avg_price
FROM products
GROUP BY category;
-- Having clause
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;
Updates and Deletes¶
-- Update single row
UPDATE users SET email = 'newemail@example.com' WHERE id = 1;
-- Update multiple rows
UPDATE products SET price = price * 1.1 WHERE category = 'Electronics';
-- Delete rows
DELETE FROM orders WHERE order_date < '2023-01-01';
Output Formats¶
1. Table Format (Default)¶
Output:
│ id │ name │ email │
├────────┼───────────────────┼──────────────────────────┤
│ I64(1) │ Str("Alice") │ Str("alice@example.com") │
│ I64(2) │ Str("Bob") │ Str("bob@example.com") │
2. JSON Format¶
Output:
[
{
"id": 1,
"name": "Alice",
"email": "alice@example.com"
},
{
"id": 2,
"name": "Bob",
"email": "bob@example.com"
}
]
3. CSV Format¶
Output:
Interactive Mode¶
The interactive SQL shell provides a convenient environment for exploring your data:
🌟 ProllyTree SQL Interactive Shell
====================================
Type 'exit' or 'quit' to exit
Type 'help' for available commands
prolly-sql> CREATE TABLE test (id INTEGER, value TEXT);
✓ Table created successfully
prolly-sql> INSERT INTO test VALUES (1, 'Hello'), (2, 'World');
✓ Inserted 2 rows
prolly-sql> SELECT * FROM test;
│ id │ value │
├────────┼───────────────┤
│ I64(1) │ Str("Hello") │
│ I64(2) │ Str("World") │
prolly-sql> exit
Goodbye!
Interactive Mode with Historical Data¶
Use interactive mode to explore historical data:
🌟 ProllyTree SQL Interactive Shell
====================================
Executing against branch/commit: feature/analytics
⚠️ Only SELECT statements are allowed in this mode
Type 'exit' or 'quit' to exit
Type 'help' for available commands
prolly-sql> SELECT COUNT(*) FROM new_analytics_table;
│ COUNT(*) │
├──────────┤
│ I64(150) │
prolly-sql> SELECT * FROM products WHERE price > 1000;
│ id │ name │ price │
├────────┼─────────────────────┼───────────┤
│ I64(1) │ Str("Gaming PC") │ I64(1500) │
│ I64(2) │ Str("MacBook Pro") │ I64(2000) │
prolly-sql> INSERT INTO products VALUES (3, 'iPad', 800);
Error: Only SELECT statements are allowed when using -b/--branch parameter
Historical commits/branches are read-only for data integrity
prolly-sql> exit
Goodbye!
Restored to original branch: main
Advanced Examples¶
1. Historical Data Analysis¶
Compare data across different points in time:
# Query current data
git-prolly sql "SELECT COUNT(*) as current_users FROM users"
# Query data from last week's commit
git-prolly sql -b 7d1a2b3c "SELECT COUNT(*) as users_last_week FROM users"
# Compare product prices between branches
git-prolly sql -b main "SELECT name, price FROM products WHERE category = 'Electronics'"
git-prolly sql -b feature/price-update "SELECT name, price FROM products WHERE category = 'Electronics'"
# Analyze data growth over time
git-prolly sql -b v1.0 "SELECT COUNT(*) as v1_orders FROM orders"
git-prolly sql -b v2.0 "SELECT COUNT(*) as v2_orders FROM orders"
2. Complex Data Analysis¶
-- Create sales data
CREATE TABLE sales (
id INTEGER,
product_id INTEGER,
customer_id INTEGER,
quantity INTEGER,
price INTEGER,
sale_date TEXT
);
-- Top selling products
SELECT p.name, SUM(s.quantity) as total_sold, SUM(s.quantity * s.price) as revenue
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY p.id, p.name
ORDER BY revenue DESC
LIMIT 10;
-- Customer purchase history
SELECT c.name, COUNT(DISTINCT s.id) as purchase_count,
SUM(s.quantity * s.price) as total_spent
FROM customers c
JOIN sales s ON c.id = s.customer_id
GROUP BY c.id, c.name
HAVING total_spent > 1000;
2. Data Migration Script¶
# Create migration script
cat > migrate_v2.sql << EOF
-- Add new columns to existing table
CREATE TABLE users_new (
id INTEGER,
name TEXT,
email TEXT,
created_at TEXT,
updated_at TEXT,
status TEXT
);
-- Copy existing data
INSERT INTO users_new (id, name, email)
SELECT id, name, email FROM users;
-- Update new fields
UPDATE users_new SET
created_at = '2024-01-01',
updated_at = '2024-01-01',
status = 'active';
-- Verify migration
SELECT COUNT(*) as migrated_count FROM users_new;
EOF
# Execute migration
git-prolly sql -f migrate_v2.sql
3. Reporting and Analytics¶
# Generate daily report
git-prolly sql -o json "
SELECT
DATE(order_date) as date,
COUNT(*) as orders,
SUM(quantity * price) as revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY DATE(order_date)
ORDER BY date
" > daily_revenue.json
# Export customer list
git-prolly sql -o csv "
SELECT id, name, email, created_at
FROM users
WHERE status = 'active'
ORDER BY name
" > active_customers.csv
Git Integration¶
Version Control for Your Data¶
All SQL operations are stored in ProllyTree and can be versioned with Git:
# Make changes
git-prolly sql "INSERT INTO users VALUES (4, 'David', 'david@example.com')"
git-prolly sql "UPDATE products SET price = 1100 WHERE id = 1"
# Check status
git-prolly status
# Commit changes
git-prolly commit -m "Added new user David and updated laptop price"
# View history
git-prolly show HEAD
# Diff between commits
git-prolly diff HEAD~1 HEAD
Working with Branches¶
# Create a new branch for experimental changes
git checkout -b feature/new-schema
# Make schema changes
git-prolly sql "CREATE TABLE categories (id INTEGER, name TEXT)"
git-prolly sql "ALTER TABLE products ADD COLUMN category_id INTEGER"
# Commit on feature branch
git-prolly commit -m "Added categories support"
# Switch back to main
git checkout main
# The new tables don't exist on main branch
git-prolly sql "SELECT * FROM categories" # Error: table not found
# Query the new schema without switching branches
git-prolly sql -b feature/new-schema "SELECT * FROM categories"
# Merge when ready
git merge feature/new-schema
Cross-Branch Data Comparison¶
Compare data between branches without switching contexts:
# Compare user counts between branches
echo "Main branch users:"
git-prolly sql -b main "SELECT COUNT(*) FROM users"
echo "Feature branch users:"
git-prolly sql -b feature/user-management "SELECT COUNT(*) FROM users"
# Generate reports from different branches
git-prolly sql -b production -o json "SELECT * FROM daily_metrics WHERE date = '2024-01-15'" > prod_metrics.json
git-prolly sql -b staging -o json "SELECT * FROM daily_metrics WHERE date = '2024-01-15'" > staging_metrics.json
# Compare table schemas between versions
git-prolly sql -b v1.0 "SELECT name FROM sqlite_master WHERE type='table'"
git-prolly sql -b v2.0 "SELECT name FROM sqlite_master WHERE type='table'"
Best Practices¶
1. Schema Design¶
- Use meaningful table and column names
- Define appropriate data types for columns
- Consider adding indexes for frequently queried columns (when supported)
2. Query Optimization¶
- Use specific column names instead of
SELECT *when possible - Add appropriate WHERE clauses to limit result sets
- Use JOINs efficiently - avoid cartesian products
3. Data Integrity¶
- Commit your changes regularly with meaningful messages
- Test queries in interactive mode before running in scripts
- Use transactions for multi-step operations (when supported)
4. Migration Strategy¶
# Always backup before major changes
git-prolly sql "SELECT * FROM important_table" -o json > backup.json
# Test migrations on a branch first
git checkout -b migration-test
# ... run migration ...
# If successful, merge to main
5. Historical Data Querying¶
- Commit changes before using
-b: Always commit your staging changes before querying historical data - Use for read-only analysis: The
-bparameter is perfect for generating reports without affecting current work - Branch-specific schemas: Use
-bto query data from branches with different table structures - Performance: Historical queries access committed data, so they may be slower than current branch queries
# Good practice: commit first
git-prolly commit -m "Save current work"
git-prolly sql -b production "SELECT * FROM metrics"
# Avoid: Don't leave uncommitted changes
git-prolly set user:new "data" # Uncommitted change
git-prolly sql -b main "SELECT * FROM users" # Will be blocked
Troubleshooting¶
Common Issues¶
- "Failed to open dataset"
- Ensure you're in a git-prolly initialized directory
-
Run
git-prolly initif needed -
"Table not found"
- Check table name spelling
- Ensure the table was created and changes were committed
-
Use
git-prolly listto see all keys (tables are stored with:__schema__suffix) -
Query errors
- Use
--verboseflag for detailed error messages - Check SQL syntax - the parser is strict about formatting
-
Ensure column names match exactly (case-sensitive)
-
"Cannot use -b/--branch parameter with uncommitted staging changes"
- Check staging status with
git-prolly status - Commit your changes first:
git-prolly commit -m "Save changes" -
Or discard changes if not needed
-
"Only SELECT statements are allowed when using -b/--branch parameter"
- Historical data is read-only for safety
- Use regular
git-prolly sql(without-b) for data modifications -
Switch to the target branch if you need to make changes there
-
"Failed to checkout branch/commit"
- Verify the branch/commit exists:
git branch -aorgit log --oneline - Check branch name spelling (case-sensitive)
- Ensure you have access to the specified commit
Performance Tips¶
-
Large Result Sets: Use LIMIT to restrict output
-
Complex Queries: Break them into steps
-
Bulk Operations: Use file-based execution
Debug Mode¶
Use verbose mode for debugging:
This will show: - Detailed error messages - Query execution time - Additional context for troubleshooting
Limitations¶
Current limitations (may be addressed in future versions):
- No ALTER TABLE - Create new table and migrate data instead
- Limited aggregate functions - SUM() may have issues with empty groups
- No DISTINCT in some contexts - Use GROUP BY as workaround
- No subqueries in some positions - Use JOINs or temporary tables
- Case-sensitive identifiers - Table and column names are case-sensitive
Conclusion¶
The git-prolly sql command brings the power of SQL to ProllyTree's versioned storage, enabling complex data analysis while maintaining Git's version control benefits. This unique combination allows you to:
- Query your data with familiar SQL syntax
- Version control your data changes
- Branch and merge data schemas
- Track data history over time
- Export data in multiple formats
For more examples and advanced usage, see the examples/sql_example.rs file in the repository.