Five-Minute Tutorial: Build a Mini RAG System
This tutorial builds a retrieval-augmented generation (RAG) knowledge base using all three engines. By the end, you will have documents stored relationally, linked in a graph, searchable by embeddings, and protected with vault secrets.
Prerequisites
Start the shell with persistence:
neumann --wal-dir ./rag-data
Step 1: Create the Document Store
Use a relational table for structured document metadata:
CREATE TABLE documents (
id INT PRIMARY KEY,
title TEXT NOT NULL,
category TEXT,
author TEXT,
created TEXT
);
INSERT INTO documents VALUES (1, 'Intro to Neural Networks', 'ml', 'Alice', '2024-01-15');
INSERT INTO documents VALUES (2, 'Transformer Architecture', 'ml', 'Bob', '2024-02-20');
INSERT INTO documents VALUES (3, 'Database Indexing', 'systems', 'Carol', '2024-03-10');
INSERT INTO documents VALUES (4, 'Vector Search at Scale', 'systems', 'Alice', '2024-04-05');
INSERT INTO documents VALUES (5, 'Fine-Tuning LLMs', 'ml', 'Dave', '2024-05-12');
INSERT INTO documents VALUES (6, 'Consensus Protocols', 'distributed', 'Eve', '2024-06-01');
Verify:
SELECT * FROM documents ORDER BY id;
SELECT category, COUNT(*) FROM documents GROUP BY category;
Step 2: Add Graph Relationships
Create nodes for documents and topics, then link them:
NODE CREATE document { title: 'Intro to Neural Networks', doc_id: 1 }
NODE CREATE document { title: 'Transformer Architecture', doc_id: 2 }
NODE CREATE document { title: 'Database Indexing', doc_id: 3 }
NODE CREATE document { title: 'Vector Search at Scale', doc_id: 4 }
NODE CREATE document { title: 'Fine-Tuning LLMs', doc_id: 5 }
NODE CREATE document { title: 'Consensus Protocols', doc_id: 6 }
NODE CREATE topic { name: 'machine-learning' }
NODE CREATE topic { name: 'systems' }
NODE CREATE topic { name: 'distributed-systems' }
List nodes to get IDs:
NODE LIST document
NODE LIST topic
Create edges linking documents to topics (use actual IDs from NODE LIST):
-- Documents 1, 2, 5 cover machine-learning
-- Documents 3, 4 cover systems
-- Documents 4, 6 cover distributed-systems
-- Document 2 references document 1
-- Document 5 references documents 1 and 2
Create citation relationships between documents:
EDGE CREATE 'doc2-id' -> 'doc1-id' : cites
EDGE CREATE 'doc5-id' -> 'doc1-id' : cites
EDGE CREATE 'doc5-id' -> 'doc2-id' : cites
Check the graph:
NEIGHBORS 'doc1-id' INCOMING : cites
PATH SHORTEST 'doc5-id' TO 'doc1-id'
Step 3: Store Embeddings
Each document gets a vector representing its content. In a real system, you would generate these with an embedding model (e.g., OpenAI, Cohere, or a local model). Here we use hand-crafted 6-dimensional vectors:
-- [neural-nets, transformers, databases, vectors, llms, distributed]
EMBED STORE 'doc-1' [0.9, 0.3, 0.1, 0.2, 0.2, 0.1]
EMBED STORE 'doc-2' [0.7, 0.95, 0.1, 0.1, 0.4, 0.1]
EMBED STORE 'doc-3' [0.1, 0.1, 0.95, 0.3, 0.0, 0.2]
EMBED STORE 'doc-4' [0.2, 0.1, 0.5, 0.9, 0.1, 0.4]
EMBED STORE 'doc-5' [0.6, 0.5, 0.1, 0.1, 0.95, 0.1]
EMBED STORE 'doc-6' [0.1, 0.1, 0.3, 0.2, 0.1, 0.9]
Search by similarity:
-- Find documents similar to "Intro to Neural Networks"
SIMILAR 'doc-1' LIMIT 3
-- Search with a custom query vector (someone asking about transformers + LLMs)
SIMILAR [0.5, 0.8, 0.1, 0.1, 0.7, 0.1] LIMIT 3 METRIC COSINE
Step 4: Graph-Aware Semantic Search
Combine vector similarity with graph connectivity. Find documents similar to a query vector that are also connected to a specific topic node:
SIMILAR [0.8, 0.6, 0.1, 0.1, 0.5, 0.1] LIMIT 3 CONNECTED TO 'ml-topic-id'
This is the core RAG pattern: retrieve relevant documents using embedding similarity, then filter by graph relationships for context-aware results.
Step 5: Protect API Keys with Vault
Store the embedding API key securely:
VAULT SET 'openai_api_key' 'sk-proj-abc123...'
VAULT SET 'cohere_api_key' 'co-xyz789...'
VAULT GRANT 'alice' ON 'openai_api_key'
Retrieve when needed:
VAULT GET 'openai_api_key'
VAULT LIST
Step 6: Cache LLM Responses
Initialize the cache and store responses to avoid repeated API calls:
CACHE INIT
CACHE PUT 'what are transformers?' 'Transformers are a neural network architecture based on self-attention mechanisms...'
CACHE SEMANTIC PUT 'explain attention mechanism' 'The attention mechanism allows models to focus on relevant parts of the input...' EMBEDDING [0.6, 0.9, 0.1, 0.1, 0.3, 0.1]
On subsequent queries, check the cache first:
CACHE GET 'what are transformers?'
CACHE SEMANTIC GET 'how does attention work?' THRESHOLD 0.8
Step 7: Checkpoint
Save your work:
CHECKPOINT 'rag-setup-complete'
CHECKPOINTS
What You Built
You now have a working RAG knowledge base with:
- Structured metadata in a relational table (searchable with SQL)
- Semantic relationships in a graph (topics, citations, authorship)
- Vector embeddings for similarity search
- Graph-aware retrieval combining similarity and structure
- Encrypted secrets for API key management
- Response caching to reduce LLM API costs
- Checkpoint for safe rollback
Next Steps
- Use Cases – More application patterns
- Query Language Reference – All available commands
- Python SDK – Build this in Python
- Architecture Overview – How the engines work together