Neumann Parser

The neumann_parser crate provides a hand-written recursive descent parser for the Neumann unified query language. It converts source text into an Abstract Syntax Tree (AST) that can be executed by the query router.

The parser is designed with zero external dependencies, full span tracking for error reporting, and support for SQL, graph, vector, and domain-specific operations in a single unified syntax.
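
For a first taste, a single call turns source text into a spanned statement (a minimal sketch using the parse() entry point listed under Source Files below):

use neumann_parser::parse;

// Parse one statement into an AST node that carries its source span.
let stmt = parse("SELECT name FROM users WHERE id = 1").expect("valid query");
println!("kind = {:?}, span = {:?}", stmt.kind, stmt.span);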

Key Concepts

| Concept | Description |
|---|---|
| Recursive Descent | Top-down parsing where each grammar rule becomes a function |
| Pratt Parsing | Operator precedence parsing for expressions with correct associativity |
| Span Tracking | Every AST node carries its source location for error messages |
| Case Insensitivity | Keywords are matched case-insensitively via uppercase conversion |
| Single Lookahead | The parser uses one-token lookahead with an optional peek |
| Depth Limiting | Expression nesting is limited to 64 levels to prevent stack overflow |

Architecture

flowchart LR
    subgraph Input
        Source["Source String"]
    end

    subgraph Lexer
        Chars["char iterator"] --> Tokenizer
        Tokenizer --> Tokens["Token Stream"]
    end

    subgraph Parser
        Tokens --> StatementParser["Statement Parser"]
        StatementParser --> ExprParser["Expression Parser (Pratt)"]
        ExprParser --> AST["Abstract Syntax Tree"]
    end

    Source --> Chars
    AST --> Output["Statement + Span"]

Detailed Parsing Flow

sequenceDiagram
    participant User
    participant P as parse()
    participant Parser
    participant Lexer
    participant ExprParser

    User->>P: "SELECT * FROM users"
    P->>Parser: new(source)
    Parser->>Lexer: new(source)
    Lexer-->>Parser: first token

    Parser->>Parser: parse_statement()
    Parser->>Parser: match on token kind
    Parser->>Parser: parse_select()
    Parser->>Parser: parse_select_body()

    loop For each select item
        Parser->>ExprParser: parse_expr()
        ExprParser->>ExprParser: parse_expr_bp(0)
        ExprParser->>ExprParser: parse_prefix_expr()
        ExprParser-->>Parser: Expr with span
    end

    Parser-->>P: Statement
    P-->>User: Result<Statement>

Source Files

| File | Purpose | Key Functions |
|---|---|---|
| lib.rs | Public API exports | parse(), parse_all(), parse_expr(), tokenize() |
| lexer.rs | Tokenization (source to tokens) | Lexer::next_token(), scan_ident(), scan_number(), scan_string() |
| token.rs | Token definitions and keyword lookup | TokenKind, keyword_from_str() |
| parser.rs | Statement parsing (recursive descent) | Parser::parse_statement(), parse_select(), parse_insert() |
| expr.rs | Expression parsing (Pratt algorithm) | ExprParser::parse_expr(), parse_expr_bp(), infix_binding_power() |
| ast.rs | AST node definitions | Statement, StatementKind, Expr, ExprKind |
| span.rs | Source location tracking | BytePos, Span, line_col(), get_line() |
| error.rs | Error types with source context | ParseError, ParseErrorKind, format_with_source() |

Core Types

Token System

classDiagram
    class Token {
        +TokenKind kind
        +Span span
        +is_eof() bool
        +is_keyword() bool
    }

    class TokenKind {
        <<enumeration>>
        Ident(String)
        Integer(i64)
        Float(f64)
        String(String)
        Select
        From
        Where
        ...
        Error(String)
        Eof
    }

    class Span {
        +BytePos start
        +BytePos end
        +len() u32
        +merge(Span) Span
        +extract(str) str
    }

    Token --> TokenKind
    Token --> Span

| Type | Description |
|---|---|
| Token | A token with its kind and span |
| TokenKind | Enum of all token variants (130+ variants including keywords, literals, operators) |
| Lexer | Stateful tokenizer that produces tokens from source |
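
Consuming the stream usually means matching on TokenKind; a short sketch, assuming tokenize and TokenKind are re-exported at the crate root:

use neumann_parser::{tokenize, TokenKind};

for token in tokenize("WHERE age >= 21") {
    match &token.kind {
        TokenKind::Ident(name) => println!("identifier {name}"),
        TokenKind::Integer(n) => println!("integer {n}"),
        kind if token.is_keyword() => println!("keyword {kind:?}"),
        kind => println!("other {kind:?}"),
    }
}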

AST Structure

classDiagram
    class Statement {
        +StatementKind kind
        +Span span
    }

    class StatementKind {
        <<enumeration>>
        Select(SelectStmt)
        Insert(InsertStmt)
        Node(NodeStmt)
        Edge(EdgeStmt)
        Similar(SimilarStmt)
        Vault(VaultStmt)
        ...
    }

    class Expr {
        +ExprKind kind
        +Span span
        +boxed() Box~Expr~
    }

    class ExprKind {
        <<enumeration>>
        Literal(Literal)
        Ident(Ident)
        Binary(Box~Expr~, BinaryOp, Box~Expr~)
        Unary(UnaryOp, Box~Expr~)
        Call(FunctionCall)
        ...
    }

    Statement --> StatementKind
    StatementKind --> Expr
    Expr --> ExprKind

| Type | Description |
|---|---|
| Statement | Top-level parsed statement with span |
| StatementKind | Enum of all statement variants (30+ variants) |
| Expr | Expression node with span |
| ExprKind | Enum of expression variants (20+ variants) |
| Literal | Literal values (Null, Boolean, Integer, Float, String) |
| Ident | Identifier with name and span |
| BinaryOp | Binary operators with precedence (18 operators) |
| UnaryOp | Unary operators (Not, Neg, BitNot) |

Span Types

| Type | Description | Example |
|---|---|---|
| BytePos | A byte offset into source text (u32) | BytePos(7) |
| Span | A range of bytes (start inclusive, end exclusive) | Span { start: 0, end: 6 } |
| Spanned<T> | A value paired with its source location | Spanned::new(42, span) |

use neumann_parser::{line_col, BytePos, Span};

// Span operations
let span1 = Span::from_offsets(0, 6);   // "SELECT"
let span2 = Span::from_offsets(7, 8);   // "*"
let merged = span1.merge(span2);         // "SELECT *"

// Extract source text
let source = "SELECT * FROM users";
let text = span1.extract(source);        // "SELECT"

// Line/column computation
let (line, col) = line_col(source, BytePos(7));  // (1, 8)

Error Types

| Type | Description |
|---|---|
| ParseError | Error with kind, span, and optional help message |
| ParseErrorKind | Enum of error variants (10 kinds) |
| ParseResult<T> | Result<T, ParseError> |
| Errors | Collection of parse errors with iteration support |

// Error kinds
pub enum ParseErrorKind {
    UnexpectedToken { found: TokenKind, expected: String },
    UnexpectedEof { expected: String },
    InvalidSyntax(String),
    InvalidNumber(String),
    UnterminatedString,
    UnknownCommand(String),
    DuplicateColumn(String),
    InvalidEscape(char),
    TooDeep,           // Expression nesting > 64 levels
    Custom(String),
}

Lexer Implementation

State Machine

The lexer is implemented as an iterator-based state machine with single-character lookahead:

stateDiagram-v2
    [*] --> Initial
    Initial --> Whitespace: is_whitespace
    Initial --> LineComment: --
    Initial --> BlockComment: /*
    Initial --> Identifier: a-zA-Z_
    Initial --> Number: 0-9
    Initial --> String: ' or "
    Initial --> Operator: +-*/etc
    Initial --> EOF: end

    Whitespace --> Initial: skip
    LineComment --> Initial: newline
    BlockComment --> Initial: */

    Identifier --> Token: non-alnum
    Number --> Token: non-digit
    String --> Token: closing quote
    Operator --> Token: complete

    Token --> Initial: emit token
    EOF --> [*]

Internal Structure

pub struct Lexer<'a> {
    source: &'a str,      // Original source text
    chars: Chars<'a>,     // Character iterator
    pos: u32,             // Current byte position
    peeked: Option<char>, // One-character lookahead
}
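
The lookahead typically reduces to two small helpers; the names peek_char and bump below are an illustrative sketch, not the crate's actual private API:

impl<'a> Lexer<'a> {
    /// Look at the next character without consuming it.
    fn peek_char(&mut self) -> Option<char> {
        if self.peeked.is_none() {
            self.peeked = self.chars.next();
        }
        self.peeked
    }

    /// Consume one character and advance the byte position.
    fn bump(&mut self) -> Option<char> {
        let c = self.peeked.take().or_else(|| self.chars.next())?;
        self.pos += c.len_utf8() as u32;
        Some(c)
    }
}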

Character Classification

| Category | Characters | Handling |
|---|---|---|
| Whitespace | space, tab, newline | Skipped |
| Line comment | -- to newline | Skipped |
| Block comment | /* */ (nestable) | Skipped; supports nesting |
| Identifier | [a-zA-Z_][a-zA-Z0-9_]* | Keyword lookup, then Ident |
| Integer | [0-9]+ | Parsed as i64 |
| Float | [0-9]+\.[0-9]+ or scientific | Parsed as f64 |
| String | '...' or "..." | Escapes handled |

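Comments never reach the token stream, including nested block comments; a quick check (the expected kinds are an assumption based on the table above):

use neumann_parser::tokenize;

// One line comment and one nested block comment, both skipped.
let source = "SELECT 1 -- trailing\n/* outer /* inner */ still outer */ + 2";
for token in tokenize(source) {
    println!("{:?}", token.kind); // Select, Integer(1), Plus, Integer(2), Eof
}
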
String Escape Sequences

| Escape | Result |
|---|---|
| \n | Newline |
| \r | Carriage return |
| \t | Tab |
| \\ | Backslash |
| \' | Single quote |
| \" | Double quote |
| \0 | Null character |
| '' | Single quote (SQL-style doubled) |
| \x | Unknown escape: preserved as \x |
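
Escapes are decoded during lexing, so the token carries the final value (a small sketch):

use neumann_parser::tokenize;

// The two-character sequence \n in the source becomes a real newline
// in the String token's value.
for token in tokenize(r"'a\nb'") {
    println!("{:?}", token.kind); // String("a\nb"), then Eof
}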

Operator Recognition

Multi-character operators are recognized with lookahead:

// Lexer::next_token() operator matching
match c {
    '-' => if self.eat('>') { Arrow }      // ->
           else { Minus },                  // -
    '=' => if self.eat('>') { FatArrow }   // =>
           else { Eq },                     // =
    '<' => if self.eat('=') { Le }         // <=
           else if self.eat('>') { Ne }    // <>
           else if self.eat('<') { Shl }   // <<
           else { Lt },                     // <
    '>' => if self.eat('=') { Ge }         // >=
           else if self.eat('>') { Shr }   // >>
           else { Gt },                     // >
    '|' => if self.eat('|') { Concat }     // ||
           else { Pipe },                   // |
    '&' => if self.eat('&') { AmpAmp }     // &&
           else { Amp },                    // &
    ':' => if self.eat(':') { ColonColon } // ::
           else { Colon },                  // :
    // ...
}

Pratt Parser (Expression Parsing)

Algorithm Overview

The Pratt parser handles operator precedence through “binding power”: each operator has a left and a right binding power that together determine precedence and associativity.

flowchart TD
    A[parse_expr_bp\nmin_bp] --> B[parse_prefix]
    B --> C{More tokens?}
    C -->|No| D[Return lhs]
    C -->|Yes| E[parse_postfix]
    E --> F{Infix op?}
    F -->|No| D
    F -->|Yes| G{l_bp >= min_bp?}
    G -->|No| D
    G -->|Yes| H[Advance]
    H --> I[parse_expr_bp\nr_bp]
    I --> J[Build Binary node]
    J --> C

Binding Power Table

Each operator has left and right binding powers (l_bp, r_bp):

| Precedence | Operators | Binding Power (l, r) | Associativity |
|---|---|---|---|
| 1 (lowest) | OR | (1, 2) | Left |
| 2 | AND | (3, 4) | Left |
| 3 | =, !=, <, <=, >, >= | (5, 6) | Left |
| 4 | \| (bitwise OR) | (7, 8) | Left |
| 5 | ^ (bitwise XOR) | (9, 10) | Left |
| 6 | & (bitwise AND) | (11, 12) | Left |
| 7 | <<, >> | (13, 14) | Left |
| 8 | +, -, \|\| (concat) | (15, 16) | Left |
| 9 | *, /, % | (17, 18) | Left |
| 10 (highest) | NOT, -, ~ (unary) | 19 (prefix) | Right |

Left associativity is achieved by having r_bp > l_bp.
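
A worked trace of the table using the parse_expr export:

use neumann_parser::parse_expr;

// "1 - 2 - 3": after the first "-", the right operand is parsed with
// min_bp = 16; the next "-" has l_bp = 15 < 16, so it cannot extend the
// right operand, and the tree nests left: ((1 - 2) - 3).
let left_assoc = parse_expr("1 - 2 - 3").expect("valid expression");

// "1 + 2 * 3": "*" has l_bp = 17 >= the recursive min_bp of 16, so the
// multiplication is consumed inside "+"'s right operand: 1 + (2 * 3).
let precedence = parse_expr("1 + 2 * 3").expect("valid expression");

println!("{:?}\n{:?}", left_assoc.kind, precedence.kind);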

Implementation Details

const MAX_DEPTH: usize = 64;  // Prevent stack overflow

fn parse_expr_bp(&mut self, min_bp: u8) -> ParseResult<Expr> {
    self.depth += 1;
    if self.depth > MAX_DEPTH {
        return Err(ParseError::new(ParseErrorKind::TooDeep, self.current.span));
    }

    let mut lhs = self.parse_prefix()?;

    loop {
        // Handle postfix operators (IS NULL, IN, BETWEEN, LIKE, DOT)
        lhs = self.parse_postfix(lhs)?;

        // Check for infix operator
        let op = match self.current_binary_op() {
            Some(op) => op,
            None => break,
        };

        let (l_bp, r_bp) = infix_binding_power(op);
        if l_bp < min_bp {
            break;  // Operator binds less tightly than current context
        }

        self.advance();
        let rhs = self.parse_expr_bp(r_bp)?;

        let span = lhs.span.merge(rhs.span);
        lhs = Expr::new(ExprKind::Binary(Box::new(lhs), op, Box::new(rhs)), span);
    }

    self.depth -= 1;
    Ok(lhs)
}

Prefix Expression Handling

fn parse_prefix(&mut self) -> ParseResult<Expr> {
    match &self.current.kind {
        // Literals
        TokenKind::Integer(n) => { /* emit Literal(Integer) */ },
        TokenKind::Float(n) => { /* emit Literal(Float) */ },
        TokenKind::String(s) => { /* emit Literal(String) */ },
        TokenKind::True | TokenKind::False => { /* emit Literal(Boolean) */ },
        TokenKind::Null => { /* emit Literal(Null) */ },

        // Identifiers and function calls
        TokenKind::Ident(_) => self.parse_ident_or_call(),

        // Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
        TokenKind::Count | TokenKind::Sum | ... => self.parse_aggregate_call(),

        // Wildcard
        TokenKind::Star => { /* emit Wildcard */ },

        // Parenthesized expression or tuple
        TokenKind::LParen => self.parse_paren_expr(),

        // Array literal
        TokenKind::LBracket => self.parse_array(),

        // Unary operators
        TokenKind::Minus => { /* parse operand with PREFIX_BP, emit Unary(Neg) */ },
        TokenKind::Not | TokenKind::Bang => { /* emit Unary(Not) */ },
        TokenKind::Tilde => { /* emit Unary(BitNot) */ },

        // Special expressions
        TokenKind::Case => self.parse_case(),
        TokenKind::Exists => self.parse_exists(),
        TokenKind::Cast => self.parse_cast(),

        // Contextual keywords as identifiers
        _ if self.current.kind.is_contextual_keyword() => self.parse_keyword_as_ident(),

        // Error
        _ => Err(ParseError::unexpected(...)),
    }
}

Postfix Expression Handling

Postfix operators bind tighter than any infix operator:

fn parse_postfix(&mut self, mut expr: Expr) -> ParseResult<Expr> {
    loop {
        // Handle NOT IN, NOT BETWEEN, NOT LIKE
        if self.check(&TokenKind::Not) {
            let next = self.peek().kind.clone();
            if next == TokenKind::In { /* parse NOT IN */ }
            else if next == TokenKind::Between { /* parse NOT BETWEEN */ }
            else if next == TokenKind::Like { /* parse NOT LIKE */ }
        }

        match self.current.kind {
            TokenKind::Is => {
                // IS [NOT] NULL
                self.advance();
                let negated = self.eat(&TokenKind::Not);
                self.expect(&TokenKind::Null)?;
                expr = Expr::new(ExprKind::IsNull { expr, negated }, span);
            },
            TokenKind::In => { /* parse IN (values) or IN (subquery) */ },
            TokenKind::Between => { /* parse BETWEEN low AND high */ },
            TokenKind::Like => { /* parse LIKE pattern */ },
            TokenKind::Dot => {
                // Qualified name: table.column or table.*
                self.advance();
                if self.eat(&TokenKind::Star) {
                    expr = Expr::new(ExprKind::QualifiedWildcard(ident), span);
                } else {
                    let field = self.expect_ident()?;
                    expr = Expr::new(ExprKind::Qualified(Box::new(expr), field), span);
                }
            },
            _ => return Ok(expr),
        }
    }
}

Statement Kinds

SQL Statements

| Statement | Example | AST Type |
|---|---|---|
| Select | SELECT * FROM users WHERE id = 1 | SelectStmt |
| Insert | INSERT INTO users (name) VALUES ('Alice') | InsertStmt |
| Update | UPDATE users SET name = 'Bob' WHERE id = 1 | UpdateStmt |
| Delete | DELETE FROM users WHERE id = 1 | DeleteStmt |
| CreateTable | CREATE TABLE users (id INT PRIMARY KEY) | CreateTableStmt |
| DropTable | DROP TABLE IF EXISTS users CASCADE | DropTableStmt |
| CreateIndex | CREATE UNIQUE INDEX idx ON users(email) | CreateIndexStmt |
| DropIndex | DROP INDEX idx_name | DropIndexStmt |
| ShowTables | SHOW TABLES | (unit) |
| Describe | DESCRIBE TABLE users | DescribeStmt |

Graph Statements

| Statement | Example | AST Type |
|---|---|---|
| Node | NODE CREATE person {name: 'Alice'} | NodeStmt |
| Edge | EDGE CREATE 1 -> 2 : FOLLOWS {since: 2023} | EdgeStmt |
| Neighbors | NEIGHBORS 'entity' OUTGOING follows | NeighborsStmt |
| Path | PATH SHORTEST 1 TO 5 | PathStmt |
| Find | FIND NODE person WHERE age > 18 | FindStmt |

Vector Statements

| Statement | Example | AST Type |
|---|---|---|
| Embed | EMBED STORE 'key' [0.1, 0.2, 0.3] | EmbedStmt |
| Similar | SIMILAR 'query' LIMIT 10 COSINE | SimilarStmt |

Domain Statements

| Statement | Example | AST Type |
|---|---|---|
| Vault | VAULT SET 'secret' 'value' | VaultStmt |
| Cache | CACHE STATS | CacheStmt |
| Blob | BLOB PUT 'file.txt' 'data' | BlobStmt |
| Blobs | BLOBS BY TAG 'important' | BlobsStmt |
| Chain | BEGIN CHAIN TRANSACTION | ChainStmt |
| Cluster | CLUSTER STATUS | ClusterStmt |
| Checkpoint | CHECKPOINT 'backup1' | CheckpointStmt |
| Entity | ENTITY CREATE 'key' {props} EMBEDDING [vec] | EntityStmt |

Expression Kinds

| Kind | Example | Notes |
|---|---|---|
| Literal | 42, 3.14, 'hello', TRUE, NULL | Five literal types |
| Ident | column_name | Simple identifier |
| Qualified | table.column | Dot notation |
| Binary | a + b, x AND y | 18 binary operators |
| Unary | NOT flag, -value, ~bits | 3 unary operators |
| Call | COUNT(*), MAX(price) | Function with args |
| Case | CASE WHEN x THEN y ELSE z END | Simple and searched forms |
| Subquery | (SELECT ...) | Nested SELECT |
| Exists | EXISTS (SELECT 1 ...) | Existence test |
| In | x IN (1, 2, 3) or x IN (SELECT ...) | Value list or subquery |
| Between | x BETWEEN 1 AND 10 | Range check |
| Like | name LIKE '%smith' | Pattern matching |
| IsNull | x IS NULL, y IS NOT NULL | NULL check |
| Array | [1, 2, 3] | Array literal |
| Tuple | (1, 2, 3) | Tuple/row literal |
| Cast | CAST(x AS INT) | Type conversion |
| Wildcard | * | All columns |
| QualifiedWildcard | table.* | All columns from one table |

Span Tracking Implementation

BytePos and Span

/// A byte position in source code (u32 for memory efficiency).
pub struct BytePos(pub u32);

/// A span representing a range of source code.
pub struct Span {
    pub start: BytePos,  // Inclusive
    pub end: BytePos,    // Exclusive
}

impl Span {
    /// Combines two spans into one that covers both.
    pub const fn merge(self, other: Span) -> Span {
        let start = if self.start.0 < other.start.0 { self.start } else { other.start };
        let end = if self.end.0 > other.end.0 { self.end } else { other.end };
        Span { start, end }
    }
}

Line/Column Calculation

/// Computes line and column from a byte position.
pub fn line_col(source: &str, pos: BytePos) -> (usize, usize) {
    let offset = pos.as_usize().min(source.len());
    let mut line = 1;
    let mut col = 1;

    for (i, ch) in source.char_indices() {
        if i >= offset { break; }
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }

    (line, col)
}

/// Returns the line containing a position.
pub fn get_line(source: &str, pos: BytePos) -> &str {
    let offset = pos.as_usize().min(source.len());
    let line_start = source[..offset].rfind('\n').map(|i| i + 1).unwrap_or(0);
    let line_end = source[offset..].find('\n').map(|i| offset + i).unwrap_or(source.len());
    &source[line_start..line_end]
}
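
Putting the two helpers together (paths shown assume these span helpers are re-exported at the crate root):

use neumann_parser::{get_line, line_col, BytePos};

let source = "SELECT *\nFROM users";
// Byte 9 is the 'F' that starts the second line.
assert_eq!(line_col(source, BytePos(9)), (2, 1));
assert_eq!(get_line(source, BytePos(9)), "FROM users");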

Error Recovery

Error Creation Patterns

// Unexpected token
ParseError::unexpected(
    TokenKind::Comma,
    self.current.span,
    "column name"
)

// Unexpected EOF
ParseError::unexpected_eof(
    self.current.span,
    "expression"
)

// Invalid syntax
ParseError::invalid(
    "CASE requires at least one WHEN clause",
    self.current.span
)

// With help message
ParseError::invalid("unknown keyword SELCT", span)
    .with_help("did you mean SELECT?")

Error Formatting

pub fn format_with_source(&self, source: &str) -> String {
    let (line, col) = line_col(source, self.span.start);
    let line_text = get_line(source, self.span.start);

    // Build error message with source context
    let mut result = format!("error: {}\n", self.kind);
    result.push_str(&format!("  --> line {}:{}\n", line, col));
    result.push_str("   |\n");
    result.push_str(&format!("{:3} | {}\n", line, line_text));
    result.push_str("   | ");

    // Add carets under the error location
    for _ in 0..(col - 1) { result.push(' '); }
    let len = self.span.len().max(1) as usize;
    // Clamp the caret count to the rest of the line; saturating_sub avoids
    // underflow when the span starts at the very end of the line.
    for _ in 0..len.min(line_text.len().saturating_sub(col - 1)).max(1) {
        result.push('^');
    }
    result.push('\n');

    if let Some(help) = &self.help {
        result.push_str(&format!("   = help: {}\n", help));
    }

    result
}

Example Error Output

error: unexpected FROM, expected expression or '*' after SELECT
  --> line 1:8
   |
  1 | SELECT FROM users
   |        ^^^^

Usage Examples

Parse a Statement

use neumann_parser::{parse, StatementKind};

let stmt = parse("SELECT * FROM users WHERE id = 1")?;

match &stmt.kind {
    StatementKind::Select(select) => {
        println!("Distinct: {}", select.distinct);
        println!("Columns: {}", select.columns.len());
        if let Some(from) = &select.from {
            println!("Table: {:?}", from.table.kind);
        }
        if select.where_clause.is_some() {
            println!("Has WHERE clause");
        }
    }
    _ => {}
}

Parse Multiple Statements

use neumann_parser::parse_all;

let stmts = parse_all("SELECT 1; SELECT 2; INSERT INTO t VALUES (1)")?;
assert_eq!(stmts.len(), 3);

for stmt in &stmts {
    println!("Statement at {:?}", stmt.span);
}

Parse an Expression

use neumann_parser::{parse_expr, BinaryOp, ExprKind};

let expr = parse_expr("1 + 2 * 3")?;
// Parses as: 1 + (2 * 3) due to precedence

if let ExprKind::Binary(lhs, BinaryOp::Add, rhs) = expr.kind {
    // lhs = Literal(1)
    // rhs = Binary(Literal(2), Mul, Literal(3))
}

Tokenize Source

use neumann_parser::tokenize;

let tokens = tokenize("SELECT * FROM users");
for token in tokens {
    println!("{:?} at {:?}", token.kind, token.span);
}

// Output:
// Select at 0..6
// Star at 7..8
// From at 9..13
// Ident("users") at 14..19
// Eof at 19..19

Working with the Parser Directly

use neumann_parser::{Parser, StatementKind};

let mut parser = Parser::new("SELECT 1; SELECT 2");

// Parse first statement
let stmt1 = parser.parse_statement()?;
assert!(matches!(stmt1.kind, StatementKind::Select(_)));

// Parse second statement
let stmt2 = parser.parse_statement()?;
assert!(matches!(stmt2.kind, StatementKind::Select(_)));

// Third call returns Empty (EOF)
let stmt3 = parser.parse_statement()?;
assert!(matches!(stmt3.kind, StatementKind::Empty));

Error Handling

use neumann_parser::parse;

let result = parse("SELECT FROM users");
if let Err(e) = result {
    // Format error with source context
    let formatted = e.format_with_source("SELECT FROM users");
    println!("{}", formatted);

    // Access error details
    println!("Error kind: {:?}", e.kind);
    println!("Span: {:?}", e.span);
    if let Some(help) = &e.help {
        println!("Help: {}", help);
    }
}

Grammar Overview

SELECT Statement

SELECT [DISTINCT | ALL] columns
FROM table [alias]
[JOIN table ON condition | USING (cols)]...
[WHERE condition]
[GROUP BY exprs]
[HAVING condition]
[ORDER BY expr [ASC|DESC] [NULLS FIRST|LAST]]...
[LIMIT n]
[OFFSET n]

CREATE TABLE Statement

CREATE TABLE [IF NOT EXISTS] name (
    column type [NULL|NOT NULL] [PRIMARY KEY] [UNIQUE]
                [DEFAULT expr] [CHECK(expr)]
                [REFERENCES table(col) [ON DELETE action] [ON UPDATE action]]
    [, ...]
    [, PRIMARY KEY (cols)]
    [, UNIQUE (cols)]
    [, FOREIGN KEY (cols) REFERENCES table(cols)]
    [, CHECK (expr)]
)

Graph Operations

NODE CREATE label {properties}
NODE GET id
NODE DELETE id
NODE LIST [label]

EDGE CREATE from -> to : type {properties}
EDGE GET id
EDGE DELETE id
EDGE LIST [type]

NEIGHBORS node [OUTGOING|INCOMING|BOTH] [edge_type]
          [BY SIMILAR [vector] LIMIT n]

PATH [SHORTEST|ALL] from TO to [MAX depth]

Vector Operations

EMBED STORE key [vector]
EMBED GET key
EMBED DELETE key
EMBED BUILD INDEX
EMBED BATCH [(key, [vector]), ...]

SIMILAR key|[vector] [LIMIT k] [COSINE|EUCLIDEAN|DOT_PRODUCT]
        [CONNECTED TO entity]

Chain Operations

BEGIN CHAIN TRANSACTION
COMMIT CHAIN
ROLLBACK CHAIN TO height

CHAIN HEIGHT
CHAIN TIP
CHAIN BLOCK height
CHAIN VERIFY
CHAIN HISTORY key
CHAIN SIMILAR [embedding] LIMIT n
CHAIN DRIFT FROM height TO height

SHOW CODEBOOK GLOBAL
SHOW CODEBOOK LOCAL domain
ANALYZE CODEBOOK TRANSITIONS

Reserved Keywords

Keywords are case-insensitive. The lexer converts to uppercase for matching.
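
So the same query tokenizes identically in any casing (a sketch, assuming TokenKind implements PartialEq):

use neumann_parser::tokenize;

let upper = tokenize("SELECT * FROM users");
let lower = tokenize("select * from users");

// Keyword tokens come out the same regardless of source casing.
for (a, b) in upper.iter().zip(lower.iter()) {
    assert_eq!(a.kind, b.kind);
}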

SQL (70+ keywords): SELECT, DISTINCT, ALL, FROM, WHERE, INSERT, INTO, VALUES, UPDATE, SET, DELETE, CREATE, DROP, TABLE, INDEX, AND, OR, NOT, NULL, IS, IN, LIKE, BETWEEN, ORDER, BY, ASC, DESC, NULLS, FIRST, LAST, LIMIT, OFFSET, GROUP, HAVING, JOIN, INNER, LEFT, RIGHT, FULL, OUTER, CROSS, NATURAL, ON, USING, AS, PRIMARY, KEY, UNIQUE, REFERENCES, FOREIGN, CHECK, DEFAULT, CASCADE, RESTRICT, IF, EXISTS, SHOW, TABLES, UNION, INTERSECT, EXCEPT, CASE, WHEN, THEN, ELSE, END, CAST, ANY

Types (16 keywords): INT, INTEGER, BIGINT, SMALLINT, FLOAT, DOUBLE, REAL, DECIMAL, NUMERIC, VARCHAR, CHAR, TEXT, BOOLEAN, DATE, TIME, TIMESTAMP

Aggregates (5 keywords): COUNT, SUM, AVG, MIN, MAX

Graph (16 keywords): NODE, EDGE, NEIGHBORS, PATH, GET, LIST, STORE, OUTGOING, INCOMING, BOTH, SHORTEST, PROPERTIES, LABEL, VERTEX, VERTICES, EDGES

Vector (10 keywords): EMBED, SIMILAR, VECTOR, EMBEDDING, DIMENSION, DISTANCE, COSINE, EUCLIDEAN, DOT_PRODUCT, BUILD

Unified (6 keywords): FIND, WITH, RETURN, MATCH, ENTITY, CONNECTED

Domain (30+ keywords): VAULT, GRANT, REVOKE, ROTATE, CACHE, INIT, STATS, CLEAR, EVICT, PUT, SEMANTIC, THRESHOLD, CHECKPOINT, ROLLBACK, CHAIN, BEGIN, COMMIT, TRANSACTION, HISTORY, DRIFT, CODEBOOK, GLOBAL, LOCAL, ANALYZE, HEIGHT, TIP, BLOCK, CLUSTER, CONNECT, DISCONNECT, STATUS, NODES, LEADER, BLOB, BLOBS, INFO, LINK, TAG, VERIFY, GC, REPAIR

Contextual Keywords

These keywords can be used as identifiers in expression contexts (column names, etc.):

pub fn is_contextual_keyword(&self) -> bool {
    matches!(self,
        Status | Nodes | Leader | Connect | Disconnect | Cluster |
        Blobs | Info | Link | Unlink | Links | Tag | Untag | Verify | Gc | Repair |
        Height | Transitions | Tip | Block | Codebook | Global | Local | Drift |
        Begin | Commit | Transaction | ...
    )
}

Edge Cases and Gotchas

Ambiguous Token Sequences

  1. Minus vs Arrow: - vs -> distinguished by lookahead
  2. Less-than variants: < vs <= vs <> vs <<
  3. Pipe variants: | (bitwise) vs || (concat)
  4. Keyword as identifier: in SELECT status FROM orders, status is a contextual keyword that parses as a column name
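
The first three cases are resolved by the single character of lookahead shown under Operator Recognition; a quick check (expected kinds are an assumption):

use neumann_parser::tokenize;

// "-" alone is Minus; "->" fuses into a single Arrow token.
for t in tokenize("1 - 2") { println!("{:?}", t.kind); }   // Integer, Minus, Integer, Eof
for t in tokenize("1 -> 2") { println!("{:?}", t.kind); }  // Integer, Arrow, Integer, Eof

// "|" is bitwise OR; "||" is string concatenation.
for t in tokenize("a | b || c") { println!("{:?}", t.kind); } // Ident, Pipe, Ident, Concat, Ident, Eof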

Number Parsing Edge Cases

use neumann_parser::tokenize;

// "3." is integer 3 followed by a dot
tokenize("3. ");  // [Integer(3), Dot, Eof]

// "3.0" is a float
tokenize("3.0");  // [Float(3.0), Eof]

// "3.x" is integer 3, dot, identifier x
tokenize("3.x");  // [Integer(3), Dot, Ident("x"), Eof]

// Scientific notation
tokenize("1e10");    // [Float(1e10), Eof]
tokenize("2.5E-3");  // [Float(0.0025), Eof]

String Literal Edge Cases

use neumann_parser::tokenize;

// SQL-style doubled quotes
tokenize("'it''s'");  // [String("it's"), Eof]

// Unterminated string
tokenize("'unterminated");  // [Error("unterminated string literal"), Eof]

// String with a raw newline
tokenize("'line1\nline2'");  // Error - strings cannot span lines

Expression Depth Limit

use neumann_parser::parse_expr;

// Deeply nested expressions hit the depth limit (64)
let mut expr = "x".to_string();
for _ in 0..70 {
    expr = format!("({})", expr);
}
parse_expr(&expr) // Err(ParseErrorKind::TooDeep)

BETWEEN Precedence

The AND in BETWEEN low AND high is part of the BETWEEN syntax, not a logical operator:

// "x BETWEEN 1 AND 10 AND y = 5" parses as:
// (x BETWEEN 1 AND 10) AND (y = 5)
// Not: x BETWEEN 1 AND (10 AND y = 5)

Qualified Wildcard Restriction

use neumann_parser::parse_expr;

// Valid: table.*
parse_expr("users.*") // Ok(QualifiedWildcard)

// Invalid: (expr).*
parse_expr("(1 + 2).*") // Err("qualified wildcard requires identifier")

Performance

| Operation | Complexity | Notes |
|---|---|---|
| Tokenize | O(n) | Single pass, no backtracking |
| Parse | O(n) | Single pass, constant stack per token |
| Total | O(n) | n = input length |

Memory Usage

  • Lexer: O(1) - only stores position and one peeked character
  • Parser: O(1) - only stores current and peeked token
  • AST: O(n) - proportional to number of nodes
  • Span: 8 bytes per span (two u32 values)

Optimizations

  1. Keyword lookup: O(1) via match statement on uppercase string
  2. Token comparison: Uses std::mem::discriminant for enum comparison
  3. Span tracking: Constant-time merge operation
  4. Minimal copying during parsing: identifier and string text is allocated once in the lexer, owned by its token, and moved into the AST rather than re-copied
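
On item 2: std::mem::discriminant compares only the variant tag and ignores payload data, so two Ident tokens match without comparing their strings. A self-contained illustration with a cut-down stand-in for TokenKind:

use std::mem;

enum Kind { Ident(String), Select }

let a = Kind::Ident("users".into());
let b = Kind::Ident("orders".into());

// Same variant, different payloads: discriminants are equal.
assert_eq!(mem::discriminant(&a), mem::discriminant(&b));
// Different variants: discriminants differ.
assert_ne!(mem::discriminant(&a), mem::discriminant(&Kind::Select));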

Related Modules

| Module | Relationship |
|---|---|
| query_router | Consumes the AST and executes queries against the engines |
| neumann_shell | Uses the parser for interactive REPL commands |
| tensor_chain | Chain statements (BEGIN, COMMIT, HISTORY) are parsed here |
| tensor_vault | Vault statements (SET, GET, GRANT) are parsed here |
| tensor_cache | Cache statements (INIT, STATS, PUT) are parsed here |
| tensor_blob | Blob statements (PUT, GET, TAG) are parsed here |

Testing

The parser has comprehensive test coverage including:

  • Unit tests in each module: Token, span, lexer, parser, expression tests
  • Integration tests: Complex SQL queries, multi-statement parsing
  • Edge case tests: Unterminated strings, deeply nested expressions, ambiguous operators
  • Fuzz targets: parser_parse, parser_parse_all, parser_parse_expr, parser_tokenize
# Run parser tests
cargo test -p neumann_parser

# Run parser fuzz targets
cargo +nightly fuzz run parser_parse -- -max_total_time=60
cargo +nightly fuzz run parser_tokenize -- -max_total_time=60