
Most of us use databases every day.
But at some point I started wondering:
- How does SQL actually work internally?
- How are queries parsed?
- How do joins work?
- What happens after a SELECT statement?
- How does persistence work under the hood?
So instead of only reading about databases, I decided to build one.
That project became Ark — a SQL-like relational database engine written entirely from scratch in C++.
Why I Built It
I wanted to understand the internals of database systems by implementing the pieces myself instead of relying on existing engines or parser generators.
The goal wasn’t to compete with production databases.
The goal was to learn:
- parsing
- query execution
- relational operations
- schema management
- persistence systems
- software architecture
Core Features
Ark currently supports:
- Handwritten tokenizer
- Recursive descent parser
- CRUD operations
- INNER / LEFT / RIGHT / FULL joins
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
- ALTER TABLE
- LIKE pattern matching
- ORDER BY
- DISTINCT
- File persistence (SAVE / LOAD)
- Three-tier diagnostics system with exact line/column reporting
Everything is implemented manually:
- no external database libraries
- no parser generators
- no embedded SQL engines
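To give a feel for what a handwritten tokenizer with line/column reporting involves, here is a minimal sketch. It is not Ark's actual code: the `Token`, `TokenKind`, and `tokenize` names are hypothetical, and the lexing rules are deliberately simplified (for example, it accepts any run of digits and dots as a number).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds; a real engine would distinguish many more.
enum class TokenKind { Identifier, Number, String, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
    int line;    // 1-based line where the token starts
    int column;  // 1-based column where the token starts
};

// Splits a query into tokens, recording where each one begins so that
// later stages can report errors at an exact position.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    int line = 1, col = 1;
    std::size_t i = 0;

    // Consume one character and keep line/column in sync with it.
    auto advance = [&]() {
        assert(i < src.size());
        if (src[i] == '\n') { ++line; col = 1; } else { ++col; }
        ++i;
    };
    auto is_word = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    auto is_num = [](char ch) {
        return std::isdigit(static_cast<unsigned char>(ch)) || ch == '.';
    };

    while (i < src.size()) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { advance(); continue; }

        Token tok{TokenKind::Symbol, "", line, col};
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            tok.kind = TokenKind::Identifier;  // keywords are treated as identifiers here
            while (i < src.size() && is_word(src[i])) { tok.text += src[i]; advance(); }
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            tok.kind = TokenKind::Number;      // simplification: no validation of the dots
            while (i < src.size() && is_num(src[i])) { tok.text += src[i]; advance(); }
        } else if (c == '"') {
            tok.kind = TokenKind::String;
            advance();                         // skip the opening quote
            while (i < src.size() && src[i] != '"') { tok.text += src[i]; advance(); }
            if (i < src.size()) advance();     // skip the closing quote if present
        } else {
            tok.text = std::string(1, c);      // single-character symbol such as * ( ) , ;
            advance();
        }
        out.push_back(tok);
    }
    return out;
}
```

Calling `tokenize("SELECT *\nFROM t")` in this sketch yields four tokens, with `FROM` recorded at line 2, column 1. Carrying the position on every token is what lets a parser point at the exact spot where a statement went wrong.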
Architecture
The execution pipeline looks roughly like this:
Query
↓
Tokenizer
↓
Parser
↓
Command Objects
↓
Execution Engine
↓
Storage Layer
↓
Persistence
The project is split into modular components:
- tokenizer
- parser
- execution engine
- diagnostics
- storage/persistence
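The "Command Objects" stage of the pipeline can be pictured roughly as follows. This is a sketch under my own assumptions, not Ark's real classes: `Database`, `Command`, `CreateTableCommand`, `InsertCommand`, and `run` are hypothetical names, and cells are stored as plain strings to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory storage layer: table name -> rows of string cells.
struct Database {
    std::map<std::string, std::vector<std::vector<std::string>>> tables;
};

// Each parsed statement becomes one object that knows how to run itself.
struct Command {
    virtual ~Command() = default;
    virtual void execute(Database& db) const = 0;
};

struct CreateTableCommand : Command {
    std::string table;
    explicit CreateTableCommand(std::string t) : table(std::move(t)) {}
    void execute(Database& db) const override {
        // emplace reports via .second whether the table was newly created.
        if (!db.tables.emplace(table, std::vector<std::vector<std::string>>{}).second)
            throw std::runtime_error("table already exists: " + table);
    }
};

struct InsertCommand : Command {
    std::string table;
    std::vector<std::string> row;
    InsertCommand(std::string t, std::vector<std::string> r)
        : table(std::move(t)), row(std::move(r)) {}
    void execute(Database& db) const override {
        auto it = db.tables.find(table);
        if (it == db.tables.end())
            throw std::runtime_error("no such table: " + table);
        it->second.push_back(row);
    }
};

// The execution engine walks the commands the parser produced, in order.
void run(const std::vector<std::unique_ptr<Command>>& program, Database& db) {
    for (const auto& cmd : program) {
        assert(cmd != nullptr);
        cmd->execute(db);
    }
}
```

The appeal of this shape is that the parser only has to build objects, and the execution engine only has to call `execute`, so each statement type can be added or tested in isolation.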
Example Query
CREATE TABLE employees (
id INT,
name STRING,
salary DOUBLE
);
INSERT INTO employees VALUES
(1, "Alice", 95000.0),
(2, "Bob", 72000.0);
SELECT * FROM employees
WHERE salary > 80000.0;
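To illustrate what the WHERE step of that query might boil down to, here is a small sketch of typed cells and a row filter. The `Value`, `Row`, `filter_rows`, and `greater_than` names are hypothetical and chosen for illustration; Ark's internal representation may look quite different. It assumes C++17 for `std::variant`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <variant>
#include <vector>

// A cell holds one of the three column types used in the example query:
// INT, STRING, or DOUBLE.
using Value = std::variant<int, std::string, double>;
using Row = std::vector<Value>;

// Returns the rows for which the predicate holds (the WHERE step).
std::vector<Row> filter_rows(const std::vector<Row>& rows,
                             const std::function<bool(const Row&)>& predicate) {
    std::vector<Row> result;
    for (const Row& row : rows) {
        if (predicate(row)) result.push_back(row);
    }
    return result;
}

// Builds the predicate for "<column> > <threshold>" on a DOUBLE column.
std::function<bool(const Row&)> greater_than(std::size_t column, double threshold) {
    return [column, threshold](const Row& row) {
        assert(column < row.size());
        const double* v = std::get_if<double>(&row[column]);
        // Assumption of this sketch: cells that are not DOUBLE never match.
        return v != nullptr && *v > threshold;
    };
}
```

With the two rows from the example, `filter_rows(employees, greater_than(2, 80000.0))` keeps only Alice's row, since column 2 is `salary` and Bob's 72000.0 does not exceed the threshold.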
One of the Hardest Parts
One of the most interesting challenges was implementing joins and schema evolution.
Handling:
- ALTER TABLE
- adding/dropping columns
- persistence consistency
- join execution
became much more complicated than I initially expected.
Parser correctness and diagnostics also took a surprising amount of effort.
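As an example of why joins get tricky, here is a minimal nested-loop LEFT JOIN sketch. It is one simple way to do it, not necessarily how Ark does: the `Cell`, `Row`, and `left_join` names are hypothetical, NULL is modeled with `std::optional` (C++17), and the width of the right table is passed in explicitly so unmatched rows can be padded.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A cell is either a value or NULL (std::nullopt).
using Cell = std::optional<std::string>;
using Row = std::vector<Cell>;

// LEFT JOIN via nested loops: every left row appears at least once;
// left rows with no match are padded with NULLs on the right side.
std::vector<Row> left_join(const std::vector<Row>& left, std::size_t left_key,
                           const std::vector<Row>& right, std::size_t right_key,
                           std::size_t right_width) {
    std::vector<Row> out;
    for (const Row& l : left) {
        assert(left_key < l.size());
        bool matched = false;
        for (const Row& r : right) {
            assert(right_key < r.size());
            // SQL semantics: NULL never compares equal, not even to NULL.
            if (l[left_key] && r[right_key] && *l[left_key] == *r[right_key]) {
                Row joined = l;
                joined.insert(joined.end(), r.begin(), r.end());
                out.push_back(std::move(joined));
                matched = true;
            }
        }
        if (!matched) {
            Row joined = l;
            joined.resize(l.size() + right_width);  // new cells default to NULL
            out.push_back(std::move(joined));
        }
    }
    return out;
}
```

An INNER JOIN is the same loop without the padding branch, and a RIGHT JOIN can be expressed by swapping the inputs and reordering the output columns. The nested loop costs O(n × m) comparisons, which is fine for a learning project but is the first thing a production engine replaces with hash or merge joins.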
What I Learned
Building Ark taught me a lot about:
- how parsers actually work
- query execution pipelines
- relational database concepts
- software architecture
- debugging complex state systems
- designing diagnostics/error reporting
It also gave me a much deeper appreciation for real database engines.
GitHub
GitHub Repository:
https://github.com/kashyap-devansh/Ark
I’d genuinely appreciate feedback from people interested in:
- databases
- systems programming
- parsers
- compilers
- C++
Suggestions for improving the architecture or query engine are especially welcome.
