Hey — sharing a project I've been building for the last
few months. It's a movie recommendation system that runs entirely on
your laptop using Ollama, with a Corrective-RAG pipeline.

Why I built it: streaming platforms only know what you watched on
them. Netflix can't see my Prime history, and none of them know what I
watched at the cinema. I wanted one system that learns from all of it.

Stack:
- 7-stage Corrective-RAG (LangGraph static graph, not autonomous agents)
- Hybrid retrieval: Chroma dense vectors + rank-bm25 sparse, fused via RRF
- BGE-small-en-v1.5 embeddings + BGE-reranker-base cross-encoder
- Grader-based correction loop with retry budget
- Cited explanations: every bullet must reference a real source field, and bullets that fail validation are dropped (no hallucinated plot summaries)
- Ollama llama3 by default; OpenAI/Anthropic pluggable per role
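
For anyone who hasn't used RRF (Reciprocal Rank Fusion), here is a minimal sketch of the fusion step from the hybrid retrieval bullet. This is not the repo's code: the function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the toy movie ids are all assumptions for illustration. Each result list only contributes ranks, so the dense and sparse scores never need to be on the same scale.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every doc it contains,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from the project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Toy inputs: ids as they might come back from a dense and a sparse retriever.
dense_hits = ["heat", "ronin", "collateral"]
sparse_hits = ["ronin", "thief", "heat"]

fused = rrf_fuse([dense_hits, sparse_hits])
print(fused)
```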

The interesting design choice was doing query expansion at INGEST time
instead of query time. The enrichment LLM generates 3-5 pseudo-queries
per movie, and those are embedded alongside the plot. Catalogues are
bounded and user queries aren't, so paying the LLM cost once per movie
scales better than paying it once per query.
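
A rough sketch of what ingest-time expansion can look like, under stated assumptions. The `Movie` type, `stub_pseudo_queries` (a stand-in for the enrichment LLM, no model is called), and the record layout are hypothetical. I'm also assuming one embeddable record per text, all pointing back at the same movie id; concatenating the pseudo-queries onto the plot would be another reasonable reading of "alongside".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Movie:
    movie_id: str
    title: str
    plot: str


def stub_pseudo_queries(movie: Movie) -> List[str]:
    """Hypothetical stand-in for the enrichment LLM call."""
    return [
        f"movies like {movie.title}",
        f"what should I watch if I liked {movie.title}",
        f"films where {movie.plot.rstrip('.').lower()}",
    ]


def build_ingest_records(
    movie: Movie,
    generate: Callable[[Movie], List[str]] = stub_pseudo_queries,
    min_queries: int = 3,
    max_queries: int = 5,
) -> List[Dict[str, str]]:
    """Return the texts to embed for one movie: its plot plus 3-5 pseudo-queries.

    This runs once per movie at ingest, so the LLM cost grows with the
    catalogue size rather than with the number of user queries.
    """
    queries = [q.strip() for q in generate(movie) if q.strip()][:max_queries]
    if len(queries) < min_queries:
        raise ValueError(f"expected at least {min_queries} pseudo-queries")
    records = [
        {"id": f"{movie.movie_id}:plot", "movie_id": movie.movie_id,
         "kind": "plot", "text": movie.plot}
    ]
    for i, query in enumerate(queries):
        records.append(
            {"id": f"{movie.movie_id}:pq{i}", "movie_id": movie.movie_id,
             "kind": "pseudo_query", "text": query}
        )
    return records


movie = Movie("m1", "Heat", "A veteran thief and a detective circle each other in Los Angeles.")
records = build_ingest_records(movie)
for record in records:
    print(record["id"], "->", record["text"])
```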

Latency on M3 / 36GB / Ollama llama3: ~90s/query (filter_extract and
explain dominate). Switching to llama3.2:1b drops that to ~15-20s.
Hosted models take ~5-10s.

Code + setup: github.com/meetgrewal7793-creator/personal-movie-recommender
The 7-stage architecture diagram is in the README. Feedback welcome,
especially on the grader prompt calibration, which I had to relax for
local-LLM defaults because llama3 graders over-flag results as weak.
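
To make the calibration question concrete, here is a minimal sketch of a grader-gated retry loop with a budget. It is an illustration only, not the project's implementation: `corrective_retrieve`, the `retrieve` / `grade` / `rewrite` callables, the 0-1 score scale, and the `pass_threshold` knob are all assumptions. "Relaxing" the grader in this picture would mean lowering the threshold (or softening the prompt behind `grade`) so fewer results get flagged as weak.

```python
from typing import Callable, List, Tuple


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, List[str]], float],
    rewrite: Callable[[str, List[str]], str],
    max_retries: int = 2,
    pass_threshold: float = 0.5,
) -> Tuple[List[str], int, bool]:
    """Retrieve, grade, and retry with a rewritten query until the budget runs out.

    Returns (docs, retries_used, passed). Results are always graded
    against the ORIGINAL query, even after rewrites.
    """
    retries = 0
    current = query
    while True:
        docs = retrieve(current)
        passed = grade(query, docs) >= pass_threshold
        if passed or retries >= max_retries:
            return docs, retries, passed
        current = rewrite(current, docs)
        retries += 1


# Toy stand-ins so the loop can be run without any model or index.
def toy_retrieve(q: str) -> List[str]:
    return [q]


def toy_grade(q: str, docs: List[str]) -> float:
    return 0.9 if any("heist" in d for d in docs) else 0.2


def toy_rewrite(q: str, docs: List[str]) -> str:
    return q + " heist"


docs, retries, passed = corrective_retrieve(
    "crime movie", toy_retrieve, toy_grade, toy_rewrite
)
print(docs, retries, passed)

# A rewrite that never helps exhausts the budget and returns passed=False.
_, exhausted_retries, exhausted_passed = corrective_retrieve(
    "crime movie", toy_retrieve, toy_grade, lambda q, d: q
)
print(exhausted_retries, exhausted_passed)
```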





















