Building a Production RAG System: Fitmeal Planner's Nutrition Engine

How I built a production RAG system that lets users modify a 7-day meal plan in plain English while enforcing hard allergy and calorie constraints — architecture, trade-offs, and what I'd do again.

Most "RAG demos" retrieve a chunk and paste it into a prompt. That falls apart the moment a user says "swap the salmon, I'm allergic to fish, and keep it under 1,800 calories." Now retrieval has to respect hard constraints, not just semantic similarity. Fitmeal Planner is the production system I built to handle exactly that — a dynamic 7-day meal engine where users edit plans in natural language while the system holds allergy and caloric limits as non-negotiable invariants.

This is the difference between a prototype and something people can actually trust with a dietary restriction.

The core problem: free-text intent, hard constraints

A user request like "make day three vegetarian and higher protein" carries three separate things:

(1) Semantic intent: vegetarian, more protein
(2) Hard constraints: existing allergies, the day's calorie ceiling
(3) Structure: it must remain a valid, balanced day of meals

Pure vector similarity gets you (1) and ignores (2) and (3). The architecture has to layer deterministic rules on top of retrieval, not hope the LLM remembers them.
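A minimal sketch of making that three-way split explicit, assuming each request has already been parsed into a typed object before retrieval runs. The names `ParsedRequest`, `MEAL_SLOTS`, and `is_structurally_valid`, and the four-slot day layout, are hypothetical illustrations, not the production schema.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical fixed day layout; the real plan structure is an assumption.
MEAL_SLOTS = ("breakfast", "lunch", "dinner", "snack")


@dataclass(frozen=True)
class ParsedRequest:
    """One free-text request split into its three concerns (hypothetical schema)."""

    # (1) Semantic intent: soft preferences that only drive similarity ranking.
    intent_text: str
    # (2) Hard constraints: enforced by deterministic code, never by the prompt.
    allergens: frozenset[str]
    calorie_ceiling: int
    # (3) Structure: which day of the 7-day plan is being edited (0-based).
    day_index: int


def is_structurally_valid(day: dict[str, str]) -> bool:
    """A day counts as valid only if every slot is filled with some meal id."""
    return all(day.get(slot) for slot in MEAL_SLOTS)


# "Make day three vegetarian and higher protein" for a user with a fish allergy.
request = ParsedRequest(
    intent_text="vegetarian, higher protein",
    allergens=frozenset({"fish"}),
    calorie_ceiling=1800,
    day_index=2,
)
```

Keeping the hard constraints as typed fields, rather than leaving them buried in the free text, is what lets later stages enforce them without asking the model.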

Architecture

I built it on Python, FastAPI, LangChain, and PostgreSQL with a RAG-based tagging system at its core:

Tagging layer: every meal is embedded and tagged with its macros, allergens, and dietary class. This is the retrieval corpus. Tags are structured metadata, not free text, so they can be filtered before the vector search runs.
Constraint pre-filter: allergy and calorie limits are applied as a hard metadata filter on the candidate set, so a meal containing one of the user's allergens is never a retrieval candidate in the first place. Safety can't be a post-hoc prompt instruction; it has to be a filter on the search space.
Semantic re-rank: only the constraint-valid candidates are ranked by similarity to the user's intent.
LLM modification step: the model proposes the edit from within the validated candidate set, and the result is re-checked against the day's calorie budget before it's persisted.
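The pre-filter and re-rank stages can be sketched in plain Python as a filter followed by a rank. This is an in-memory illustration under stated assumptions: the `Meal` type, the function names, and the toy two-dimensional embeddings are hypothetical, and the calorie limit is treated as a per-meal cap (for example, the day's remaining budget). A production version would push the filter into PostgreSQL instead of looping in Python.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Meal:
    """A tagged meal (hypothetical shape): structured metadata plus an embedding."""

    meal_id: str
    calories: int
    allergens: frozenset[str]
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pre_filter(meals: list[Meal], allergens: set[str], max_calories: int) -> list[Meal]:
    """Hard constraints first: drop meals with a forbidden allergen or too many calories."""
    return [
        m for m in meals
        if not (m.allergens & allergens) and m.calories <= max_calories
    ]


def rerank(candidates: list[Meal], query_vec: tuple[float, ...], k: int) -> list[Meal]:
    """Rank only the constraint-valid candidates by similarity to the user's intent."""
    ranked = sorted(candidates, key=lambda m: cosine(m.embedding, query_vec), reverse=True)
    return ranked[:k]


meals = [
    Meal("salmon-bowl", 650, frozenset({"fish"}), (0.9, 0.1)),      # allergen: removed
    Meal("mac-and-cheese", 900, frozenset({"dairy"}), (0.6, 0.4)),  # over budget: removed
    Meal("tofu-stir-fry", 520, frozenset({"soy"}), (0.8, 0.3)),
    Meal("lentil-curry", 580, frozenset(), (0.7, 0.2)),
]
safe = pre_filter(meals, {"fish"}, max_calories=600)
top = rerank(safe, query_vec=(1.0, 0.0), k=2)
print([m.meal_id for m in top])  # ['lentil-curry', 'tofu-stir-fry']
```

The salmon bowl is the closest match to the query vector, but it never reaches the ranking step, which is the point of filtering on tags before similarity.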

The key design principle: the LLM operates inside a sandbox the deterministic layer defines. It never has the authority to violate an allergy constraint, because it never sees a forbidden option.
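Because the model only sees vetted candidates, a forbidden meal could only reappear if the model invented an ID it was never shown. A minimal sketch of a guard that closes that gap and also performs the post-generation calorie re-check, assuming the model's output has already been parsed down to a single meal ID; `accept_proposal` and `ProposalRejected` are hypothetical names.

```python
from __future__ import annotations


class ProposalRejected(ValueError):
    """Raised when a model proposal falls outside the deterministic sandbox."""


def accept_proposal(
    proposed_id: str,
    candidate_ids: set[str],
    day_calories_without_slot: int,
    calories_by_id: dict[str, int],
    calorie_ceiling: int,
) -> str:
    """Accept the model's pick only if it was a vetted candidate and the day stays in budget."""
    # Sandbox check: the model may only choose from what the pre-filter let through.
    if proposed_id not in candidate_ids:
        raise ProposalRejected(f"{proposed_id!r} was never a valid candidate")
    # Post-generation re-check: recompute the day total from stored meal data,
    # never from a calorie figure the model wrote in its reply.
    total = day_calories_without_slot + calories_by_id[proposed_id]
    if total > calorie_ceiling:
        raise ProposalRejected(f"day total {total} exceeds ceiling {calorie_ceiling}")
    return proposed_id
```

Only a proposal that passes both checks would be persisted; a rejection can trigger a retry or fall back to the top-ranked safe candidate.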

Trade-offs I made

Metadata filtering before vector search, not after. Filtering after retrieval means you can retrieve 10 candidates and discard 9 for allergen violations, leaving you with thin, low-quality results. Filtering first keeps the candidate pool both safe and relevant.
Structured tags over pure embeddings for the constraint dimensions. Embeddings are fuzzy by design, which is exactly wrong for a yes/no question like "contains shellfish?" Hybrid retrieval (structured filter plus semantic rank) is the production answer.
Validation after generation. LLMs drift. The post-generation calorie re-check is cheap insurance against a confident-but-wrong plan.
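The filter-ordering trade-off is easy to see with a toy ranking. This sketch assumes the full corpus is already ordered by similarity to the query and that `forbidden` holds the IDs whose allergen tags match the user's allergies; both function names and all meal IDs are hypothetical.

```python
from __future__ import annotations


def filter_after(ranked_ids: list[str], forbidden: set[str], k: int) -> list[str]:
    """Take the top-k by similarity, then drop violations; the result can shrink to nothing."""
    return [mid for mid in ranked_ids[:k] if mid not in forbidden]


def filter_before(ranked_ids: list[str], forbidden: set[str], k: int) -> list[str]:
    """Drop violations first, then take the top-k; stays full while k safe meals exist."""
    return [mid for mid in ranked_ids if mid not in forbidden][:k]


# "Swap the salmon": the most similar meals are exactly the ones a fish-allergic user can't eat.
ranked = ["salmon-bowl", "tuna-poke", "cod-tacos", "lentil-curry", "tofu-stir-fry", "chickpea-salad"]
fish = {"salmon-bowl", "tuna-poke", "cod-tacos"}

print(filter_after(ranked, fish, k=3))   # []
print(filter_before(ranked, fish, k=3))  # ['lentil-curry', 'tofu-stir-fry', 'chickpea-salad']
```

Both functions apply the same safety rule, so neither returns a fish dish; the difference is that filtering after retrieval spends the entire top-k budget on meals that were never eligible.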

What this demonstrates

Production RAG isn't a retrieval call — it's a pipeline where retrieval is one stage between deterministic guardrails. The interesting engineering is in where you put the constraints, not in the prompt.

I'm Haris Ahmed, an AI engineer and full-stack software engineer building production RAG and LLM systems. See more of my work at harisahmed.dev.
