Beyond Semantic Search: Engineering a Hybrid RAG Architecture

As an extension of my SCTP Capstone Project, I focused on building a Retrieval-Augmented Generation (RAG) system to move beyond static dashboards and enable dynamic reasoning over real-world e-commerce data. The objective was to transition from theoretical understanding to a production-grade implementation that could handle the messy reality of raw business data.

I initially deployed a naive RAG architecture that flattened structured transaction data (review comments, products, categories, prices, timestamps, and order status) into text chunks for vector embedding. This approach handled semantic retrieval well for qualitative questions like "Why are customers complaining about delivery?", but it failed at quantitative tasks. When asked a statistical question such as "What is the average freight cost for the Electronics category?", the system would retrieve a handful of individual orders and hallucinate an aggregate from that incomplete sample.

This limitation highlighted a critical engineering constraint: vector search is optimized for semantic similarity, not aggregation.
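To make the failure mode concrete, here is a minimal sketch of why an average computed from a few retrieved chunks can be far from the true figure. The order values are made-up toy numbers, and `top_k_retrieve` is a hypothetical stand-in for vector search, not the project's retriever.

```python
from statistics import mean

# Toy, made-up orders: (category, freight_cost). Not data from the project.
orders = [
    ("Electronics", 10.0),
    ("Electronics", 12.0),
    ("Electronics", 50.0),
    ("Electronics", 48.0),
    ("Furniture", 30.0),
]


def top_k_retrieve(rows, category, k):
    """Hypothetical stand-in for vector search: return only k matching rows."""
    return [row for row in rows if row[0] == category][:k]


def average_freight(rows):
    """Average the freight cost over whatever rows are passed in."""
    return mean(cost for _, cost in rows)


# What the LLM can "see": only the k retrieved orders.
retrieved = top_k_retrieve(orders, "Electronics", k=2)
estimate_from_context = average_freight(retrieved)

# What a database aggregate sees: every Electronics order.
true_average = average_freight([row for row in orders if row[0] == "Electronics"])

print(estimate_from_context, true_average)
```

In this toy case the two numbers differ widely because the retrieved sample is neither complete nor representative, which is the gap an aggregation engine closes.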

To address this, I pivoted to a Hybrid RAG architecture. I introduced an intent classification layer that routes incoming queries into two distinct pipelines:
- Statistical Route: For queries requiring aggregation or math, the system acts as a Text-to-SQL engine, generating and executing precise SQL queries against BigQuery to return deterministic results.
- Semantic Route: For qualitative or open-ended exploration, the system utilizes vector search to retrieve relevant unstructured context.

This separation of concerns ensures that the system uses the right tool for the specific cognitive task, relying on the database for hard numbers and the LLM for contextual synthesis.
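The routing idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the keyword-based `classify_intent` is a hypothetical placeholder for whatever classifier is actually used, an in-memory SQLite table stands in for BigQuery, the SQL string is hard-coded where the real system would have the LLM generate it, and `run_semantic` is a stub for vector search.

```python
import re
import sqlite3

# Hypothetical cue words; a real classifier could be an LLM call or a trained model.
STATISTICAL_CUES = re.compile(
    r"\b(average|avg|mean|sum|total|count|how many|median|max|min|percentage)\b",
    re.IGNORECASE,
)


def classify_intent(query: str) -> str:
    """Return 'statistical' if the query looks like an aggregation, else 'semantic'."""
    return "statistical" if STATISTICAL_CUES.search(query) else "semantic"


def run_statistical(conn: sqlite3.Connection, sql: str):
    """Execute SQL and return all rows (the SQL would be LLM-generated in practice)."""
    return conn.execute(sql).fetchall()


def run_semantic(query: str) -> str:
    """Stub for the vector-search route; returns a tag instead of real context."""
    return f"[vector search for: {query}]"


def route(query: str, conn: sqlite3.Connection, sql_for_query: str):
    """Send the query down exactly one of the two pipelines."""
    if classify_intent(query) == "statistical":
        return run_statistical(conn, sql_for_query)
    return run_semantic(query)


# In-memory SQLite with toy rows stands in for the BigQuery table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, freight REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Electronics", 10.0), ("Electronics", 30.0), ("Furniture", 25.0)],
)
sql = "SELECT AVG(freight) FROM orders WHERE category = 'Electronics'"

stat_answer = route(
    "What is the average freight cost for the Electronics category?", conn, sql
)
sem_answer = route("Why are customers complaining about delivery?", conn, sql)
print(stat_answer, sem_answer)
```

The point of the design is that the statistical answer comes from the database scanning every matching row, while the LLM's role on that route is limited to producing the query and phrasing the result.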

The process underscored that building trustworthy AI systems is less about prompt engineering and more about architectural design. Effective RAG requires bridging the gap between probabilistic generation and deterministic analytics.

Advanced Visibility: To make this production-ready, I built a Streamlit interface that includes:
- Cost Transparency: Real-time calculation of token usage and estimated cost per query.
- Semantic Galaxy: A 3D visualization using PCA and K-Means clustering to map the "Voice of Customer," making it possible to visually identify clusters of complaints such as "delivery delays" or "damaged packaging."
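For the cost transparency item, a per-query estimate can be as simple as multiplying token counts by per-token rates. The sketch below assumes the model provider bills input and output tokens separately per million tokens. The `Pricing` class, the function name, and the rates are hypothetical placeholders, not real prices or the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical price sheet, in currency units per one million tokens."""
    input_per_million: float
    output_per_million: float


def estimate_cost(prompt_tokens: int, completion_tokens: int, pricing: Pricing) -> float:
    """Estimate the cost of one query from its token counts."""
    total = (
        prompt_tokens * pricing.input_per_million
        + completion_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Placeholder rates for illustration only; substitute the provider's actual prices.
pricing = Pricing(input_per_million=1.0, output_per_million=2.0)
cost = estimate_cost(prompt_tokens=1500, completion_tokens=500, pricing=pricing)
print(f"Estimated cost: {cost:.6f}")
```

In a Streamlit app, the token counts would typically come from the usage metadata returned with each model response, and the result could be shown next to the answer.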