FinRAG
Multimodal Financial Document Intelligence Platform
<2s
Query Latency
end-to-end
5+
Doc Types Supported
multimodal
91.4%
Retrieval Precision
top-3 accuracy
01 The Challenge
FinRAG was born from a question DataSalt kept running into across financial NLP engagements: why do RAG systems struggle with financial documents?
The answer is structure. Financial filings — 10-Ks, 10-Qs, earnings call transcripts, proxy statements — are among the most information-dense documents in existence. They mix long-form prose, nested tables, footnotes that override headline numbers, embedded charts, and cross-references that span hundreds of pages. A standard text-chunking RAG pipeline treats all of this as undifferentiated text and performs poorly on exactly the queries that matter most: "What was the effective tax rate excluding one-time items?", "How did segment margins change year-over-year?", "What did management say about guidance on the Q3 call?"
The second problem is multimodality. Critical data in financial documents lives in tables and charts — not prose. Standard embedding pipelines that chunk raw text miss or corrupt tabular data entirely. A system that can't read a table can't answer a financial question reliably.
We built FinRAG to close that gap: a RAG architecture purpose-built for financial documents, with table-aware parsing, structure-preserving chunking, and a hybrid retrieval strategy that handles both semantic and structured queries. The system is live at finrag.io and serves as a companion to the market-sentiment.io dashboard — together forming DataSalt's financial AI demonstration suite.
02 Our Approach
We built a six-stage pipeline: multimodal document ingestion, structure-aware embedding via Google Gemini, hybrid dense + sparse indexing in Qdrant, reciprocal rank fusion with cross-encoder reranking, LLM synthesis with page-level citations, and streamed responses with TTS audio for earnings calls.
- Google Gemini Embeddings 2 — native multimodal embedding — text, tables, and chart images in a shared 3072-dim vector space with no intermediate captioning step
- Google Gemini Flash 2.5 (TTS) — synthesized audio playback of earnings call summaries and key passages — enabling on-the-go consumption of financial intelligence
- Qdrant — vector database with payload filtering by chunk type (text, table, image) and named vector support for hybrid dense + sparse retrieval
- Cloudflare R2 — object storage for the raw PDF corpus and extracted assets (serialized tables, chart images)
- Fly.io — Python API server (FastAPI) hosting the retrieval pipeline, cross-encoder reranker, and streaming generation endpoint
- Next.js + Vercel — frontend with Server-Sent Events for streamed responses, audio playback for TTS, and full-text citation rendering
Document Ingestion
PDF parse + chunk by type
Multimodal Embedding
Gemini Embeddings 2
Hybrid Indexing
Qdrant dense + BM25 sparse
Retrieval + Reranking
RRF fusion → cross-encoder
LLM Synthesis
Cited, grounded answers
Streaming Response
SSE + TTS audio
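The staged flow above can be sketched as a minimal orchestration. Every function body here is an illustrative stub standing in for the production component (Gemini embedding, Qdrant indexing, fused retrieval, Claude synthesis), not the real implementation:

```python
from typing import Any

Chunk = dict[str, Any]

def ingest(pdf_path: str) -> list[Chunk]:
    """Stage 1 - parse the PDF into typed chunks (text / table / image)."""
    return [{"type": "text", "content": "Revenue grew 12% YoY.", "page": 41}]

def embed(chunks: list[Chunk]) -> list[Chunk]:
    """Stage 2 - one dense vector per chunk (3072-dim placeholder)."""
    return [{**c, "vector": [0.0] * 3072} for c in chunks]

def index(chunks: list[Chunk]) -> list[Chunk]:
    """Stage 3 - dense + sparse indexing (in-memory stand-in)."""
    return chunks

def retrieve(store: list[Chunk], query: str, top_k: int = 5) -> list[Chunk]:
    """Stage 4 - fused retrieval + reranking (naive stand-in)."""
    return store[:top_k]

def synthesize(query: str, context: list[Chunk]) -> dict:
    """Stage 5 - grounded answer with page-level citations."""
    return {
        "answer": f"[stub answer to: {query}]",
        "citations": [{"page": c["page"]} for c in context],
    }

def answer_query(pdf_path: str, query: str) -> dict:
    """Stages 1-5 wired together (stage 6, streaming/TTS, omitted)."""
    store = index(embed(ingest(pdf_path)))
    return synthesize(query, retrieve(store, query))
```

The point of the shape, not the stubs: each stage consumes and produces typed chunks, so swapping a component (a different embedder, a different vector store) touches one function, not the pipeline.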
03 Key Findings
Retrieval Precision by Document Type
Top-3 chunk retrieval precision across five document types. Table chunks slightly underperform prose chunks due to serialization edge cases in heavily merged-cell tables — a known area for improvement.
Latency by Query Type
Retrieval and generation time by query complexity. Multi-hop and comparative queries require more retrieved chunks and longer generation context, driving higher latency — both remain within acceptable interactive UX bounds.
Hybrid vs. Dense-Only Retrieval Recall
Recall across a 100-query evaluation set. BM25 provides the largest gains on queries containing exact financial terminology (EBITDA, segment names, specific fiscal quarters) where dense embeddings under-represent surface-form specificity.
Interactive Demo
Try FinRAG Live
Query a curated corpus of SEC filings and earnings transcripts — with table-aware retrieval and cited answers.
Launch FinRAG
04 Business Impact
Projected Annual Value
Purpose-built financial RAG with multimodal retrieval, page-level citations, and earnings call TTS
FinRAG demonstrates what a purpose-built financial RAG system looks like when document structure is treated as a first-class concern — not an afterthought. The hybrid retrieval strategy outperforms dense-only retrieval by 4–8 percentage points on financial-specific queries. Table serialization enables a class of queries that standard chunking pipelines cannot answer. And citation-first generation makes every response auditable — a non-negotiable requirement in financial workflows where hallucination poses real risk.
As a DataSalt portfolio project, FinRAG is designed to be deployed against any corpus of financial documents. It currently indexes a demonstration set of public SEC filings (10-K and 10-Q) for three S&P 500 companies, with earnings call transcripts for the same period. FinRAG pairs with market-sentiment.io — DataSalt's live sentiment regime detection dashboard — to form a complete financial AI demonstration suite.
05 Technical Details
Document Parsing & Ingestion
- PDF extraction: pdfplumber for structured layout detection; PyMuPDF as fallback
- Table parsing: pdfplumber table extraction → markdown serialization with column headers preserved
- Chart/figure handling: extracted as images and passed directly to the Gemini embedding model — no intermediate captioning step
- Document storage: raw PDFs and extracted assets stored in Cloudflare R2
Embeddings — The Multimodal Core
- Model: Google Gemini Embeddings 2 preview (gemini-embedding-exp-03-07)
- Modalities: text prose, serialized table markdown, and chart/figure images in a unified vector space
- Dimension: 3072 — single model handles all content types without modality-specific pipelines
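A sketch of how typed chunks can be routed to embedding inputs. The field names below are illustrative (not the Gemini SDK's own schema), and the commented-out call assumes the google-genai client:

```python
def to_embedding_input(chunk: dict) -> dict:
    """Map a typed chunk to an embedding payload: text and serialized-table
    chunks go in as plain text; image chunks carry raw bytes + MIME type."""
    if chunk["type"] in ("text", "table"):
        return {"kind": "text", "content": chunk["content"]}
    if chunk["type"] == "image":
        return {"kind": "image", "data": chunk["bytes"], "mime": "image/png"}
    raise ValueError(f"unknown chunk type: {chunk['type']}")

# In production the payload goes to the embedding model, e.g. (exact
# signature for image inputs may differ from this text-input form):
#   client.models.embed_content(model="gemini-embedding-exp-03-07",
#                               contents=payload["content"])
```

Because one model handles all three modalities in a shared 3072-dim space, no captioning or OCR bridge is needed between chunk types.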
Retrieval Architecture
- Vector store: Qdrant with payload filtering by chunk type (text / table / image)
- Sparse: BM25 via rank-bm25, indexed on tokenized financial corpus
- Fusion: Reciprocal Rank Fusion (k=60) across dense and sparse ranked lists
- Reranker: cross-encoder/ms-marco-MiniLM-L-6-v2 for top-20 → top-5 selection
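The fusion step is small enough to show in full. This is the standard Reciprocal Rank Fusion formula with the k=60 constant noted above, applied to the dense and sparse ranked lists (a faithful sketch of the technique, not FinRAG's exact code):

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
    rank starting at 1. Returns doc ids sorted by fused score, descending."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked near the top of both lists beats one ranked first in only one list, which is exactly the behavior that lets BM25's exact-term matches correct dense retrieval on terminology-heavy queries.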
LLM & Generation
- Model: Claude Sonnet 4.5 via Anthropic API
- Context: top-5 reranked chunks (~3,000 tokens) + system prompt
- Output: answer + cited sources (document name, chunk ID, page number)
- Streaming: Anthropic streaming API → Next.js Server-Sent Events
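The citation-first context assembly can be sketched as follows. Field names (`doc`, `chunk_id`, `page`) are assumptions mirroring the cited-sources output described above:

```python
def build_context(chunks: list[dict]) -> tuple[str, list[dict]]:
    """Format reranked chunks into a numbered context block plus a parallel
    citation list (document, chunk id, page) the model can reference."""
    blocks, citations = [], []
    for i, c in enumerate(chunks, start=1):
        blocks.append(f"[{i}] ({c['doc']}, p.{c['page']})\n{c['content']}")
        citations.append({"ref": i, "doc": c["doc"],
                          "chunk_id": c["chunk_id"], "page": c["page"]})
    return "\n\n".join(blocks), citations
```

Numbering the blocks and returning the citation list separately means the generated answer only has to emit `[n]` markers; the frontend resolves each marker to a document name and page for auditable rendering.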
Text-to-Speech (Earnings Calls)
- Model: Google Gemini Flash 2.5 TTS
- Synthesized audio playback of RAG-generated summaries and key earnings call passages
- Generated server-side via Gemini API, streamed to client as audio
Infrastructure
- Backend: FastAPI (Python), deployed on Fly.io
- Frontend: Next.js 14 App Router, deployed on Vercel at finrag.io
- Vector DB: Qdrant
- Object storage: Cloudflare R2 (PDF corpus and extracted assets)
- Ingestion: offline batch pipeline (Python), triggered on corpus updates
Facing similar challenges?
Let's discuss how data science can drive results for your business.

