Banks and insurers are putting large language models behind document Q&A, underwriting support, and analyst workflows. Two main levers are retrieval-augmented generation (RAG) and fine-tuning. This post summarizes how current APIs and platforms support these patterns.
In RAG, you run a query through an embedding model, retrieve relevant chunks from a vector store or search index, and pass those chunks plus the question to an LLM. The model answers using the retrieved context, which reduces hallucination and keeps answers grounded in your data.
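The retrieval step above can be sketched in a few lines of plain Python. This is a toy illustration, not a production retriever: a bag-of-words `embed()` stands in for a real embedding model, and the policy chunks are invented examples.

```python
# Minimal RAG retrieval sketch: embed the query, score chunks by cosine
# similarity, and build a grounded prompt from the top matches.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts. A real system would call an
    # embedding model endpoint here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "Flood damage is covered up to 50,000 USD per claim.",
    "The policyholder must report claims within 30 days.",
    "Premium payments are due on the first of each month.",
]

def retrieve(query: str, k: int = 2) -> list:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

question = "Is flood damage covered?"
context = retrieve(question)
prompt = (
    "Answer using only this context:\n"
    + "\n".join(context)
    + "\n\nQuestion: " + question
)
```

The prompt is then sent to the chat model, which answers from the supplied context rather than from its parametric memory.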
Where it is used in finance: policy and contract Q&A, claims document summarization, internal knowledge bases, and compliance lookup.
On Databricks, Vector Search provides managed vector indexing and retrieval, integrated with Unity Catalog for governance. You embed documents using AI Functions or Model Serving endpoints, store them in Delta Lake, and query via vector_search() in SQL. Both embedding and chat models can be served through Databricks Model Serving and governed via AI Gateway, which often removes the need for separate external API configuration.
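A retrieval query against Vector Search might look like the following sketch. The index name, column names, and question are hypothetical, and the statement itself only runs on Databricks (where `spark.sql()` is available), so this snippet just assembles the SQL string for the vector_search() table function.

```python
# Sketch: building a vector_search() query for Databricks Vector Search.
# Index and column names below are placeholders; adapt them to your catalog.

def build_vector_search_sql(index_name: str, query_text: str,
                            num_results: int = 5) -> str:
    """Assemble a SQL call to the vector_search() table function."""
    return (
        "SELECT chunk_text, search_score "
        f"FROM vector_search(index => '{index_name}', "
        f"query => '{query_text}', "
        f"num_results => {num_results})"
    )

sql = build_vector_search_sql(
    "main.insurance.policy_chunks_index",
    "What does the policy say about flood coverage?",
)
# On Databricks: results = spark.sql(sql)
```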
Caveat: Published work on insurance Q&A with RAG still reports non-zero rates of incorrect or unsupported statements. So for customer-facing or high-stakes answers, you pair RAG with human review, citations, and clear disclaimers.
Fine-tuning is useful when you need: (1) consistent output structure (e.g. JSON for downstream systems), (2) adoption of internal terminology, or (3) better behavior on a narrow task than base models provide. Managed fine-tuning is typically priced per token, both for the training run and for subsequent inference.
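For the structured-output case, training data is commonly prepared as chat-format JSONL, one example per line, where the assistant turn shows the exact JSON the model should emit. The schema and field names below (`policy_number`, `claim_amount`) are illustrative, not a prescribed format.

```python
# Sketch: preparing chat-format JSONL training data for fine-tuning a model
# to emit a fixed JSON schema. Fields and examples are invented.
import json

SYSTEM = "Extract policy_number and claim_amount as JSON."

examples = [
    ("Claim on policy PN-1042 for $2,300 water damage.",
     {"policy_number": "PN-1042", "claim_amount": 2300}),
    ("Policy PN-7781: claim of $540 for a broken window.",
     {"policy_number": "PN-7781", "claim_amount": 540}),
]

def to_jsonl(rows) -> str:
    lines = []
    for text, fields in rows:
        record = {
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": text},
                # The assistant turn demonstrates the target output exactly.
                {"role": "assistant", "content": json.dumps(fields)},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = to_jsonl(examples)
```

A few hundred to a few thousand such examples is a typical starting point; consistency of the target schema across examples matters more than volume.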
For banking and insurance, fine-tuning is often applied to: extraction from forms or emails, classification (e.g. intent or product), and short structured summaries. It does not replace RAG for "answer from this 100-page doc," but it can improve accuracy and consistency on repeated task types.
On Databricks, you can fine-tune models and deploy them via Model Serving with provisioned throughput, then access them through AI Gateway alongside pre-deployed models like databricks-claude-sonnet-4 and databricks-meta-llama-3-3-70b-instruct.
A common production pattern is: retrieve with RAG (and optionally keyword or hybrid search), generate with a base or fine-tuned model, then validate with rules or a second model. You might fine-tune for a specific output schema and use RAG to supply the context.
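The "validate with rules" step can be as simple as checking the model's JSON output against a required schema before anything reaches downstream systems. A minimal sketch, assuming the illustrative claim schema from earlier; failed outputs would be routed to retry or human review.

```python
# Sketch: rules-based validation of model output before downstream use.
# The required fields are illustrative, not a fixed schema.
import json

REQUIRED = {"policy_number": str, "claim_amount": (int, float)}

def validate(raw_output: str):
    """Return (parsed, issues); issues is empty when all rules pass."""
    issues = []
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None, ["output is not valid JSON"]
    for field, typ in REQUIRED.items():
        if field not in data:
            issues.append(f"missing field: {field}")
        elif not isinstance(data[field], typ):
            issues.append(f"wrong type for {field}")
    if not issues and data["claim_amount"] < 0:
        issues.append("claim_amount must be non-negative")
    return data, issues

ok, ok_issues = validate('{"policy_number": "PN-1042", "claim_amount": 2300}')
bad, bad_issues = validate('{"claim_amount": -5}')
```

A second-model check (an "LLM judge") can sit behind these cheap rules to catch semantic problems the rules cannot express.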
For document-heavy use cases, Databricks ai_parse_document extracts structured content from unstructured documents (PDFs, images, DOCX) natively in SQL or PySpark, with no separate OCR pipeline needed. Combined with AI Functions (ai_extract, ai_classify, ai_summarize), you can build complete document intelligence pipelines directly on the Lakehouse.
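Such a pipeline can be expressed as a single SQL statement over a document table. The table and column names (`claims_docs`, `raw_text`) and the label lists are hypothetical, and the statement only runs on Databricks, so this sketch just assembles the SQL; a prior step could populate `raw_text` from PDFs via ai_parse_document, whose output schema is omitted here.

```python
# Sketch: composing a Databricks AI Functions pipeline as one SQL statement.
# Table, column, and label names are placeholders.

def build_doc_intelligence_sql(table: str, text_col: str) -> str:
    # ai_extract pulls named fields, ai_classify assigns one of the given
    # labels, ai_summarize condenses the text (here to at most 100 words).
    return f"""
SELECT
  {text_col},
  ai_extract({text_col}, array('policy_number', 'claimant_name')) AS fields,
  ai_classify({text_col}, array('claim', 'complaint', 'inquiry')) AS doc_type,
  ai_summarize({text_col}, 100) AS summary
FROM {table}
""".strip()

sql = build_doc_intelligence_sql("claims_docs", "raw_text")
# On Databricks: spark.sql(sql)
```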
In regulated industries, access control and audit trails matter as much as model choice. Unity Catalog enforces table- and column-level access controls and PII masking, and logs usage. It governs not just data but also models, functions, pipelines, vector search indexes, and serving endpoints, making it the foundational governance layer for all AI workloads.