Intelligent Loan Origination

Loan origination has a document problem and a consistency problem. Borrowers submit pay stubs (often photographed), W-2s, bank statements, and tax returns in every imaginable format. Loan officers across branches apply different standards to the same data.

ai_parse_document for loan documents

Databricks ai_parse_document processes scanned, photographed, and handwritten documents natively, with no separate OCR pipeline. It handles pay stubs with variable layouts, photographed W-2 forms, bank statements from different institutions, and tax returns with handwritten amendments, and it captures tables, figures, and document structure along with the text.

The origination pipeline uses three agents:

Document Extraction Agent: ai_parse_document parses all submitted documents, then ai_extract pulls income, employer, account balances, and tax data
Validation Agent: ai_query cross-references extracted data across documents (income on pay stub vs. W-2 vs. tax return) and flags discrepancies
Decision Support Agent: Assesses creditworthiness with credit models built on Feature Store features, and recommends loan terms
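The Validation Agent's cross-document income check can be sketched in plain Python. This is an illustration, not ai_query. It assumes the extraction step has already produced numeric income fields. The `ExtractedIncome` schema, the pay-frequency table, and the 10% tolerance are all hypothetical choices.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List

# Hypothetical mapping from pay frequency to pay periods per year.
PERIODS_PER_YEAR = {"weekly": 52, "biweekly": 26, "semimonthly": 24, "monthly": 12}


@dataclass
class ExtractedIncome:
    """Income fields an extraction step might emit (hypothetical schema)."""
    paystub_gross_per_period: float
    pay_frequency: str
    w2_annual_wages: float
    tax_return_wages: float


def flag_income_discrepancies(doc: ExtractedIncome, tolerance: float = 0.10) -> List[str]:
    """Annualize pay-stub income, then flag any document pair differing by more than `tolerance`."""
    annual = {
        "pay_stub": doc.paystub_gross_per_period * PERIODS_PER_YEAR[doc.pay_frequency],
        "w2": doc.w2_annual_wages,
        "tax_return": doc.tax_return_wages,
    }
    flags = []
    # Compare every pair of documents (dict keys iterate in insertion order).
    for a, b in combinations(annual, 2):
        larger = max(annual[a], annual[b])
        if larger == 0:
            continue  # nothing to compare
        gap = abs(annual[a] - annual[b]) / larger  # relative difference vs. the larger figure
        if gap > tolerance:
            flags.append(f"{a} vs {b}: {gap:.1%} apart")
    return flags


applicant = ExtractedIncome(3000.0, "biweekly", 76000.0, 60000.0)
for flag in flag_income_discrepancies(applicant):
    print(flag)
```

In this example the annualized pay stub ($78,000) and the W-2 agree within 3%. Both, however, sit more than 20% above the tax return, so two discrepancies are flagged for review. A production rule would need to reconcile income definitions before applying any tolerance. For example, W-2 Box 1 wages exclude pre-tax 401(k) deferrals and can therefore fall below annualized gross pay.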

Agent Bricks Multi-Agent Supervisor coordinates the pipeline, handling re-extraction on low-confidence results and routing edge cases to loan officers.
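The retry-then-escalate behavior described above can be illustrated in plain Python. This is not the Agent Bricks API. It assumes that each extraction reports a confidence score between 0 and 1. The `Extraction` type, the 0.85 threshold, and the three-attempt limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Extraction:
    fields: Dict[str, float]
    confidence: float  # assumed 0.0-1.0 score reported by the extraction step


def supervise_extraction(
    document_id: str,
    extract: Callable[[str, int], Extraction],
    min_confidence: float = 0.85,
    max_attempts: int = 3,
) -> Tuple[str, Extraction]:
    """Re-run low-confidence extractions; escalate to a loan officer if none clears the bar."""
    best: Optional[Extraction] = None
    for attempt in range(1, max_attempts + 1):
        result = extract(document_id, attempt)
        if best is None or result.confidence > best.confidence:
            best = result  # keep the strongest attempt to hand to a human reviewer
        if result.confidence >= min_confidence:
            return "auto", result
    return "loan_officer_review", best


# Stand-in extractor: confidence improves on the retry (e.g. after re-parsing a blurry photo).
def fake_extract(doc_id: str, attempt: int) -> Extraction:
    return Extraction({"gross_pay": 3000.0}, confidence=0.6 if attempt == 1 else 0.9)


print(supervise_extraction("paystub-001", fake_extract))
```

If every attempt stays below the threshold, the document goes to a loan officer along with the best attempt. The officer then starts from the strongest partial result instead of an empty form.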

The Lakeflow data engineering stack

Lakeflow Connect syncs data from the Loan Origination System and credit bureaus via managed connectors
Spark Declarative Pipelines handle document processing ETL (streaming tables for incoming applications, materialized views for enriched borrower profiles)
Lakeflow Jobs orchestrate the end-to-end origination workflow

Consistency through Feature Store and Unity Catalog

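Feature Store serves the same credit scoring features to every branch: debt-to-income ratios, payment history patterns, collateral valuations. Point-in-time correctness means each decision uses only the feature values that were available when it was made. As a result, decisions can be reproduced exactly during fair lending reviews (ECOA, HMDA).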
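The point-in-time ("as-of") semantics behind that guarantee can be shown in a few lines of plain Python. This is not the Feature Store API. It assumes each feature is stored as a history of timestamped values. The `value_as_of` and `dti_as_of` helpers and the sample figures are hypothetical.

```python
import bisect
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical feature history: (effective_timestamp, value), sorted ascending by timestamp.
History = List[Tuple[datetime, float]]


def value_as_of(history: History, decision_time: datetime) -> Optional[float]:
    """Return the most recent value whose timestamp is at or before decision_time."""
    timestamps = [ts for ts, _ in history]
    # bisect_right gives the number of entries with ts <= decision_time
    idx = bisect.bisect_right(timestamps, decision_time)
    return history[idx - 1][1] if idx > 0 else None


def dti_as_of(debt: History, income: History, decision_time: datetime) -> Optional[float]:
    """Monthly debt-to-income ratio computed only from values known at decision time."""
    d = value_as_of(debt, decision_time)
    i = value_as_of(income, decision_time)
    if d is None or not i:
        return None  # feature missing (or zero income) at decision time
    return d / i


monthly_debt = [(datetime(2024, 1, 1), 1500.0), (datetime(2024, 3, 1), 2100.0)]
monthly_income = [(datetime(2023, 12, 1), 6000.0)]
print(dti_as_of(monthly_debt, monthly_income, datetime(2024, 2, 15)))
```

For a February 15 decision, the March debt update is excluded, so the ratio is 1500 / 6000 = 0.25. An auditor who recomputes the ratio later gets the same number, even though the borrower's current debt is higher.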

Unity Catalog governs everything: data lineage for regulatory audit, the function registry for validation rules, model governance, and serving endpoints. All models are accessed through AI Gateway on Databricks Model Serving.

Results

Time-to-close drops by 40%. Manual document review decreases by 30%. Underwriting standards become consistent across all branches because the same models and validation rules apply everywhere.
