Score: 1

The Data Efficiency Frontier of Financial Foundation Models: Scaling Laws from Continued Pretraining

Published: December 13, 2025 | arXiv ID: 2512.12384v1

By: Jesse Ponnock

BigTech Affiliations: Johns Hopkins University

Potential Business Impact:

Helps language models understand financial documents, such as SEC filings, better.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Domain-adaptive pretraining (DAPT) offers a practical path to specializing large language models for high-value domains without full retraining. We conduct an early-stage scaling-law analysis of continued pretraining on U.S. SEC filings, training 1B- and 3B-parameter Llama-3.2 models on a 400M-token financial corpus with validation checkpoints at 50M, 100M, 200M, and 400M tokens. Results show consistent improvements in SEC-domain validation loss for both models, with the largest gains occurring within the first 200M tokens and diminishing returns thereafter. Power-law fits reveal shallow exponents, indicating that financial language is highly regular and efficiently learnable under continued pretraining. General-domain validation loss remains effectively unchanged across all token budgets, suggesting minimal drift and no signs of catastrophic forgetting. A data-efficiency frontier further shows that both models move toward improved specialization with negligible mixed-domain degradation. Together, these findings provide early empirical guidance for scaling financial foundation models, suggesting that meaningful domain adaptation can be achieved with comparatively modest token budgets and that larger model scales (7B-70B) remain tractable under projected data requirements.
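The abstract's power-law analysis can be illustrated with a short curve-fitting sketch. The snippet below is not the authors' code: only the checkpoint token budgets (50M, 100M, 200M, 400M) come from the abstract, and the loss values are hypothetical placeholders. It fits a saturating power law L(D) = L_inf + A * D^(-alpha) to domain validation loss, which is one common way to estimate the shallow exponents the paper describes.

```python
# Minimal sketch, assuming a saturating power law L(D) = L_inf + A * D**(-alpha).
# Loss values are hypothetical placeholders; only the token budgets are from the paper.
import numpy as np
from scipy.optimize import curve_fit

# Checkpoint token budgets, in millions of tokens.
tokens_m = np.array([50.0, 100.0, 200.0, 400.0])

# Hypothetical SEC-domain validation losses for one model; replace with real measurements.
val_loss = np.array([2.10, 2.02, 1.97, 1.95])

def power_law(d, l_inf, a, alpha):
    """Irreducible loss plus a term that decays as a power of the data size."""
    return l_inf + a * d ** (-alpha)

# Fit the three parameters; p0 is a rough starting point to help convergence.
params, _ = curve_fit(power_law, tokens_m, val_loss, p0=(1.9, 1.0, 0.5), maxfev=10000)
l_inf, a, alpha = params
print(f"L_inf={l_inf:.3f}, A={a:.3f}, alpha={alpha:.3f}")

# A shallow exponent (small alpha) matches the paper's finding: loss drops quickly
# in the early token budgets and then flattens. Extrapolating to a larger budget
# illustrates the diminishing returns beyond ~200M tokens.
print("Predicted loss at 800M tokens:", round(power_law(800.0, *params), 3))
```

In practice, a fit like this would be repeated per model size (1B, 3B) and per validation domain (SEC vs. general), and the fitted exponents compared to judge how much additional data a 7B-70B model would plausibly need.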

Country of Origin
🇺🇸 United States

Page Count
8 pages

Category
Computer Science:
Machine Learning (CS)