The Data Efficiency Frontier of Financial Foundation Models: Scaling Laws from Continued Pretraining
By: Jesse Ponnock
Potential Business Impact:
Teaches computers to understand money talk better.
Domain-adaptive pretraining (DAPT) offers a practical path to specializing large language models for high-value domains without full retraining. We conduct an early-stage scaling-law analysis of continued pretraining on U.S. SEC filings, training 1B- and 3B-parameter Llama-3.2 models on a 400M-token financial corpus with validation checkpoints at 50M, 100M, 200M, and 400M tokens. Results show consistent improvements in SEC-domain validation loss for both models, with the largest gains occurring within the first 200M tokens and diminishing returns thereafter. Power-law fits reveal shallow exponents, indicating that financial language is highly regular and efficiently learnable under continued pretraining. General-domain validation loss remains effectively unchanged across all token budgets, suggesting minimal drift and no signs of catastrophic forgetting. A data-efficiency frontier further shows that both models move toward improved specialization with negligible mixed-domain degradation. Together, these findings provide early empirical guidance for scaling financial foundation models, suggesting that meaningful domain adaptation can be achieved with comparatively modest token budgets and that larger model scales (7B-70B) remain tractable under projected data requirements.
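Because the abstract centers on power-law fits of domain validation loss against the continued-pretraining token budget, a minimal sketch of such a fit may help readers follow the claim about shallow exponents. The saturating form L(D) = L_inf + A * D^(-alpha), the SciPy-based fitting routine, and all numeric loss values below are illustrative assumptions, not the paper's actual data or code.

```python
# Illustrative sketch only: fits a saturating power law to validation loss
# measured at the paper's checkpoint token budgets. The loss values are
# hypothetical placeholders, not results reported in the paper.
import numpy as np
from scipy.optimize import curve_fit

# Token budgets at the validation checkpoints (in millions of tokens).
tokens_m = np.array([50.0, 100.0, 200.0, 400.0])

# Hypothetical SEC-domain validation losses for a single model size.
val_loss = np.array([2.10, 2.02, 1.97, 1.95])

def power_law(d, l_inf, a, alpha):
    """Saturating power law: irreducible loss plus a term decaying in tokens."""
    return l_inf + a * d ** (-alpha)

# Fit the three parameters; p0 gives curve_fit a reasonable starting point.
params, _ = curve_fit(power_law, tokens_m, val_loss, p0=[1.9, 1.0, 0.5])
l_inf, a, alpha = params
print(f"L_inf={l_inf:.3f}, A={a:.3f}, alpha={alpha:.3f}")

# A shallow exponent (small alpha) means loss keeps falling with more domain
# tokens but slowly, consistent with the diminishing returns the abstract
# describes beyond roughly 200M tokens.
```

Under these assumptions, extrapolating the fitted curve to larger token budgets is how one would project the data requirements the abstract mentions for 7B-70B scales; the specific projection method used by the authors is not detailed here.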
Similar Papers
The interplay between domain specialization and model size
Computation and Language
Makes AI smarter in specific jobs with less training.
Domain-Adaptive Continued Pre-Training of Small Language Models
Computation and Language
Makes small AI smarter with less computer power.
Unlocking Multi-Task Electric Energy System Intelligence: Data Scaling Laws and Performance with Limited Fine-Tuning
Systems and Control
Makes power grids smarter for new problems.