LOCUS: A System and Method for Low-Cost Customization for Universal Specialization
By: Dhanasekar Sundararaman, Keying Li, Wayne Xiong, et al.
Potential Business Impact:
Enables building accurate, specialized NLP models from only a handful of labeled examples, cutting data, compute, and deployment costs.
We present LOCUS (LOw-cost Customization for Universal Specialization), a pipeline that turns few-shot data into specialized NLP models through targeted retrieval, synthetic data generation, and parameter-efficient tuning. Given only a small number of labeled examples, LOCUS retrieves pertinent data from a broad repository, synthesizes additional training samples via in-context data generation, and fine-tunes models using either full or low-rank (LoRA) parameter adaptation. We evaluate on named entity recognition (NER) and text classification (TC) benchmarks, consistently outperforming strong baselines (including GPT-4o) while substantially lowering cost and model size. The resulting memory-optimized models retain 99% of fully fine-tuned accuracy while using only 5% of the memory footprint, and beat GPT-4o on several benchmarks with less than 1% of its parameters.
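To make the pipeline concrete, here is a minimal Python sketch of the three stages, under stated assumptions: the abstract does not specify the retrieval encoder, generation prompt, or adapter configuration, so the sentence-transformers model, the prompt wording, the hypothetical call_llm helper, and the DistilBERT/LoRA hyperparameters below are illustrative placeholders rather than the authors' actual setup.

```python
# Sketch of a LOCUS-style pipeline: (1) targeted retrieval, (2) in-context
# synthetic data generation, (3) parameter-efficient (LoRA) fine-tuning.
# All model names, prompts, and hyperparameters here are assumptions.

from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForTokenClassification
from peft import LoraConfig, get_peft_model

# --- 1) Targeted retrieval: embed the few-shot seeds and pull the nearest
#        examples out of a broad unlabeled repository.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder
seed_texts = ["Apple unveiled the iPhone in Cupertino."]  # few-shot seeds
repository = [
    "Microsoft opened a new office in Seattle.",
    "The recipe calls for two cups of flour.",
    "Google acquired DeepMind in London.",
]
seed_emb = encoder.encode(seed_texts, convert_to_tensor=True)
repo_emb = encoder.encode(repository, convert_to_tensor=True)
hits = util.semantic_search(seed_emb, repo_emb, top_k=2)
retrieved = [repository[h["corpus_id"]] for h in hits[0]]

# --- 2) Synthetic data generation: prompt an LLM with the seeds and the
#        retrieved text to write more labeled samples (prompt is hypothetical).
prompt = (
    "Labeled NER examples:\n"
    + "\n".join(seed_texts + retrieved)
    + "\nWrite three new sentences in the same domain, marking ORG and LOC spans."
)
# synthetic = call_llm(prompt)  # hypothetical helper; any LLM endpoint works

# --- 3) Parameter-efficient tuning: wrap a small encoder in LoRA adapters
#        so only a fraction of the weights are trained.
base = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=9  # e.g. CoNLL-style BIO tags
)
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
    task_type="TOKEN_CLS",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only a small fraction is trainable
```

With rank-8 adapters on two attention projections per layer, well under 1% of the encoder's weights are trainable, consistent with the small memory footprints the abstract reports.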