
Luxical: High-Speed Lexical-Dense Text Embeddings

Published: December 9, 2025 | arXiv ID: 2512.09015v1

By: DatologyAI, Luke Merrick, and more

Potential Business Impact:

Makes computers understand text much faster.

Business Areas:

Text Analytics, Data and Analytics, Software

Frontier language model quality increasingly hinges on our ability to organize web-scale text corpora for training. Today's dominant tools trade off speed against flexibility. Lexical classifiers (e.g., FastText) are fast but can only produce classification scores. Transformer text embedding models produce vector-valued outputs that flexibly support many workflows (e.g., clustering, classification, and retrieval), but these outputs are computationally expensive to produce. We introduce Luxical, a library for high-speed "lexical-dense" text embeddings that aims to recover the best properties of both approaches for web-scale text organization. Luxical combines sparse TF-IDF features, a small ReLU network, and a knowledge distillation training regimen to approximate large transformer embedding models at a fraction of their operational cost. In this technical report, we describe the Luxical architecture and training objective and evaluate a concrete Luxical model in two disparate applications: a targeted webcrawl document retrieval test and an end-to-end language model data curation task grounded in text classification. In these tasks, we demonstrate speedups ranging from 3x to 100x over neural baselines of varying size, along with inference speed comparable to FastText in the data curation task. On these evaluations, the tested Luxical model shows favorable compute/quality trade-offs for large-scale text organization, matching the quality of the neural baselines. Luxical is available as open-source software at https://github.com/datologyai/luxical.
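The abstract names three ingredients: sparse TF-IDF features, a small ReLU network, and knowledge distillation from a transformer teacher. The sketch below wires them together in NumPy to show how the pieces fit. Everything specific here is an assumption for illustration, not Luxical's actual implementation or API: the whitespace tokenizer, the smoothed IDF formula, the layer sizes, the plain mean-squared-error distillation loss, and the random unit vectors standing in for teacher embeddings are all hypothetical (the report itself describes the real architecture and training objective).

```python
import math
from collections import Counter

import numpy as np


def tokenize(text):
    # Hypothetical whitespace tokenizer; a real system would use a proper tokenizer.
    return text.lower().split()


def fit_idf(docs):
    """Build a vocabulary and smoothed IDF weights (one common variant, assumed here)."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    df = np.zeros(len(vocab))
    for doc in docs:
        for tok in set(tokenize(doc)):
            df[index[tok]] += 1
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
    return index, idf


def tfidf(docs, index, idf):
    """TF-IDF rows, L2-normalized. Dense here for brevity; real systems keep this sparse."""
    X = np.zeros((len(docs), len(index)))
    for r, doc in enumerate(docs):
        for tok, count in Counter(tokenize(doc)).items():
            if tok in index:  # out-of-vocabulary tokens are dropped
                X[r, index[tok]] = count * idf[index[tok]]
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)


class StudentMLP:
    """Hypothetical small ReLU network mapping TF-IDF features to a dense embedding."""

    def __init__(self, vocab_size, hidden, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 1 / math.sqrt(vocab_size), (vocab_size, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 1 / math.sqrt(hidden), (hidden, dim))
        self.b2 = np.zeros(dim)

    def forward(self, X):
        # Cache intermediates for the manual backward pass.
        self.X = X
        self.Z = X @ self.W1 + self.b1
        self.H = np.maximum(self.Z, 0.0)  # ReLU
        return self.H @ self.W2 + self.b2

    def embed(self, X):
        # Unit-normalize so cosine similarity is a plain dot product.
        Y = self.forward(X)
        return Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)

    def distill_step(self, X, teacher, lr=0.1):
        """One gradient-descent step on 0.5 * mean squared error to teacher embeddings."""
        Y = self.forward(X)
        n = X.shape[0]
        loss = 0.5 * np.sum((Y - teacher) ** 2) / n
        dY = (Y - teacher) / n
        dW2 = self.H.T @ dY
        db2 = dY.sum(axis=0)
        dZ = (dY @ self.W2.T) * (self.Z > 0)  # backprop through ReLU
        dW1 = self.X.T @ dZ
        db1 = dZ.sum(axis=0)
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        return loss


# Toy usage: random unit vectors are a placeholder for real transformer teacher embeddings.
docs = ["the cat sat on the mat", "dogs chase cats",
        "stock markets fell today", "investors sold shares"]
index, idf = fit_idf(docs)
X = tfidf(docs, index, idf)
rng = np.random.default_rng(1)
teacher = rng.normal(size=(len(docs), 8))
teacher /= np.linalg.norm(teacher, axis=1, keepdims=True)
student = StudentMLP(len(index), hidden=32, dim=8)
losses = [student.distill_step(X, teacher) for _ in range(1000)]
emb = student.embed(X)
```

At inference time only the sparse TF-IDF lookup and two small matrix multiplies remain, which is the intuition behind the claimed speed advantage over running a transformer; the resulting unit vectors can then feed clustering, classification, or retrieval via dot products.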

Page Count

11 pages