Parameter-Efficient Transformer Embeddings
By: Henry Ndubuaku, Mouad Talhi
Potential Business Impact:
Makes AI understand words using less computer memory.
Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which token embedding vectors are first generated deterministically, directly from the token IDs using a Fourier expansion of their normalized values, followed by a lightweight multilayer perceptron (MLP) that captures higher-order interactions. We train standard transformers and our architecture on natural language inference tasks (SNLI and MNLI), and evaluate zero-shot performance on semantic textual similarity (STS-B). Our results demonstrate that the proposed method achieves competitive performance using significantly fewer parameters, trains faster, and operates effectively without the need for dropout. This proof-of-concept study highlights the potential for scalable, memory-efficient language models and motivates further large-scale experimentation based on our findings.
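A minimal sketch of the idea described in the abstract, assuming PyTorch; the class name `FourierEmbedding` and the hyperparameters (number of Fourier frequencies, MLP hidden size, embedding width) are illustrative choices, not the paper's exact formulation. The key property is that the parameter count depends only on the embedding width and MLP size, not on the vocabulary size.

```python
# Sketch only: a deterministic Fourier expansion of normalized token IDs,
# refined by a small MLP, standing in for a learned nn.Embedding table.
import math
import torch
import torch.nn as nn


class FourierEmbedding(nn.Module):
    """Fourier features of normalized token IDs followed by a lightweight MLP."""

    def __init__(self, vocab_size: int, d_model: int, n_freqs: int = 64, hidden: int = 256):
        super().__init__()
        self.vocab_size = vocab_size
        # Fixed (non-learned) integer frequencies for the Fourier expansion.
        self.register_buffer("freqs", torch.arange(1, n_freqs + 1, dtype=torch.float32))
        # Lightweight MLP mapping the 2*n_freqs sinusoidal features to d_model,
        # capturing higher-order interactions between the Fourier components.
        self.mlp = nn.Sequential(
            nn.Linear(2 * n_freqs, hidden),
            nn.GELU(),
            nn.Linear(hidden, d_model),
        )

    def forward(self, token_ids: torch.LongTensor) -> torch.Tensor:
        # Normalize IDs to [0, 1] so the expansion is independent of vocabulary size.
        x = token_ids.float() / (self.vocab_size - 1)            # (batch, seq)
        angles = 2 * math.pi * x.unsqueeze(-1) * self.freqs      # (batch, seq, n_freqs)
        features = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return self.mlp(features)                                # (batch, seq, d_model)


# Usage: the module's parameter count is independent of vocab_size,
# unlike nn.Embedding(vocab_size, d_model).
emb = FourierEmbedding(vocab_size=30522, d_model=512)
tokens = torch.randint(0, 30522, (2, 16))
print(emb(tokens).shape)  # torch.Size([2, 16, 512])
```

For comparison, a standard `nn.Embedding(30522, 512)` table alone holds roughly 15.6M parameters, whereas the sketch above uses only the MLP's weights, on the order of 0.16M with these illustrative sizes.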
Similar Papers
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices
Computation and Language
Makes small AI models run faster, use less power.
Static Word Embeddings for Sentence Semantic Representation
Computation and Language
Makes computers understand sentences better.
ZeroLM: Data-Free Transformer Architecture Search for Language Models
Computation and Language
Finds best computer brains faster and cheaper.