Score: 2

Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices

Published: December 6, 2025 | arXiv ID: 2512.06443v1

By: Xiangyu Li, Chengyu Yin, Weijun Wang, and others

Potential Business Impact:

Makes AI language models run much faster on small, resource-limited devices.

Business Areas:
Image Recognition, Data and Analytics, Software

Large language models (LLMs) are increasingly deployed on edge devices. To meet strict resource constraints, real-world deployment has pushed LLM quantization from 8-bit to 4-bit, 2-bit, and now 1.58-bit. When combined with lookup table (LUT)-based inference, these ultra-low-bit LLMs run even faster on CPUs than on NPUs, opening new opportunities for ubiquitous on-device intelligence. However, this paper identifies that LUT-based inference underutilizes memory bandwidth during parallel inference, which is required for prefilling, test-time scaling, and other multi-token scenarios. The root cause is the scalar LUT paradigm, which performs repetitive and non-contiguous memory accesses for each token. To address this issue, we propose vector LUT, a new lookup paradigm that constructs a unified LUT across parallel tokens and performs a single $1 \rightarrow N$ lookup per index. To realize it efficiently, we further introduce two techniques: (1) Vector LUT-Centric Tensor Layout and (2) Cache-Aware Streamed Lookup. Evaluations on 5 edge devices across 3 LLMs show that Vec-LUT outperforms state-of-the-art baselines by up to $4.2\times$. Our implementation is integrated into llama.cpp. The code is available at https://github.com/Cipherxzc/vlut.cpp.
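The abstract's core idea, replacing per-token scalar lookups with a single lookup that serves all parallel tokens at once, can be sketched in a few lines of C++. This is a minimal illustration only: the function names, table sizes, and data layout below are assumptions chosen for exposition, not the actual Vec-LUT implementation in llama.cpp, and the sketch omits the Vector LUT-Centric Tensor Layout and Cache-Aware Streamed Lookup techniques the paper adds on top.

```cpp
// Illustrative sketch of scalar-LUT vs. vector-LUT lookups.
// All names and sizes here are hypothetical, not taken from vlut.cpp.
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

constexpr int LUT_SIZE = 16;   // e.g. 2^4 entries for one 4-bit weight group index
constexpr int N_TOKENS = 8;    // tokens processed in parallel (prefill / multi-token)

// Scalar LUT paradigm: one lookup table per token, so every weight index
// triggers a separate, non-contiguous lookup repeated for each token.
void scalar_lut(const std::vector<std::array<float, LUT_SIZE>>& per_token_lut, // [N_TOKENS]
                const std::vector<uint8_t>& weight_idx,                        // weight group indices
                std::vector<float>& out) {                                     // [N_TOKENS]
    for (int t = 0; t < N_TOKENS; ++t) {
        float acc = 0.0f;
        for (uint8_t idx : weight_idx)
            acc += per_token_lut[t][idx];   // same idx looked up once per token
        out[t] = acc;
    }
}

// Vector LUT paradigm (as described in the abstract): a unified LUT whose
// entries are N_TOKENS-wide vectors, so one index fetches a contiguous block
// of partial sums for all tokens -- a single 1 -> N lookup per index.
void vector_lut(const std::vector<std::array<float, N_TOKENS>>& unified_lut,   // [LUT_SIZE]
                const std::vector<uint8_t>& weight_idx,
                std::vector<float>& out) {                                      // [N_TOKENS]
    std::fill(out.begin(), out.end(), 0.0f);
    for (uint8_t idx : weight_idx) {
        const auto& entry = unified_lut[idx];  // one contiguous load serves all tokens
        for (int t = 0; t < N_TOKENS; ++t)
            out[t] += entry[t];                // vectorizable, bandwidth-friendly accumulation
    }
}
```

In the vector variant each weight index is read once and the fetched entry is contiguous in memory, which matches the bandwidth-friendly access pattern the abstract attributes to the $1 \rightarrow N$ lookup; how the real kernel lays out and streams these entries is what the paper's two additional techniques address.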

Country of Origin
🇨🇳 China

Repos / Data Links
https://github.com/Cipherxzc/vlut.cpp

Page Count
14 pages

Category
Computer Science:
Distributed, Parallel, and Cluster Computing