The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference

Published: November 28, 2025 | arXiv ID: 2511.22889v1

By: Fang Li

Potential Business Impact:

Lets phones and other edge devices run large AI models locally instead of depending on a slow network connection.

Business Areas:
Intelligent Systems Artificial Intelligence, Data and Analytics, Science and Engineering

The deployment of Large Language Models (LLMs) on consumer edge devices is throttled by the "Memory Wall": the prohibitive bandwidth and energy cost of fetching gigabytes of model weights from DRAM for every token generated. Current architectures (GPUs, NPUs) treat model weights as mutable software data, incurring massive energy penalties to maintain general-purpose programmability. We propose the Immutable Tensor Architecture (ITA), a paradigm shift that treats model weights not as data but as physical circuit topology. By encoding parameters directly into the metal interconnects and logic of mature-node ASICs (28nm/40nm), ITA eliminates the memory hierarchy entirely. We present a "Split-Brain" system design in which a host CPU manages dynamic KV-cache operations while the ITA ASIC acts as a stateless, ROM-embedded dataflow engine.
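
The "Memory Wall" described in the abstract can be illustrated with a back-of-envelope estimate. The sketch below is not taken from the paper. It assumes a conventional device that must stream every weight from DRAM once per generated token, and all numbers (7 billion parameters, 4-bit weights, 50 GB/s of DRAM bandwidth, 100 pJ per byte fetched) are hypothetical values chosen only for illustration.

```python
def max_tokens_per_second(param_count, bytes_per_param, dram_bandwidth_bytes_per_s):
    """Upper bound on decode rate if all weights are read from DRAM for each token."""
    weight_bytes = param_count * bytes_per_param
    return dram_bandwidth_bytes_per_s / weight_bytes


def weight_fetch_energy_joules(param_count, bytes_per_param, picojoules_per_byte):
    """Energy spent per token just moving weights out of DRAM."""
    weight_bytes = param_count * bytes_per_param
    return weight_bytes * picojoules_per_byte * 1e-12


# Hypothetical edge-device scenario (assumed values, not from the paper).
PARAMS = 7e9               # 7 billion parameters
BYTES_PER_PARAM = 0.5      # 4-bit quantized weights
DRAM_BANDWIDTH = 50e9      # 50 GB/s
PJ_PER_BYTE = 100.0        # assumed DRAM access energy per byte

tokens_per_s = max_tokens_per_second(PARAMS, BYTES_PER_PARAM, DRAM_BANDWIDTH)
joules_per_token = weight_fetch_energy_joules(PARAMS, BYTES_PER_PARAM, PJ_PER_BYTE)

print(f"bandwidth-bound decode rate: {tokens_per_s:.1f} tokens/s")
print(f"weight-fetch energy: {joules_per_token:.2f} J/token")
```

Under these assumed numbers, weight traffic alone caps decoding at roughly 14 tokens per second, no matter how fast the compute units are. This per-token weight fetch is the cost ITA aims to remove by fixing the weights in circuit topology rather than reading them from memory.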

Country of Origin
🇺🇸 United States

Page Count
10 pages

Category
Computer Science:
Hardware Architecture