The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference
By: Fang Li
Potential Business Impact:
Makes phones run smart AI without slow internet.
The deployment of Large Language Models (LLMs) on consumer edge devices is throttled by the "Memory Wall" -- the prohibitive bandwidth and energy cost of fetching gigabytes of model weights from DRAM for every token generated. Current architectures (GPUs, NPUs) treat model weights as mutable software data, incurring massive energy penalties to maintain general-purpose programmability. We propose The Immutable Tensor Architecture (ITA), a paradigm shift that treats model weights not as data, but as physical circuit topology. By encoding parameters directly into the metal interconnects and logic of mature-node ASICs (28nm/40nm), ITA eliminates the memory hierarchy entirely. We present a "Split-Brain" system design where a host CPU manages dynamic KV-cache operations while the ITA ASIC acts as a stateless, ROM-embedded dataflow engine.
Similar Papers
A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations
Hardware Architecture
Makes AI learn faster and use less power.
Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure
Distributed, Parallel, and Cluster Computing
Builds super-fast AI by connecting computer parts better.
Hardware-Aware Data and Instruction Mapping for AI Tasks: Balancing Parallelism, I/O and Memory Tradeoffs
Hardware Architecture
Makes AI run faster using less power.