The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference

Published: November 28, 2025 | arXiv ID: 2511.22889v1

By: Fang Li

Potential Business Impact:

Makes phones run smart AI without slow internet.

Business Areas:

Intelligent Systems, Artificial Intelligence, Data and Analytics, Science and Engineering

The deployment of Large Language Models (LLMs) on consumer edge devices is throttled by the "Memory Wall": the prohibitive bandwidth and energy cost of fetching gigabytes of model weights from DRAM for every generated token. Current architectures (GPUs, NPUs) treat model weights as mutable software data and pay a large energy penalty to preserve general-purpose programmability. We propose the Immutable Tensor Architecture (ITA), a paradigm shift that treats model weights not as data but as physical circuit topology. By encoding parameters directly into the metal interconnects and logic of mature-node ASICs (28nm/40nm), ITA eliminates the memory hierarchy entirely. We present a "Split-Brain" system design in which a host CPU manages dynamic KV-cache operations while the ITA ASIC acts as a stateless, ROM-embedded dataflow engine.
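The abstract's two central ideas can be made concrete with a small sketch. This is a minimal Python illustration under stated assumptions, not the paper's implementation. `weight_traffic_gb_per_s` estimates the DRAM traffic implied by re-reading every weight once per generated token (the "Memory Wall" cost); the 7B-parameter, 4-bit, 20 tokens/s inputs are illustrative and do not come from the paper. `ImmutableProjection`, `HostKVCache`, and `decode_step` are hypothetical names modelling the "Split-Brain" partition: a frozen, stateless block whose weights cannot change, and a host object that owns the growing KV cache. Assigning the Q/K/V projections to the ASIC side and the attention over cached keys to the host is an assumption of this sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, ...]
Matrix = Tuple[Vector, ...]  # nested tuples: no in-place element writes


def weight_traffic_gb_per_s(num_params: float, bytes_per_param: float,
                            tokens_per_s: float) -> float:
    """Weight traffic (GB/s) if every parameter is re-read once per token.

    Assumed model of a conventional GPU/NPU decode loop; ignores on-chip
    caching, batching, and KV-cache traffic.
    """
    bytes_per_token = num_params * bytes_per_param
    return bytes_per_token * tokens_per_s / 1e9


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product over tuples."""
    return tuple(sum(w * xi for w, xi in zip(row, x)) for row in m)


@dataclass(frozen=True)
class ImmutableProjection:
    """Hypothetical stand-in for an ITA block: weights fixed at build time.

    frozen=True blocks attribute reassignment and tuples block element
    writes; __call__ keeps no state between calls.
    """
    w_q: Matrix
    w_k: Matrix
    w_v: Matrix

    def __call__(self, x: Vector) -> Tuple[Vector, Vector, Vector]:
        return matvec(self.w_q, x), matvec(self.w_k, x), matvec(self.w_v, x)


class HostKVCache:
    """Hypothetical host-CPU side: owns the growing per-sequence KV cache."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def attend(self, q: Vector, k: Vector, v: Vector) -> Vector:
        """Append (k, v), then scaled dot-product attention over the cache."""
        self.keys.append(k)
        self.values.append(v)
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(a * b for a, b in zip(q, key)) for key in self.keys]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        return tuple(sum(p * val[i] for p, val in zip(probs, self.values))
                     for i in range(len(v)))


def decode_step(asic: ImmutableProjection, cache: HostKVCache,
                x: Vector) -> Vector:
    q, k, v = asic(x)             # stateless: same input, same output
    return cache.attend(q, k, v)  # stateful: token history lives on the host


if __name__ == "__main__":
    # Illustrative numbers only: 7B params, 4-bit (0.5 B) weights, 20 tokens/s.
    print(weight_traffic_gb_per_s(7e9, 0.5, 20))  # 70.0
    eye = ((1.0, 0.0), (0.0, 1.0))
    asic = ImmutableProjection(eye, eye, eye)
    cache = HostKVCache()
    print(decode_step(asic, cache, (1.0, 0.0)))   # (1.0, 0.0)
    print(decode_step(asic, cache, (0.0, 1.0)))
```

With those illustrative inputs, the weight stream alone is 3.5 GB per token, or 70 GB/s, which is the traffic ITA proposes to remove by hard-wiring the weights. The frozen dataclass only mimics immutability in software; it says nothing about the silicon's energy or area.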

A Scalable FPGA Architecture With Adaptive Memory Utilization for GEMM-Based Operations

Hardware Architecture

Makes AI learn faster and use less power.

9 Oct 2025

Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure

Distributed, Parallel, and Cluster Computing

Builds super-fast AI by connecting computer parts better.

9 Jul 2025

Hardware-Aware Data and Instruction Mapping for AI Tasks: Balancing Parallelism, I/O and Memory Tradeoffs

Hardware Architecture

Makes AI run faster using less power.

4 Sep 2025

Country of Origin

🇺🇸 United States

Page Count

10 pages