A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents

Published: October 31, 2025 | arXiv ID: 2510.27107v1

By: Zhipeng Liao, Kunming Shao, Jiangnan Yu and more

Potential Business Impact:

Makes AI doctors work faster and use less power.

Business Areas:

Augmented Reality Hardware, Software

With powerful and integrative large language models (LLMs), medical AI agents have demonstrated unique advantages in providing personalized medical consultations, continuous health monitoring, and precise treatment plans. Retrieval-Augmented Generation (RAG) integrates personal medical documents into LLMs through an external retrievable database, avoiding the costly retraining or fine-tuning otherwise needed to deploy customized agents. While deploying medical agents on edge devices ensures privacy protection, RAG implementations incur substantial memory access and energy consumption during the retrieval stage. This paper presents a hierarchical retrieval architecture for edge RAG that uses a two-stage retrieval scheme: approximate retrieval generates a candidate set, and high-precision retrieval then runs on the pre-selected document embeddings. The proposed architecture significantly reduces energy consumption and external memory access while maintaining retrieval accuracy. Simulation results show that, under TSMC 28nm technology, the proposed hierarchical retrieval architecture reduces overall memory access by nearly 50% and computation by 75% compared to pure INT8 retrieval, with a total energy consumption of 177.76 µJ/query for 1 MB data retrieval.
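The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the approximate stage is implemented, so the choice here (scoring on only the upper 4 bits of each INT8 embedding value) is purely an assumption for illustration, as are the names `msb4`, `dot`, and `two_stage_retrieve` and the parameters `n_candidates` and `top_k`. It should be read as one possible instance of coarse-then-fine retrieval, not as the authors' architecture.

```python
def msb4(vec_int8):
    """Assumed coarse format: keep the upper 4 bits of each INT8 value.

    Python's >> is an arithmetic shift, so the sign is preserved.
    """
    return [v >> 4 for v in vec_int8]


def dot(a, b):
    """Integer dot product, used as the similarity score in both stages."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_retrieve(query, docs, docs_lo, n_candidates, top_k):
    """Return indices of the top_k documents for an INT8 query.

    docs    : full INT8 embeddings (only read for the shortlisted candidates)
    docs_lo : precomputed low-precision copies (read for every document)
    """
    q_lo = msb4(query)

    # Stage 1 (approximate): rank all documents on the low-precision copies.
    coarse = sorted(range(len(docs_lo)),
                    key=lambda i: dot(q_lo, docs_lo[i]),
                    reverse=True)
    candidates = coarse[:n_candidates]

    # Stage 2 (high precision): re-rank only the candidates with full INT8.
    fine = sorted(candidates, key=lambda i: dot(query, docs[i]), reverse=True)
    return fine[:top_k]


# Hypothetical toy data: four 4-dimensional INT8 embeddings.
docs = [[100, 0, 0, 0], [0, 100, 0, 0], [90, 40, 0, 0], [-100, 0, 0, 0]]
docs_lo = [msb4(d) for d in docs]
query = [100, 50, 0, 0]

best = two_stage_retrieve(query, docs, docs_lo, n_candidates=2, top_k=1)
print(best)
```

In this sketch the coarse pass touches 4 bits per element rather than 8, and the full-precision embeddings are fetched only for the shortlisted candidates. That is the kind of saving a two-stage design targets, though the bit widths and datapath in the paper may differ.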

Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines

Computation and Language

Helps doctors find medical advice fast.

3 Oct 2025

91%

Grounded by Experience: Generative Healthcare Prediction Augmented with Hierarchical Agentic Retrieval

Artificial Intelligence

Helps doctors predict patient health better.

17 Nov 2025

91%

Retrieval-Augmented Generation: A Comprehensive Survey of Architectures, Enhancements, and Robustness Frontiers

Information Retrieval

Helps computers answer questions with real-world facts.

28 May 2025

Page Count

5 pages
