HERP: Hardware for Energy Efficient and Realtime DB Search and Cluster Expansion in Proteomics

Published: November 5, 2025 | arXiv ID: 2511.03437v1

By: Md Mizanur Rahaman Nayan, Zheyu Li, Flavio Ponzina and more

Potential Business Impact:

Speeds up finding proteins in body samples.

Business Areas:
Big Data, Data and Analytics

Database (DB) search and clustering are fundamental in proteomics, but conventional full clustering and search approaches demand substantial resources and incur long latency. We propose a lightweight incremental clustering and highly parallelizable DB search platform tailored for resource-constrained environments, delivering low energy and latency without compromising performance. By leveraging mass-spectrometry insights, we employ bucket-wise parallelization and query scheduling to reduce latency. A one-time hardware initialization with pre-clustered proteomics data enables continuous DB search and local re-clustering, offering a more practical and efficient alternative to clustering from scratch. Heuristics from pre-clustered data guide incremental clustering, accelerating the process by 20x with only a 0.3% increase in clustering error. DB search results overlap by 96% with state-of-the-art tools, validating search quality. The hardware leverages a 3T-2MTJ SOT-CAM at the 7 nm node with a compute-in-memory design. For the human genome draft dataset (131 GB), setup requires 1.19 mJ for 2M spectra, while a 1000-query search consumes 1.1 µJ. Bucket-wise parallelization further achieves a 100x speedup.
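
The bucket-wise search and incremental clustering ideas in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design or algorithm: it assumes spectra are already encoded as fixed-length vectors, that buckets are formed from precursor m/z, and that cosine similarity with a fixed threshold decides cluster membership. The names `BucketedIndex`, `BUCKET_WIDTH`, and `SIM_THRESHOLD` are hypothetical, and the loop over a bucket runs sequentially here, whereas a CAM-based design would compare against stored entries in parallel.

```python
import math
from collections import defaultdict

BUCKET_WIDTH = 1.0    # hypothetical precursor m/z width of one bucket
SIM_THRESHOLD = 0.7   # hypothetical similarity cut-off for joining a cluster


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def bucket_of(precursor_mz):
    """Map a precursor m/z value to an integer bucket id."""
    return int(precursor_mz // BUCKET_WIDTH)


class BucketedIndex:
    """Clusters grouped by precursor-mass bucket (illustrative only)."""

    def __init__(self):
        # bucket id -> list of clusters; a cluster is {"centroid": [...], "size": n}
        self.buckets = defaultdict(list)

    def search(self, precursor_mz, vec):
        """Return (best similarity, best cluster) within the query's bucket only."""
        best_sim, best_cluster = 0.0, None
        for cluster in self.buckets.get(bucket_of(precursor_mz), []):
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best_sim, best_cluster = sim, cluster
        return best_sim, best_cluster

    def add(self, precursor_mz, vec):
        """Incrementally cluster one spectrum without re-clustering the rest."""
        sim, cluster = self.search(precursor_mz, vec)
        if cluster is not None and sim >= SIM_THRESHOLD:
            # Join the existing cluster and update its centroid as a running mean.
            n = cluster["size"]
            cluster["centroid"] = [
                (c * n + v) / (n + 1) for c, v in zip(cluster["centroid"], vec)
            ]
            cluster["size"] = n + 1
            return cluster
        # No sufficiently similar cluster in this bucket: start a new one.
        new_cluster = {"centroid": list(vec), "size": 1}
        self.buckets[bucket_of(precursor_mz)].append(new_cluster)
        return new_cluster


index = BucketedIndex()
first = index.add(500.2, [1.0, 0.0, 0.0])
second = index.add(500.7, [0.9, 0.1, 0.0])   # same bucket, similar: joins `first`
third = index.add(500.3, [0.0, 1.0, 0.0])    # same bucket, dissimilar: new cluster
fourth = index.add(731.4, [1.0, 0.0, 0.0])   # different bucket: never compared to `first`
```

Because each query touches only one bucket, independent buckets can in principle be searched concurrently, which is the intuition behind the parallelization the abstract reports. A practical version would also need to handle spectra whose precursor mass falls near a bucket boundary, which this sketch ignores.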

Country of Origin
🇺🇸 United States

Page Count
7 pages

Category
Computer Science:
Databases