HERP: Hardware for Energy Efficient and Realtime DB Search and Cluster Expansion in Proteomics
By: Md Mizanur Rahaman Nayan, Zheyu Li, Flavio Ponzina and more
Potential Business Impact:
Finds patterns in data much faster and with less energy.
Database search and clustering are fundamental components of many data analytics problems, such as mass spectrometry-driven proteomics. Traditional full clustering and search algorithms suffer from high resource usage and long latencies. We introduce HERP, a lightweight incremental clustering method and a highly parallelizable database (DB) search platform that uses 3T2MTJ SOT-MRAM-based content-addressable memory (CAM) in 7 nm technology for in-memory acceleration. A single hardware initialization using pre-clustered proteomics data allows for continuous DB searching and local re-clustering, providing a more practical and efficient alternative to clustering from scratch. Heuristics derived from the initial pre-clustered data guide the incremental process, accelerating clustering by 20x at the cost of a 0.3% increase in clustering error. DB search results overlap by 96% with state-of-the-art (SOTA) algorithms, validating search quality. For a 131 GB human genome proteomics dataset, HERP setup requires 1.19 mJ for 2M spectra, while a 1000-query search consumes only 1.1 µJ at SOTA accuracy. Bucket-wise parallelization and query scheduling provide an additional 100x speedup.
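To make the idea of bucket-wise search with incremental cluster expansion concrete, here is a minimal software sketch. It is not the authors' implementation: the bit-vector spectrum encoding, the Hamming-distance match rule, the bucket width, the distance threshold, and every name in the code are assumptions chosen for illustration. The inner loop stands in for the comparison that a CAM would perform in parallel in hardware.

```python
# Hypothetical parameters, not taken from the paper.
BUCKET_WIDTH = 1.0   # width of a precursor-mass bucket
MAX_DISTANCE = 2     # largest Hamming distance still counted as a match


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two integer-encoded bit vectors."""
    return bin(a ^ b).count("1")


class IncrementalClusters:
    """Toy model: each bucket holds (cluster_id, representative) pairs."""

    def __init__(self):
        self.buckets = {}   # bucket index -> list of (cluster_id, bits)
        self.next_id = 0

    def _bucket(self, precursor_mass: float) -> int:
        return int(precursor_mass // BUCKET_WIDTH)

    def search(self, precursor_mass: float, bits: int):
        """Return (cluster_id, distance) of the nearest representative
        in the query's bucket, or None if the bucket is empty."""
        best = None
        for cid, rep in self.buckets.get(self._bucket(precursor_mass), []):
            d = hamming(bits, rep)
            if best is None or d < best[1]:
                best = (cid, d)
        return best

    def add(self, precursor_mass: float, bits: int) -> int:
        """Join the nearest cluster if it is close enough; otherwise
        start a new cluster locally, without re-clustering other buckets."""
        best = self.search(precursor_mass, bits)
        if best is not None and best[1] <= MAX_DISTANCE:
            return best[0]
        cid = self.next_id
        self.next_id += 1
        key = self._bucket(precursor_mass)
        self.buckets.setdefault(key, []).append((cid, bits))
        return cid


clusters = IncrementalClusters()
first = clusters.add(500.3, 0b10110010)    # empty bucket: new cluster
second = clusters.add(500.7, 0b10110011)   # one bit away: joins `first`
third = clusters.add(500.1, 0b01001100)    # far away: new cluster
fourth = clusters.add(742.4, 0b10110010)   # different bucket: new cluster
```

Because a query only touches its own bucket, separate buckets can be searched independently, which is the property that bucket-wise parallelization relies on in this simplified model.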
Similar Papers
A Hybrid Heuristic Framework for Resource-Efficient Querying of Scientific Experiments Data
Databases
Makes computers answer questions from huge data faster.
Terabyte-Scale Analytics in the Blink of an Eye
Databases
Runs big data jobs 60 times faster.