ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata

Published: December 17, 2025 | arXiv ID: 2512.15365v1

By: Gajendra Doniparthi, Shashank Balu Pandhare, Stefan Deßloch and more

Potential Business Impact:

Find research data with normal sentences.

Business Areas:
Semantic Search, Internet Services

Traditional search applications within Research Data Management (RDM) ecosystems are crucial in helping users discover and explore the structured metadata of research datasets. Typically, text search engines require users to submit keyword-based queries rather than natural language. However, Large Language Models (LLMs) trained on domain-specific content are increasingly used for specialized natural language processing (NLP) tasks. We present ArcBERT, an LLM-based system designed for integrated metadata exploration. Unlike traditional search applications, ArcBERT understands natural language queries and relies on semantic matching. Notably, ArcBERT also understands the structure and hierarchies within the metadata, enabling it to handle diverse user querying patterns effectively.
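
The abstract does not say how ArcBERT encodes metadata or scores matches, so the following is only an illustrative sketch of the general idea: flatten hierarchical metadata into path-qualified text so that structure is visible to the matcher, then rank records by vector similarity to a free-text query. Everything here is an assumption made for illustration: the record layout, the names `flatten`, `to_text`, `embed`, and `search`, and above all the `embed` function, which is a simple token-count stand-in and not a language model.

```python
import math
import re
from collections import Counter


def flatten(metadata, path=()):
    """Yield (path, value) pairs for every leaf of a nested dict/list."""
    if isinstance(metadata, dict):
        for key, value in metadata.items():
            yield from flatten(value, path + (str(key),))
    elif isinstance(metadata, list):
        for item in metadata:
            yield from flatten(item, path)
    else:
        yield path, str(metadata)


def to_text(metadata):
    """Serialize a record so each value keeps the field names leading to it."""
    return " ".join(" ".join(path) + " " + value for path, value in flatten(metadata))


def embed(text):
    """Hypothetical stand-in encoder: lowercase token counts, NOT a BERT model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, records):
    """Return record ids ordered from most to least similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(to_text(r))), r["id"]) for r in records]
    return [rid for _, rid in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Made-up example records; the field names are not taken from the paper.
records = [
    {"id": "ds1", "study": {"organism": "Arabidopsis thaliana",
                            "assay": {"type": "proteomics"}}},
    {"id": "ds2", "study": {"organism": "Homo sapiens",
                            "assay": {"type": "transcriptomics"}}},
]

print(search("proteomics assay for arabidopsis", records))
```

Because `embed` only counts tokens, this sketch still matches lexically. Semantic matching of the kind the abstract describes would require replacing `embed` with a learned sentence encoder, leaving the flattening and ranking steps unchanged.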

Page Count
7 pages

Category
Computer Science:
Databases