ArcBERT: An LLM-based Search Engine for Exploring Integrated Multi-Omics Metadata
By: Gajendra Doniparthi, Shashank Balu Pandhare, Stefan Deßloch and more
Potential Business Impact:
Find research data using plain-language queries.
Traditional search applications within Research Data Management (RDM) ecosystems are crucial in helping users discover and explore the structured metadata of research datasets. Typically, these text search engines require users to submit keyword-based queries rather than natural language. Meanwhile, Large Language Models (LLMs) trained on domain-specific content are increasingly used for specialized natural language processing (NLP) tasks. We present ArcBERT, an LLM-based system designed for integrated metadata exploration. Unlike traditional search applications, ArcBERT understands natural language queries and relies on semantic matching. Notably, ArcBERT also understands the structure and hierarchies within the metadata, enabling it to handle diverse user querying patterns effectively.
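The idea of matching a natural language query against hierarchical metadata can be illustrated with a minimal sketch. This is not ArcBERT's implementation: the nested record, its field names, and the helpers `flatten`, `embed`, `cosine`, and `search` are all hypothetical, and `embed` is a toy bag-of-words stand-in for a learned encoder, so it captures only word overlap, not true semantic similarity. What the sketch does show is one plausible way to keep hierarchy visible to a matcher: each value is serialized together with its path, so a query that mentions a parent field can still reach the right leaf.

```python
import math
import re
from collections import Counter

# Hypothetical nested metadata record; field names are illustrative only.
record = {
    "investigation": {
        "title": "Drought response in barley",
        "study": {
            "organism": "Hordeum vulgare",
            "assay": {
                "measurement": "transcriptome profiling",
                "technology": "RNA-Seq",
            },
        },
    }
}

def flatten(node, path=()):
    """Yield (path, text) pairs; the text carries the full path as context."""
    for key, value in node.items():
        here = path + (key,)
        if isinstance(value, dict):
            yield from flatten(value, here)
        else:
            yield here, f"{' > '.join(here)}: {value}"

def embed(text):
    """Toy stand-in for a learned encoder (assumption): term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, record, top_k=2):
    """Rank flattened metadata entries by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), path) for path, text in flatten(record)]
    return sorted(scored, reverse=True)[:top_k]

best_score, best_path = search("which technology was used in the assay", record)[0]
print(" > ".join(best_path))
```

Here the query never mentions "RNA-Seq", yet the entry under `assay > technology` ranks first because its serialized path shares terms with the query. Swapping `embed` for a domain-trained sentence encoder would be the natural next step toward the semantic matching the abstract describes.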
Similar Papers
Flexible metadata harvesting for ecology using large language models
Digital Libraries
Finds and links science data for new discoveries.
AR-Med: Automated Relevance Enhancement in Medical Search via LLM-Driven Information Augmentation
Computation and Language
Finds the right health answers online, safely.
Metadata Extraction Leveraging Large Language Models
Machine Learning (Stat)
Helps lawyers find important contract parts faster.