Task & Motivation
The objective is to enable LLMs to comprehend and extract expert knowledge from complex, distributed genomic databases to answer natural language queries.
The work is motivated by several limitations inherent in standard LLMs and in the current state-of-the-art system, GeneGPT:
1. Complexity of Genomic Data: Extracting data from distributed biomedical databases remains a significant challenge for researchers. Standard LLMs struggle with this because they have restricted access to domain-specific databases.
2. Fragility of Existing Systems (GeneGPT): While GeneGPT is effective, it relies on a single-agent architecture with rigid dependencies on specific API formats. This makes the system fragile when interfacing with evolving tools.
3. Context and Focus Issues: GeneGPT relies on long context windows, which can lead to "attention dilution", where the model loses focus on the original query.