TEAM RABBIT

2025 Genomic Diagnostics Honorable Mention

Team Rabbit’s quantitative framework accelerates the diagnosis and study of gene mutations. It uses an ensemble of models, including a Large Language Model (LLM) applied to medical literature, together with standard medical databases to automate the identification of disease mechanisms and support efficient drug repurposing. The system extracts data from credible literature, analyzes biological effects, and streamlines rare disease research, improving diagnostic efficiency, treatment timelines, and drug development for rare diseases.

PROJECT SUMMARY

Purpose and Objectives 

  • Background: The complexity and variability of genetic research make it challenging to identify the mechanisms by which mutations cause rare diseases. Specifically, clinicians searching for relevant gene-disease mechanisms must sift through large volumes of literature and data by hand, which is slow and inefficient. This, in turn, delays the diagnosis and treatment of rare diseases.

  • Inspiration: Inspired by the stories and efforts of patients, families, and healthcare professionals navigating these challenges, we wanted to create a solution that accelerates the process of understanding disease mechanisms and improves patient outcomes. 

  • Our Solution: We are developing a scoring framework that uses both an LLM and genomic databases to automate the identification of genetic mutation mechanisms (loss of function, gain of function, and dominant negative). This framework is composed of…

    • API retrieval of gene-disease abstracts through credible databases 

    • feeding that text data to a locally run LLM

    • an ensemble comparison between outputs for data validation 

    • classifications of gene variants (LoF, GoF, DN) with a confidence score to guide research on foundational disease progression 

      • downstream, we could repurpose this model to match each mechanism type with a potential therapeutic strategy (agonists/enhancers for LoF, inhibitors for GoF, disruptors for DN) to streamline pre-IND decision-making 
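The downstream pairing of mechanism type with therapeutic strategy could be sketched as a simple lookup gated by the confidence score. This is a minimal illustration, not the team's implementation: the names `Classification` and `suggest_strategy`, the example gene label, and the 0.7 confidence threshold are all assumptions made for the example. Only the three mechanism labels and their strategy pairings come from the summary above.

```python
from dataclasses import dataclass
from typing import Optional

# Pairings taken from the project summary; everything else is illustrative.
STRATEGY_BY_MECHANISM = {
    "LoF": "agonists/enhancers",
    "GoF": "inhibitors",
    "DN": "disruptors",
}


@dataclass
class Classification:
    gene: str
    mechanism: str     # one of "LoF", "GoF", "DN"
    confidence: float  # assumed to lie between 0.0 and 1.0


def suggest_strategy(
    result: Classification, min_confidence: float = 0.7
) -> Optional[str]:
    """Return a candidate strategy class, or None if confidence is too low.

    The 0.7 default is an arbitrary placeholder, not a validated cutoff.
    """
    if result.mechanism not in STRATEGY_BY_MECHANISM:
        raise ValueError(f"unknown mechanism: {result.mechanism!r}")
    if result.confidence < min_confidence:
        return None
    return STRATEGY_BY_MECHANISM[result.mechanism]


if __name__ == "__main__":
    # "GENE1" is a made-up gene label used only for demonstration.
    print(suggest_strategy(Classification("GENE1", "LoF", 0.9)))
```

Returning `None` for low-confidence calls keeps uncertain classifications from being turned into a therapeutic suggestion without further review.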

By streamlining research, our tool helps clinicians diagnose rare diseases more efficiently, ultimately expediting treatment and improving patient care. 

Target Audience 

After engaging with families affected by rare diseases and medical professionals in the field, we recognized the need to streamline diagnosis and treatment. This project is designed to support healthcare providers by enabling earlier intervention and improving patient outcomes and overall quality of life. 

Tools 

  • Utilizes Python to handle the majority of the data extraction, scraping, and storage.

  • Integrates various APIs to input data from literature & relevant databases.

  • Integrates a Large Language Model to parse through and analyze data. 
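As a rough illustration of the API integration listed above, the sketch below builds a literature search request and filters the response down to papers that have abstracts. It assumes the public Semantic Scholar Graph API paper search endpoint and its `query`, `fields`, and `limit` parameters; these should be checked against the current API documentation. The function names are hypothetical, and the summary does not describe how the team actually structures its requests.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Semantic Scholar Graph API.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(gene: str, disease: str, limit: int = 20) -> str:
    """Build a search URL for one gene-disease pair."""
    params = {
        "query": f"{gene} {disease}",
        "fields": "title,abstract",
        "limit": limit,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def extract_abstracts(payload: Dict) -> List[Dict]:
    """Keep only the papers in a response that actually have an abstract."""
    return [
        {"title": paper.get("title"), "abstract": paper["abstract"]}
        for paper in payload.get("data", [])
        if paper.get("abstract")
    ]


def fetch_abstracts(gene: str, disease: str, limit: int = 20) -> List[Dict]:
    """Perform the request (needs network access; rate limits may apply)."""
    with urlopen(build_search_url(gene, disease, limit), timeout=30) as resp:
        return extract_abstracts(json.load(resp))
```

Separating URL construction and response parsing from the network call makes both easy to check offline, for example with a saved response.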

Key Features 

  1. Extraction of data from scholarly research and publications. 

    Uses the Semantic Scholar API with ClinGen data.

  2. Analysis and processing of public datasets & literature using an LLM.

  3. Interpretation of databases to classify disease mechanisms: Gene2Phenotype, GoFCards, Yeast2Human DNA, LoGoFunc.

  4. Scoring based on confidence values and agreement between models in the ensemble.

  5. Clear and intuitive presentation of results for users. 

  6. Architecture built for expansion with more gene-disease pairs and models. 
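The scoring described in feature 4 could be sketched as follows. The summary does not give the actual formula, so this example assumes a simple one: take the majority label across models, then multiply the fraction of models that agree by their mean confidence. The function name `ensemble_score` and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ensemble_score(predictions: List[Tuple[str, float]]) -> Dict[str, object]:
    """Combine (label, confidence) pairs produced by several models.

    agreement       = share of models voting for the majority label
    mean_confidence = average confidence among those agreeing models
    score           = agreement * mean_confidence (assumed formula)
    """
    if not predictions:
        raise ValueError("need at least one prediction")

    votes = Counter(label for label, _ in predictions)
    # On a tie, Counter returns the label that was seen first.
    label, count = votes.most_common(1)[0]

    agreement = count / len(predictions)
    agreeing = [conf for lab, conf in predictions if lab == label]
    mean_confidence = sum(agreeing) / len(agreeing)

    return {
        "label": label,
        "agreement": agreement,
        "mean_confidence": mean_confidence,
        "score": agreement * mean_confidence,
    }


if __name__ == "__main__":
    # Three hypothetical model outputs for one gene-disease pair.
    print(ensemble_score([("LoF", 0.9), ("LoF", 0.7), ("GoF", 0.6)]))
```

A multiplicative score penalizes both disagreement between models and low individual confidence; a weighted vote or a calibrated probability would be reasonable alternatives.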

Expected Outcomes 

Using results and analysis from selected literature, healthcare providers can more efficiently determine whether a disease-causing mutation results in a loss or gain of biological function, or whether the mutated gene product interferes with the healthy protein. Our framework streamlines this classification process, enabling faster and more accurate treatment with personalized medicine after diagnosis and reducing the time patients and families spend searching for answers. Early identification of the disease mechanism can improve patient outcomes and quality of life.

Next Steps 

  1. Scalability: We aim to expand our framework to handle larger datasets across multiple platforms, enhancing accuracy and precision in identifying rare disease mechanisms. This will allow for broader integration with research databases and clinical tools, ensuring more comprehensive analyses. 

  2. Optimization of Algorithms and Performance: To reduce processing time for analyzing input literature, we plan to implement advanced techniques such as multithreading and optimized data structures. These improvements will enhance efficiency, making the framework more responsive and scalable for real-world applications. 

  3. User-Friendly Interface: We intend to develop an intuitive, accessible interface that enables users to efficiently search for genes and their associated diseases. By improving navigation and visualization, we aim to make our platform a valuable tool for healthcare providers, researchers, and families seeking crucial genetic insights.
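The multithreading mentioned in step 2 could look something like the sketch below, which uses Python's standard `concurrent.futures` thread pool to analyze several abstracts concurrently. The `toy_analyze` function is a stand-in invented for this example; in practice it would be replaced by whatever call sends an abstract to the LLM or to a database. Threads mainly help when that call spends its time waiting on I/O, such as a network request or a model server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def toy_analyze(abstract: str) -> Dict:
    """Hypothetical placeholder for the real per-abstract analysis step."""
    text = abstract.lower()
    return {
        "length": len(abstract),
        "mentions_loss_of_function": "loss of function" in text,
    }


def analyze_in_parallel(
    abstracts: List[str],
    analyze: Callable[[str], Dict],
    max_workers: int = 4,
) -> List[Dict]:
    """Run `analyze` on each abstract using a pool of worker threads.

    Executor.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, abstracts))


if __name__ == "__main__":
    sample = [
        "Variant X causes loss of function of the encoded enzyme.",
        "Variant Y increases channel activity.",
    ]
    for row in analyze_in_parallel(sample, toy_analyze):
        print(row)
```

If the analysis step turns out to be CPU-bound rather than I/O-bound, a process pool would likely be the better fit in Python.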

MEET THE TEAM

Alisa Zhang
University of California, San Diego
Undergraduate (2028)
Bioinformatics & Cognitive Science Machine Learning

Yuxin Zeng
Tufts University
Undergraduate (2028)
Computer Science & Cognitive & Brain Sciences

Edward Sun
University of Pennsylvania
Undergraduate (2027)
Biology & Finance

Mark Endicott
Michigan State University
Undergraduate (2026)
Data Science

Yufei Chen
Massachusetts Institute of Technology
Undergraduate (2028)
Computational Biology