RARE REGISTREE
2025 Therapeutic Targets Winner
Rare Registree is a centralized genetic registry designed to overcome the data fragmentation and search limitations currently hindering rare disease diagnosis and research. By integrating patient demographics, genetic markers, and clinical histories with AI-powered semantic search, the platform bridges the gap between lay and clinical terminology. Its coding-free interface and real-time data interoperability aim to empower both clinicians and patients while accelerating personalized treatments and research collaboration.
PROJECT SUMMARY
Introduction
Advances in genomic sequencing have paved the way for precision medicine—not only for common disorders but also for rare diseases. However, the current landscape is marred by fragmented, inaccessible data and interfaces that require extensive technical know-how. This memo outlines our proposal for a centralized genetic registry that directly addresses these issues by integrating detailed genetic markers (including SNPs), patient demographics, clinical interventions, and cutting-edge AI-powered search capabilities.
Problem Statement
Current rare disease data systems suffer from several limitations:
Fragmented and Inaccessible Data: Existing databases are scattered and often stored in formats that require significant coding expertise to extract useful information.
Semantic Discrepancies: Patients and physicians frequently describe symptoms using everyday language (e.g., “knees aching” versus “joint pain”), which does not always match the technical vocabulary in databases.
Limited Search Functionality: Traditional registries focus primarily on gene or chromosome identification and lack robust Boolean search tools. They do not offer the granularity needed for pinpointing specific SNPs or genetic markers, nor do they integrate historical treatment data or patient demographics necessary for accurate cohort analyses.
Proposed Solution
We propose a centralized, relational registry with an intuitive, coding-free user interface designed for both clinicians and patients. Key components include:
Comprehensive Data Aggregation:
Central Repository: Collect and standardize a wide range of data—from rare genetic variants to common markers, clinical demographics, and past treatment outcomes.
Interoperability: Facilitate data sharing among researchers, physicians, and biotechnologists to support real-time updates and continuous learning.
Advanced Search Capabilities:
Robust Boolean Search: Allow users to query by chromosome, locus, symptom, Orphanet identifier (ORPHAcode), and more, ensuring precise identification of critical genetic markers.
Fuzzy Search with AI: Incorporate a custom large language model (LLM) that maps patient-entered symptoms to standardized Human Phenotype Ontology (HPO) terms. The model injects noise into the entered text and repeats the mapping over multiple iterations, so that semantically similar descriptions (e.g., “knees aching” and “joint pain”) resolve to the same terms, bridging the gap between patient language and stored data.
User-Centric Interface Design:
Accessibility: Deliver the registry as a lightweight, applet-based platform that is accessible without any coding knowledge.
Integrated Visualization: Embed tools like Biodalliance to enable fast, interactive genome visualization—even when exact SNP details are not immediately available.
Dynamic UI Elements: Provide intuitive features such as patient characteristic displays, ranked disease prediction based on input text, detailed locus landing pages, and downloadable datasets for further analysis.
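To make the relational registry and Boolean search described above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and demo rows are hypothetical and do not come from the project; the sketch only shows how genetic markers, phenotypes, and demographics could be joined and filtered with a Boolean condition.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the project's actual design.
SCHEMA = """
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    birth_year INTEGER,
    sex        TEXT
);
CREATE TABLE variant (
    patient_id INTEGER REFERENCES patient(patient_id),
    chromosome TEXT,
    locus      TEXT
);
CREATE TABLE phenotype (
    patient_id INTEGER REFERENCES patient(patient_id),
    hpo_id     TEXT
);
"""


def build_demo_registry():
    """Create an in-memory registry populated with made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?)",
        [(1, 1990, "F"), (2, 1985, "M"), (3, 2001, "F")],
    )
    # Loci below are synthetic placeholders, not real findings.
    conn.executemany(
        "INSERT INTO variant VALUES (?, ?, ?)",
        [(1, "7", "7q31.2"), (2, "X", "Xp21.2"), (3, "7", "7q31.2")],
    )
    conn.executemany(
        "INSERT INTO phenotype VALUES (?, ?)",
        [(1, "HP:0002829"), (2, "HP:0002829"), (3, "HP:0002829")],
    )
    return conn


def find_patients(conn, chromosome, hpo_id, born_before):
    """Boolean AND query: variant on `chromosome` AND phenotype `hpo_id`
    AND birth year earlier than `born_before`."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.patient_id
        FROM patient p
        JOIN variant v    ON v.patient_id = p.patient_id
        JOIN phenotype ph ON ph.patient_id = p.patient_id
        WHERE v.chromosome = ? AND ph.hpo_id = ? AND p.birth_year < ?
        ORDER BY p.patient_id
        """,
        (chromosome, hpo_id, born_before),
    ).fetchall()
    return [row[0] for row in rows]
```

With the demo rows, `find_patients(build_demo_registry(), "7", "HP:0002829", 2000)` returns `[1]`: patient 2 has no variant on chromosome 7, and patient 3 was born after the cutoff. A coding-free interface would presumably build such queries from form fields rather than expose SQL to the user.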
Implementation Strategy
Centralized, Relational Database: Develop a structured repository that not only collects genetic markers but also includes demographic details and historical treatment data, enhancing the context for clinical decision-making.
AI-Driven Semantic Mapping: Use our custom LLM to process patient input. By injecting noise into the text and repeatedly mapping it to HPO terms, the system predicts and ranks potential diseases based on symptom similarity.
Enhanced Data Connectivity: Build robust connections between symptom descriptions, genetic data, and research studies (including PubMed-linked treatment pathways), enabling seamless cohort analysis and translational research.
Future-Proof Design: As new genetic findings and treatment strategies emerge, the registry will continuously update, ensuring that clinicians and researchers have access to the most current and comprehensive data.
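The noise-injection idea in the semantic mapping step can be sketched as follows. This is not the project's model: the custom LLM is replaced here by a simple keyword-overlap scorer, the noise is assumed to be random word dropout, and the small lexicon is a hypothetical stand-in. The sketch only shows the loop structure: perturb the input, map each variant to HPO terms, and aggregate the scores into a ranking.

```python
import random
import re

# Hypothetical lay-language lexicon. A real system would use the LLM
# (or embeddings) instead of keyword overlap; entries are illustrative.
HPO_LEXICON = {
    "HP:0002829": {"label": "Arthralgia",
                   "words": {"joint", "joints", "knee", "knees",
                             "ache", "aching", "pain"}},
    "HP:0002315": {"label": "Headache",
                   "words": {"head", "headache", "aching", "pain"}},
    "HP:0001945": {"label": "Fever",
                   "words": {"fever", "hot", "temperature"}},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def inject_noise(tokens, rng, drop_prob=0.3):
    """Assumed noise model: drop each token with probability drop_prob.
    Falls back to the original tokens if everything was dropped."""
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept or list(tokens)


def score_terms(tokens):
    """Score each HPO term by how many input tokens appear in its lexicon."""
    token_set = set(tokens)
    return {hpo_id: len(token_set & entry["words"])
            for hpo_id, entry in HPO_LEXICON.items()}


def rank_hpo_terms(text, iterations=20, seed=0):
    """Map the original text plus `iterations` noisy variants to HPO terms
    and rank terms by their average score across all variants."""
    rng = random.Random(seed)
    tokens = tokenize(text)
    variants = [tokens] + [inject_noise(tokens, rng) for _ in range(iterations)]
    totals = {hpo_id: 0 for hpo_id in HPO_LEXICON}
    for variant in variants:
        for hpo_id, score in score_terms(variant).items():
            totals[hpo_id] += score
    ranked = [(hpo_id, total / len(variants))
              for hpo_id, total in totals.items() if total > 0]
    ranked.sort(key=lambda item: -item[1])  # stable sort keeps lexicon order on ties
    return ranked
```

For the input "my knees are aching", `rank_hpo_terms` places HP:0002829 (Arthralgia) first and omits HP:0001945 (Fever), since no variant shares a word with the fever lexicon. Averaging over perturbed variants rewards terms that keep matching when individual words are removed. Ranking diseases would then require an additional, separately sourced mapping from HPO terms to disorders.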
Impact and Future Directions
By implementing this integrated genetic registry, we aim to:
Improve Diagnostic Accuracy: Enable earlier and more precise identification of genetic predispositions.
Optimize Personalized Treatments: Provide clinicians with detailed insights that allow for the tailoring of therapeutic interventions.
Accelerate Research and Collaboration: Foster a data ecosystem that promotes seamless sharing between disciplines, thereby expediting the translation of genomic research into clinical practice.
Empower Patients: Offer a platform that not only informs but also actively engages patients in their care by providing accessible, individualized genetic insights.
Conclusion
This initiative represents a strategic leap forward in the management of rare diseases. By combining a centralized, user-friendly registry with advanced Boolean and fuzzy search technologies, we can bridge the gap between patient-reported symptoms and clinical data.
MEET THE TEAM
Rishabh Ghosh
Harvard College
Undergraduate (2024)
Mathematics
TingTing Yan
Harvard College
Undergraduate (2026)
Human Developmental & Regenerative Biology & Computer Science
Jan Tobias Boehnke
Harvard Medical School
Graduate Student (2025)
Microbiology & Physics