Project Overview
This project investigates and benchmarks pathfinding algorithms on the PrimeKG (Precision Medicine Knowledge Graph) to evaluate their effectiveness in discovering mechanistically plausible drug repurposing candidates.
We aim to answer a critical question: Can standard graph algorithms find biologically meaningful pathways, or do they fall into "shortcut traps" that produce statistically short but mechanistically meaningless paths?
Motivation
The Hub Trap
High-degree nodes like TP53 or UBC act as "shortcuts" that algorithms prefer, but these paths lack biological specificity and explanatory power.
The Drug Trap
Naive algorithms often route through other drugs as intermediates, producing paths like "Drug A → Drug B → Disease" that are pharmacologically meaningless.
The Disease Trap
Paths may jump through unrelated diseases that share genetic risk factors, creating spurious associations without mechanistic basis.
Example: Tirofiban → Acute Coronary Syndrome
Below is a mechanistically grounded drug–disease pathway showing how Tirofiban treats Acute Coronary Syndrome by inhibiting platelet aggregation through the platelet integrin receptor αIIbβ3.
Mechanistic Explanation
- Tirofiban is a small-molecule antagonist of the platelet glycoprotein IIb/IIIa receptor.
- It binds the platelet integrin complex αIIbβ3 (ITGA2B/ITGB3), preventing fibrinogen binding.
- This blocks platelet aggregation, the final common step in thrombus formation.
- Reduced aggregation decreases coronary thrombus formation after plaque rupture.
- Preventing coronary thrombosis reduces the risk and severity of Acute Coronary Syndrome.
Methods: Algorithm Families
All methods share biological transition constraints that ban direct drug → disease shortcuts.
| Algorithm | Type | Key Idea |
|---|---|---|
|
Dijkstra
|
Classical | Unweighted shortest-path baseline |
|
Hub-Penalized
|
Heuristic | Penalize high-degree hub nodes |
|
PageRank-Inverse
|
Centrality | Penalize high-PageRank nodes |
|
Meta-Path BFS
|
Constraint-Based | Only valid biological relation sequences |
|
Semantic Bridging
|
LLM | Edge weight = 1 − β · cosine similarity |
|
Bidirectional ★
|
Search | Forward + backward Dijkstra |
|
K-Shortest + Bio
|
Search | k = 4 candidate paths, biological re-ranking |
|
Bidirectional + Relation Wt
|
Hybrid | Enrichment-based relation weighting |