PrimeKG Pathfinding Algorithm Benchmark Laboratory

Benchmarking Graph Algorithms for Drug Repurposing Discovery

Project Overview

This project investigates and benchmarks pathfinding algorithms on the PrimeKG (Precision Medicine Knowledge Graph) to evaluate their effectiveness in discovering mechanistically plausible drug repurposing candidates.

We aim to answer a critical question: Can standard graph algorithms find biologically meaningful pathways, or do they fall into "shortcut traps" that produce statistically short but mechanistically meaningless paths?

Motivation

The Hub Trap

High-degree nodes like TP53 or UBC act as "shortcuts" that algorithms prefer, but these paths lack biological specificity and explanatory power.

The Drug Trap

Naive algorithms often route through other drugs as intermediates, producing paths like "Drug A → Drug B → Disease" that are pharmacologically meaningless.

The Disease Trap

Paths may jump through unrelated diseases that share genetic risk factors, creating spurious associations without mechanistic basis.

Example: Tirofiban → Acute Coronary Syndrome

Below is a mechanistically grounded drug–disease pathway showing how Tirofiban treats Acute Coronary Syndrome by inhibiting platelet aggregation through the platelet integrin receptor αIIbβ3.

💊
Tirofiban
DB00775 | Drug
inhibits
🧬
Integrin αIIbβ3
ITGA2B / ITGB3 | Platelet receptor
mediates
🔄
Platelet aggregation
GO:0070527 | Biological process
promotes
🩸
Coronary thrombosis
Pathophysiology
causes
🏥
Acute Coronary Syndrome
D054058 | Disease

Mechanistic Explanation

  1. Tirofiban is a small-molecule antagonist of the platelet glycoprotein IIb/IIIa receptor.
  2. It binds the platelet integrin complex αIIbβ3 (ITGA2B/ITGB3), preventing fibrinogen binding.
  3. This blocks platelet aggregation, the final common step in thrombus formation.
  4. Reduced aggregation decreases coronary thrombus formation after plaque rupture.
  5. Preventing coronary thrombosis reduces the risk and severity of Acute Coronary Syndrome.

Methods: Algorithm Families

All methods share biological transition constraints that ban direct drug → disease shortcuts.

Algorithm Type Key Idea
Dijkstra
Classical Unweighted shortest-path baseline
Hub-Penalized
Heuristic Penalize high-degree hub nodes
PageRank-Inverse
Centrality Penalize high-PageRank nodes
Meta-Path BFS
Constraint-Based Only valid biological relation sequences
Semantic Bridging
LLM Edge weight = 1 − β · cosine similarity
Bidirectional
Search Forward + backward Dijkstra
K-Shortest + Bio
Search k = 4 candidate paths, biological re-ranking
Bidirectional + Relation Wt
Hybrid Enrichment-based relation weighting