PrimeKG Pathfinding Benchmark

Project Overview

This project investigates and benchmarks pathfinding algorithms on the PrimeKG (Precision Medicine Knowledge Graph) to evaluate their effectiveness in discovering mechanistically plausible drug repurposing candidates.

We aim to answer a critical question: Can standard graph algorithms find biologically meaningful pathways, or do they fall into "shortcut traps" that produce statistically short but mechanistically meaningless paths?

Motivation

The Hub Trap

High-degree nodes like TP53 or UBC act as "shortcuts" that algorithms prefer, but these paths lack biological specificity and explanatory power.

The Drug Trap

Naive algorithms often route through other drugs as intermediates, producing paths like "Drug A → Drug B → Disease" that are pharmacologically meaningless.

The Disease Trap

Paths may jump through unrelated diseases that share genetic risk factors, creating spurious associations without mechanistic basis.

Example: Tirofiban → Acute Coronary Syndrome

Below is a mechanistically grounded drug–disease pathway showing how Tirofiban treats Acute Coronary Syndrome by inhibiting platelet aggregation through the platelet integrin receptor αIIbβ3.

💊

Tirofiban

DB00775 | Drug

inhibits

🧬

Integrin αIIbβ3

ITGA2B / ITGB3 | Platelet receptor

mediates

🔄

Platelet aggregation

GO:0070527 | Biological process

promotes

🩸

Coronary thrombosis

Pathophysiology

causes

🏥

Acute Coronary Syndrome

D054058 | Disease

Mechanistic Explanation

Tirofiban is a small-molecule antagonist of the platelet glycoprotein IIb/IIIa receptor.
It binds the platelet integrin complex αIIbβ3 (ITGA2B/ITGB3), preventing fibrinogen binding.
This blocks platelet aggregation, the final common step in thrombus formation.
Reduced aggregation decreases coronary thrombus formation after plaque rupture.
Preventing coronary thrombosis reduces the risk and severity of Acute Coronary Syndrome.

Methods: Algorithm Families

All methods share biological transition constraints that ban direct drug → disease shortcuts.

Algorithm	Type	Key Idea
Dijkstra	Classical	Unweighted shortest-path baseline
Hub-Penalized	Heuristic	Penalize high-degree hub nodes
PageRank-Inverse	Centrality	Penalize high-PageRank nodes
Meta-Path BFS	Constraint-Based	Only valid biological relation sequences
Semantic Bridging	LLM	Edge weight = 1 − β · cosine similarity
Bidirectional ★	Search	Forward + backward Dijkstra
K-Shortest + Bio	Search	k = 4 candidate paths, biological re-ranking
Bidirectional + Relation Wt	Hybrid	Enrichment-based relation weighting