PrimeKG – Results

Benchmark Results

We evaluated eight algorithms on 150 curated drug–disease pathways mapped from DrugMechDB into PrimeKG. The main result is that edge weighting alone produces very small differences, while changing the search strategy leads to the first clear improvement in mechanism recovery.

Headline Findings

Phase 1 Convergence

The five edge-weighting methods cluster tightly together. The overall F1 spread is only 0.023, which shows that graph topology dominates edge reweighting.

Best Performer

Bidirectional search achieved the best overall performance with F1 = 0.5709 and Edit Distance = 0.4291, outperforming all Phase 1 baselines.

Main Interpretation

The problem is not just which edge weights we use. The deeper issue is that shortest-path methods systematically collapse long mechanisms into short shortcut paths.

Algorithm Performance Across 150 Pathways

Phase 1 compares edge-weighting strategies. Phase 2 introduces search-strategy changes. Bidirectional search is the first method that improves both node-overlap accuracy and path-order similarity.

F1 Score ↑

Phase 1

Semantic Bridging

0.5569

PageRank-Inverse

0.5472

Dijkstra

0.5448

Hub-Penalized

0.5403

Meta-Path BFS

0.5338

Phase 2

Bidirectional ★

0.5709

K-Shortest + Bio

0.5553

Bidir + Relation Wt

0.5491

Edit Distance ↓

Phase 1

Semantic Bridging

0.5374

PageRank-Inverse

0.5437

Hub-Penalized

0.5513

Meta-Path BFS

0.5612

Dijkstra

0.6146

Phase 2

Bidirectional ★

0.4291

K-Shortest + Bio

0.4447

Bidir + Relation Wt

0.4509

Phase 1 convergence: F1 spread = 0.023. Edge weighting alone does not substantially change performance. The graph structure itself dominates.

Case Studies

These examples show two common outcomes. Some algorithms collapse long mechanisms into short shortcuts. Others recover the correct intermediate proteins when shortcut hubs are discouraged.

Case Study

Regorafenib → GIST: 5-Node Kinase Signaling Mechanism

Ground Truth 5 nodes

DRGREG

Regorafenib

PRTPDGFRA

PDGF Receptor α

BIOPROLIF

Cell Proliferation

PRTKIT

KIT Receptor

DISGIST

Gastroint. Stromal Tumor

Semantic Bridging 1 / 3 intermediates

DRGREG

Regorafenib

PRTPDGFRA

PDGF Receptor α

DISGIST

Gastroint. Stromal Tumor

▸ Upstream target only. Finds PDGFRA via name similarity, but skips cell proliferation and KIT.

PageRank-Inverse 1 / 3 intermediates

DRGREG

Regorafenib

PRTKIT

KIT Receptor

DISGIST

Gastroint. Stromal Tumor

▸ Downstream hub only. Finds KIT, but misses the upstream signaling chain and biological process.

Case Study

Pegvisomant → Acromegaly: 4-Node Growth Hormone Mechanism

Ground Truth 4 nodes

DRGPEG

Pegvisomant

PRTGHR

Growth Hormone Receptor

PRTGH1

Somatotropin

DISACR

Acromegaly

Dijkstra — Shortest Path 0 / 2 intermediates

DRGPEG

Pegvisomant

DISACR

Acromegaly

✗ Direct indication edge. Skips GHR and GH1 entirely, so no mechanism is recovered.

Hub-Penalized 2 / 2 intermediates

DRGPEG

Pegvisomant

PRTGHR

Growth Hormone Receptor

PRTGH1

Somatotropin

DISACR

Acromegaly

✓ Full mechanism recovered. Hub penalization recovers both correct protein intermediates.

Failure Analysis

Failure patterns are not random. The strongest recurring issue is routing through highly connected hub nodes, which lowers biological specificity.

Hub Routing Hurts F1

For bidirectional search, pathways that route through high-degree hub intermediates have a lower average F1 than pathways that avoid them. This supports the idea that shortcut nodes reduce mechanistic fidelity.

Shortest Paths Are Too Short

Ground-truth mechanisms average 5.8 nodes, while most predicted paths are much shorter. This means many algorithms terminate early before traversing the actual biological cascade.

Node Selection Is the Bottleneck

Forcing longer paths does not automatically solve the problem. Once the first hop is wrong, downstream predictions tend to remain mechanistically incorrect.

Key Takeaways

Edge weighting alone has limited impact on path recovery quality.
Bidirectional search is the first algorithmic change that produces a clear improvement.
Hub shortcuts and path-length collapse are the dominant failure modes.
Future methods should focus on node selection, relation grammar, and deeper mechanistic search.