Benchmark Results
We evaluated eight algorithms on 150 curated drug–disease pathways mapped from DrugMechDB into PrimeKG. The main result is that edge weighting alone changes recovery quality very little, while changing the search strategy produces the first clear improvement in mechanism recovery.
Headline Findings
Phase 1 Convergence
The five edge-weighting methods cluster tightly together. The overall F1 spread is only 0.023, which suggests that graph topology matters more than edge reweighting.
Best Performer
Bidirectional search achieved the best overall performance with F1 = 0.5709 and Edit Distance = 0.4291, outperforming all Phase 1 baselines.
Main Interpretation
The problem is not just which edge weights we use. The deeper issue is that shortest-path methods systematically collapse long mechanisms into short shortcut paths.
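The shortcut collapse described above can be reproduced on a toy graph. The sketch below is illustrative only: the graph, the node names, and the helper `bfs_shortest_path` are hypothetical and not taken from PrimeKG or the benchmark code. It uses plain breadth-first search, which returns a fewest-hop path.

```python
from collections import deque


def bfs_shortest_path(graph, source, target):
    """Return one fewest-hop path from source to target, or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent pointers back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy graph: a 5-node curated mechanism plus a 2-hop hub shortcut.
toy_graph = {
    "drug": ["target_protein", "hub"],
    "target_protein": ["drug", "kinase"],
    "kinase": ["target_protein", "pathway"],
    "pathway": ["kinase", "disease"],
    "hub": ["drug", "disease"],
    "disease": ["pathway", "hub"],
}

curated = ["drug", "target_protein", "kinase", "pathway", "disease"]
predicted = bfs_shortest_path(toy_graph, "drug", "disease")
print(predicted)  # ['drug', 'hub', 'disease']
```

The curated mechanism has five nodes, but the fewest-hop search returns the three-node route through the hub, regardless of how the longer route is weighted relative to itself.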
Algorithm Performance Across 150 Pathways
Phase 1 compares edge-weighting strategies. Phase 2 introduces search-strategy changes. Bidirectional search is the first method that improves both node-overlap accuracy and path-order similarity.
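This section does not state the exact metric definitions, so the following is a minimal sketch under two assumptions: node-overlap F1 is computed on the sets of nodes in the predicted and curated paths, and edit distance is the Levenshtein distance over node sequences divided by the longer path length. The function names `node_f1` and `normalized_edit_distance` are hypothetical.

```python
def node_f1(predicted, truth):
    """Set-based F1 between predicted and curated path nodes (assumed definition)."""
    pred_set, true_set = set(predicted), set(truth)
    overlap = len(pred_set & true_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(true_set)
    return 2 * precision * recall / (precision + recall)


def normalized_edit_distance(predicted, truth):
    """Levenshtein distance over node sequences, scaled to [0, 1] (assumed definition)."""
    if not predicted and not truth:
        return 0.0
    # prev[j] holds the distance between the processed prefix of `predicted`
    # and the first j nodes of `truth`.
    prev = list(range(len(truth) + 1))
    for i, p in enumerate(predicted, 1):
        curr = [i]
        for j, t in enumerate(truth, 1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(predicted), len(truth))


# Hypothetical paths: a 5-node curated mechanism and a 3-node shortcut prediction.
truth_path = ["drug", "a", "b", "c", "disease"]
shortcut_path = ["drug", "hub", "disease"]

f1 = node_f1(shortcut_path, truth_path)
dist = normalized_edit_distance(shortcut_path, truth_path)
```

Under these assumed definitions, a shortcut that shares only the two endpoints with a five-node mechanism is penalized on both metrics: it loses recall for the skipped intermediates and needs one substitution plus two insertions to match the curated order.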
Figure: F1 score (higher is better) and edit distance (lower is better) for each algorithm. Phase 1 methods converge, with an F1 spread of 0.023.
Case Studies
These examples show two common outcomes. Some algorithms collapse long mechanisms into short shortcuts. Others recover the correct intermediate proteins when shortcut hubs are discouraged.
Regorafenib → GIST: 5-Node Kinase Signaling Mechanism
Pegvisomant → Acromegaly: 4-Node Growth Hormone Mechanism
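One simple way to discourage shortcut hubs is to charge a cost for entering a node that grows with its degree. The sketch below is an assumption for illustration, not necessarily the mechanism used in the benchmark: `hub_penalized_path`, the `penalty` parameter, the logarithmic cost, and the toy graph are all hypothetical. It runs Dijkstra's algorithm with a per-node entry cost of 1 + penalty * log(degree).

```python
import heapq
import math


def hub_penalized_path(graph, source, target, penalty=1.0):
    """Dijkstra where entering a node costs 1 + penalty * log(degree of that node)."""
    def step_cost(node):
        return 1.0 + penalty * math.log(len(graph[node]))

    best = {source: 0.0}
    parents = {source: None}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best[node]:
            continue  # stale heap entry
        for neighbor in graph[node]:
            new_cost = cost + step_cost(neighbor)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                parents[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in parents:
        return None
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]


# Hypothetical toy graph: a 4-node mechanism and a 2-hop route through a hub
# that is also connected to 30 unrelated filler nodes.
fillers = [f"other_{i}" for i in range(30)]
toy_graph = {
    "drug": ["receptor", "hub"],
    "receptor": ["drug", "signal"],
    "signal": ["receptor", "disease"],
    "disease": ["signal", "hub"],
    "hub": ["drug", "disease"] + fillers,
}
toy_graph.update({name: ["hub"] for name in fillers})

unpenalized = hub_penalized_path(toy_graph, "drug", "disease", penalty=0.0)
penalized = hub_penalized_path(toy_graph, "drug", "disease", penalty=1.0)
print(unpenalized)  # ['drug', 'hub', 'disease']
print(penalized)    # ['drug', 'receptor', 'signal', 'disease']
```

With `penalty=0.0` the search reduces to fewest-hop routing and takes the hub. With the penalty enabled, the high-degree hub becomes more expensive than the three low-degree steps, so the longer mechanistic route is returned.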
Failure Analysis
Failure patterns are not random. The strongest recurring issue is routing through highly connected hub nodes, which lowers biological specificity.
Hub Routing Hurts F1
For bidirectional search, pathways that route through high-degree hub intermediates have a lower average F1 than pathways that avoid them. This supports the idea that shortcut nodes reduce mechanistic fidelity.
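The comparison above can be sketched as a simple grouping: flag each predicted path by whether any intermediate node exceeds a degree threshold, then average F1 per group. Everything in the example is hypothetical, including the function names, the threshold, and the numbers, which are made-up placeholders to exercise the code and are not benchmark results.

```python
from statistics import mean


def uses_hub(path, degree, hub_threshold):
    """True if any intermediate node (endpoints excluded) has degree >= hub_threshold."""
    return any(degree[node] >= hub_threshold for node in path[1:-1])


def mean_f1_by_hub_usage(results, degree, hub_threshold):
    """Return (mean F1 of hub-routed paths, mean F1 of hub-free paths)."""
    hub_scores, other_scores = [], []
    for path, f1 in results:
        bucket = hub_scores if uses_hub(path, degree, hub_threshold) else other_scores
        bucket.append(f1)
    return (
        mean(hub_scores) if hub_scores else None,
        mean(other_scores) if other_scores else None,
    )


# Placeholder inputs (not benchmark data): (predicted path, F1) pairs and node degrees.
placeholder_results = [
    (["d", "hub", "x"], 0.4),
    (["d", "a", "b", "x"], 0.8),
    (["d", "hub", "c", "x"], 0.5),
    (["d", "a", "x"], 0.7),
]
placeholder_degree = {"d": 3, "hub": 500, "a": 4, "b": 6, "c": 5, "x": 2}

hub_mean, other_mean = mean_f1_by_hub_usage(
    placeholder_results, placeholder_degree, hub_threshold=100
)
```

The threshold that defines a hub is a free choice in this sketch; a percentile of the graph's degree distribution would be one reasonable alternative to a fixed cutoff.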
Shortest Paths Are Too Short
Ground-truth mechanisms average 5.8 nodes, while most predicted paths are much shorter. Many algorithms therefore reach the disease node before traversing the actual biological cascade.
Node Selection Is the Bottleneck
Forcing longer paths does not automatically solve the problem. Once the first hop is wrong, downstream predictions tend to remain mechanistically incorrect.
Key Takeaways
- Edge weighting alone has limited impact on path recovery quality.
- Bidirectional search is the first algorithmic change that produces a clear improvement.
- Hub shortcuts and path-length collapse are the dominant failure modes.
- Future methods should focus on node selection, relation grammar, and deeper mechanistic search.