OncoAgent posts 100% on a retrieval proxy, skips the NCCN check
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
“OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support” huggingface.co
References
ScienceDaily on JAMA Oncology study (Brigham and Women’s Hospital) sciencedaily.com
About one-third of [ChatGPT’s] cancer treatment recommendations included at least one that did not align with NCCN guidelines… 12.5 percent of responses were ‘hallucinated,’ or were not part of any recommended treatment.
PMC review of LLMs in oncology decision-making pmc.ncbi.nlm.nih.gov
81.8% of incorrect answers from LLMs in medical oncology were rated as having a medium to high likelihood of causing moderate to severe harm if acted upon by a clinician or patient.
r/LocalLLaMA MI300X benchmark thread reddit.com
Real-world LLM inference on MI300X often lands at 37–66% of H100 throughput despite the 192GB VRAM advantage, due to ROCm kernel maturity and silent 4-bit decode NaN bugs requiring pre-release dependency builds.
arXiv survey of CRAG for clinical RAG arxiv.org
The decompose-then-recompose ‘knowledge strip’ filter can decontextualize medical facts, omitting contraindications or patient-specific nuance; web-search fallback when local retrieval is graded ‘Incorrect’ introduces an unverified hallucination vector into clinical workflows.
Stanford Digital Economy Lab — ‘Centaur Evaluations’ digitaleconomy.stanford.edu
Models evaluated in isolation systematically over-promise: better evaluation, including how humans and AI perform tasks jointly, is the prerequisite for trustworthy clinical AI — not higher headline accuracy.
Kaggle BigQuery AI Hackathon writeup (OncoMix-AI) kaggle.com
17% of oncology-specific LLM studies use non-standardized or customized performance measures… LLMs are prone to providing ‘device-like’ unauthorized advice even when prompted to follow FDA CDS constraints.