Sources

“OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support” huggingface.co

References

ScienceDaily on JAMA Oncology study (Brigham & Women’s) sciencedaily.com

About one-third of [ChatGPT’s] cancer treatment recommendations included at least one that did not align with NCCN guidelines… 12.5 percent of responses were ‘hallucinated,’ or were not part of any recommended treatment.

PMC review of LLMs in oncology decision-making pmc.ncbi.nlm.nih.gov

81.8% of incorrect answers from LLMs in medical oncology were rated as having a medium to high likelihood of causing moderate to severe harm if acted upon by a clinician or patient.

r/LocalLLaMA MI300X benchmark thread reddit.com

Real-world LLM inference on MI300X often lands at 37–66% of H100 throughput despite the 192GB VRAM advantage, due to ROCm kernel maturity and silent 4-bit decode NaN bugs requiring pre-release dependency builds.

arXiv survey of CRAG for clinical RAG arxiv.org

The decompose-then-recompose ‘knowledge strip’ filter can decontextualize medical facts, omitting contraindications or patient-specific nuance; web-search fallback when local retrieval is graded ‘Incorrect’ introduces an unverified hallucination vector into clinical workflows.

Stanford Digital Economy Lab — ‘Centaur Evaluations’ digitaleconomy.stanford.edu

Models evaluated in isolation systematically over-promise: better evaluation, including how humans and AI perform tasks jointly, is the prerequisite for trustworthy clinical AI — not higher headline accuracy.

Kaggle BigQuery AI Hackathon writeup (OncoMix-AI) kaggle.com

17% of oncology-specific LLM studies use non-standardized or customized performance measures… LLMs are prone to providing ‘device-like’ unauthorized advice even when prompted to follow FDA CDS constraints.

Jack Sun, writing.