Wei (Jack) Sun

OncoAgent posts 100% on a retrieval proxy, skips the NCCN check

Every URL the pipeline pulled into ranking for this issue: the primary sources, plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.


Sources

“OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support” huggingface.co

References

ScienceDaily on a JAMA Oncology study (Brigham and Women’s Hospital) sciencedaily.com

About one-third of [ChatGPT’s] cancer treatment recommendations included at least one that did not align with NCCN guidelines… 12.5 percent of responses were ‘hallucinated,’ or were not part of any recommended treatment.

PMC review of LLMs in oncology decision-making pmc.ncbi.nlm.nih.gov

81.8% of incorrect answers from LLMs in medical oncology were rated as having a medium to high likelihood of causing moderate to severe harm if acted upon by a clinician or patient.

r/LocalLLaMA MI300X benchmark thread reddit.com

Real-world LLM inference on MI300X often lands at 37–66% of H100 throughput despite the 192GB VRAM advantage, due to ROCm kernel maturity and silent 4-bit decode NaN bugs requiring pre-release dependency builds.

arXiv survey of Corrective RAG (CRAG) for clinical RAG arxiv.org

The decompose-then-recompose ‘knowledge strip’ filter can decontextualize medical facts, omitting contraindications or patient-specific nuance; web-search fallback when local retrieval is graded ‘Incorrect’ introduces an unverified hallucination vector into clinical workflows.

Stanford Digital Economy Lab — ‘Centaur Evaluations’ digitaleconomy.stanford.edu

Models evaluated in isolation systematically over-promise: better evaluation, including how humans and AI perform tasks jointly, is the prerequisite for trustworthy clinical AI — not higher headline accuracy.

Kaggle BigQuery AI Hackathon writeup (OncoMix-AI) kaggle.com

17% of oncology-specific LLM studies use non-standardized or customized performance measures… LLMs are prone to providing ‘device-like’ unauthorized advice even when prompted to follow FDA CDS constraints.


Engineer · Bay Area

Hands-on with agentic AI all day: building frameworks, reading what the industry ships, and occasionally writing it down.


© 2026 Wei (Jack) Sun · jacksunwei.me · Built on Astro · hosted on Cloudflare