Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. In ACM Transactions on Computing for Healthcare (HEALTH) (eds Lee, I. & Stankovic, J. A.) 3, 1–23 (Association for Computing Machinery, 2022).

Nori, H. et al. Sequential diagnosis with language models. Preprint at https://arxiv.org/abs/2506.22405 (2025).

OpenAI. Introducing GPT-5. https://openai.com/index/introducing-gpt-5/ (2025).

Saab, K. et al. Capabilities of Gemini models in medicine. Preprint at https://arxiv.org/abs/2404.18416 (2024).

Tu, T. et al. Towards conversational diagnostic AI. Preprint at https://arxiv.org/abs/2401.05654 (2024).

Wang, S. et al. LINS: a general medical Q&A framework for enhancing the quality and credibility of LLM-generated responses. Nat. Commun. 16, 9076 (2025).

Arora, R. K. et al. HealthBench: evaluating large language models towards improved human health. Preprint at https://arxiv.org/abs/2505.08775 (2025).

Handler, R., Sharma, S. & Hernandez-Boussard, T. The fragile intelligence of GPT-5 in medicine. Nat. Med. 31, 3968–3970 (2025).

Farquhar, S., Kossen, J., Kuhn, L. & Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 630, 625–630 (2024).

Jin, Q. et al. Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine. NPJ Digit. Med. 7, 190 (2024).

Pfau, J., Merrill, W. & Bowman, S. R. Let’s think dot by dot: hidden computation in transformer language models. In First Conference on Language Modeling (COLM) https://openreview.net/forum?id=NikbrdtYvG (2024).

Geirhos, R. et al. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2, 665–673 (2020).

Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).

Goodfellow, I. J., Shlens, J. & Szegedy, C. Explaining and harnessing adversarial examples. Preprint at https://arxiv.org/abs/1412.6572 (2015).

Szegedy, C. et al. Intriguing properties of neural networks. Preprint at https://arxiv.org/abs/1312.6199 (2013).

The New England Journal of Medicine: Image Challenge. https://www.nejm.org/image-challenge (2026).

JAMA Network Clinical Challenge. https://jamanetwork.com/collections/44038/clinical-challenge (2026).

Comanici, G. et al. Gemini 2.5: pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. Preprint at https://arxiv.org/abs/2507.06261 (2025).

Anthropic. Claude 3.5 Sonnet. https://www.anthropic.com/news/claude-3-5-sonnet (2024).

OpenAI. GPT-4o system card. https://openai.com/index/gpt-4o-system-card/ (2024).

OpenAI. OpenAI o3 and o4-mini system card. https://openai.com/index/o3-o4-mini-system-card/ (2025).

Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. In NIPS ʼ22: Proceedings of the 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 24824–24837 (Curran Associates, 2022).

Lau, J. J., Gayen, S., Ben Abacha, A. & Demner-Fushman, D. A dataset of clinically generated visual questions and answers about radiology images. Sci. Data 5, 180251 (2018).

Hu, Y. et al. OmniMedVQA: a new large-scale comprehensive evaluation benchmark for medical LVLM. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) https://doi.org/10.1109/CVPR52733.2024.02093 (IEEE, 2024).

Johnson, A. E. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6, 317 (2019).

He, X., Zhang, Y., Mou, L., Xing, E. & Xie, P. PathVQA: 30000+ questions for medical visual question answering. Preprint at https://arxiv.org/abs/2003.10286 (2020).

Liu, B. et al. SLAKE: a semantically-labeled knowledge-enhanced dataset for medical visual question answering. Preprint at https://arxiv.org/abs/2102.09542 (2021).

Zhang, X. et al. PMC-VQA: visual instruction tuning for medical visual question answering. Preprint at https://arxiv.org/abs/2305.10415 (2023).

Yue, X. et al. MMMU: a massive multidiscipline multimodal understanding and reasoning benchmark for expert AGI. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) https://doi.org/10.1109/CVPR52733.2024.00913 (IEEE, 2024).

Fleiss, J. L. Measuring nominal scale agreement among many raters. Psychol. Bull. 76, 378–382 (1971).

Wu, Z. et al. DeepSeek-VL2: mixture-of-experts vision-language models for advanced multimodal understanding. Preprint at https://arxiv.org/abs/2412.10302 (2024).

Bai, S. et al. Qwen3-VL technical report. Preprint at https://arxiv.org/abs/2511.21631 (2025).

Li, C. et al. LLaVA-Med: training a large language-and-vision assistant for biomedicine in one day. In NIPS ʼ23: Proceedings of the 37th International Conference on Neural Information Processing Systems (eds Oh, A. et al.) 28541–28564 (Curran Associates, 2023).

Sellergren, A. et al. MedGemma technical report. Preprint at https://arxiv.org/abs/2507.05201 (2025).
