Federated benchmarking of medical artificial intelligence with MedPerf

Karargyris, Alexandros; Umeton, Renato; Sheller, Micah J.; Aristizabal, Alejandro; George, Johnu; Wuest, Anna; Pati, Sarthak; Kassem, Hasan; Zenk, Maximilian; Baid, Ujjwal; Narayana Moorthy, Prakash; Chowdhury, Alexander; Guo, Junyi; Nalawade, Sahil; Rosenthal, Jacob; Kanter, David; Xenochristou, Maria; Beutel, Daniel J.; Chung, Verena; Bergquist, Timothy; Eddy, James; Abid, Abubakar; Tunstall, Lewis; Sanseviero, Omar; Dimitriadis, Dimitrios; Qian, Yiming; Xu, Xinxing; Liu, Yong; Goh, Rick Mong; Bala, Srini; Bittorf, Victor; Puchala, Sreekar Reddy; Ricciuti, Biagio; Samineni, Soujanya; Sengupta, Eshna; Chaudhari, Akshay; Coleman, Cody; Desinghu, Bala; Diamos, Gregory; Dutta, Debo; Feddema, Diane; Fursin, Grigori; Huang, Xinyuan; Kashyap, Satyananda; Lane, Nicholas; Mallick, Indranil; Mascagni, Pietro; Mehta, Virendra; Moraes, Cassiano Ferro; Natarajan, Vivek; Nikolov, Nikola; Padoy, Nicolas; Pekhimenko, Gennady; Reddi, Vijay Janapa; Reina, G. Anthony; Ribalta, Pablo; Singh, Abhishek; Thiagarajan, Jayaraman J.; Albrecht, Jacob; Wolf, Thomas; Miller, Geralyn; Fu, Huazhu; Shah, Prashant; Xu, Daguang; Yadav, Poonam; Talby, David; Awad, Mark M.; Howard, Jeremy P.; Rosenthal, Michael; Marchionni, Luigi; Loda, Massimo; Johnson, Jason M.; Bakas, Spyridon; Mattson, Peter

doi:10.1038/s42256-023-00652-2

Journal Article · 17 July 2023 · Nature Machine Intelligence

Medical artificial intelligence (AI) has tremendous potential to advance healthcare by supporting and contributing to the evidence-based practice of medicine, personalizing patient treatment, reducing costs, and improving both healthcare provider and patient experience. Unlocking this potential requires systematic, quantitative evaluation of the performance of medical AI models on large-scale, heterogeneous data capturing diverse patient populations. Here, to meet this need, we introduce MedPerf, an open platform for benchmarking AI models in the medical domain. MedPerf focuses on enabling federated evaluation of AI models, by securely distributing them to different facilities, such as healthcare organizations. This process of bringing the model to the data empowers each facility to assess and verify the performance of AI models in an efficient and human-supervised process, while prioritizing privacy. We describe the current challenges healthcare and AI communities face, the need for an open platform, the design philosophy of MedPerf, its current implementation status and real-world deployment, our roadmap and, importantly, the use of MedPerf with multiple international institutions within cloud-based technology and on-premises scenarios. Finally, we welcome new contributions by researchers and organizations to further strengthen MedPerf as an open benchmarking platform.

Research Organization: Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)

Sponsoring Organization: AI Singapore Programme; Career Development Fund; Helmholtz Association; National Institutes of Health (NIH); USDOE National Nuclear Security Administration (NNSA)

Contributing Organization: AI4SafeChole Consortium; BraTS-2020 Consortium; FeTS Consortium

Grant/Contract Number: AC52-07NA27344

OSTI ID: 2203350

Report Number(s): LLNL--JRNL-834413; 1052386

Journal Information: Nature Machine Intelligence, Vol. 5, Issue 7; ISSN 2522-5839

Publisher: Springer Nature

Country of Publication: United States

Language: English

Related Subjects

60 APPLIED LIFE SCIENCES
97 MATHEMATICS AND COMPUTING
Cancer imaging
Diseases
Information technology
Operational research
