We know how to regulate new drugs and medical devices–but we’re about to let health care AI run amok
There’s a great deal of buzz around artificial intelligence and its potential to transform industries, and health care ranks high on that list. If it’s applied properly, AI will dramatically improve patient outcomes by enabling earlier detection and diagnosis of cancer, accelerating the discovery of more effective targeted therapies, predicting disease progression, and creating truly personalized treatment plans.
Alongside this exciting potential lies an inconvenient truth: The data used to train medical AI models reflects built-in biases and inequities that have long plagued the U.S. health system and often lacks critical information from underrepresented communities. Left unchecked, these biases will magnify inequities and lead to lives lost due to socioeconomic status, race, ethnicity, religion, gender, disability, or sexual orientation.
Deaths will happen
To produce AI models, data scientists use algorithms that uncover, or learn, predictive associations from large data sets. In large language models (LLMs) and other generative AI, deep learning techniques analyze and learn patterns from the input text, regardless of whether that information is accurate. This mass of data, however imperfect, is what enables the model to form coherent and relevant responses to a wide variety of queries.
In health care, differences in how patients are treated–or not treated–are embedded in the very data used to train AI tools. Applied to a large and diverse population, this means the medical needs of particular groups–people of color, underrepresented communities, people with disabilities, or people with a specific type of health plan coverage–can be ignored, overlooked, or misdiagnosed. Left unchecked, people will needlessly die–and we may never even know that the underlying flaw in the data exists.
AI systems do not operate in isolation. Take a real-world example: if machine learning software is trained on large data sets that embed entrenched, systemic biases–biases that lead to white patients receiving different care than patients of color–those inequities are passed on to the algorithm and magnified as the model learns and iterates. Research conducted four years before the current AI renaissance demonstrated exactly these dire consequences for people who are already underserved. A landmark 2019 study in Science investigated an AI-based prediction algorithm used in hospitals serving more than 100 million patients–and found that Black patients had to be much sicker than white patients to become candidates for the same level of care.
In this case, the underlying data used to train the AI model was flawed. So was the algorithm, which was trained on health care spending data as a proxy for health care needs. The algorithm reflected a historic disparity: Black patients, compared with white patients with the same level of need, have less access to care and therefore generate fewer insurance claims and lower health care spending. Using historical cost as a proxy for health, the AI model incorrectly concluded that Black patients were healthier than equally sick white patients and, in turn, undercounted the number of Black patients needing additional care by more than half. When the algorithm was corrected, the share of Black patients identified for extra care based on their medical needs rose from 18% to 47%.
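To make that failure mode concrete, here is a minimal, hypothetical sketch in Python. The data is simulated and the numbers are invented for illustration; this is not the study’s actual model or figures. It shows how ranking patients by a spending proxy can undercount a group whose access to care, and therefore spending, is lower, even when the two groups are equally sick by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two groups with identical distributions of underlying medical need.
group = rng.integers(0, 2, n)                   # 0 = group A, 1 = group B
need = rng.gamma(shape=2.0, scale=1.0, size=n)  # true (latent) health need

# Illustrative assumption: group B has less access to care, so the same need
# produces lower observed spending, the proxy the model is trained to predict.
access = np.where(group == 1, 0.6, 1.0)
spending = need * access * rng.lognormal(0.0, 0.2, n)

# A spending-trained risk score effectively ranks patients by expected cost.
# Flag the top 3% "highest-risk" patients for an extra-care program.
flag_by_cost = spending >= np.quantile(spending, 0.97)
flag_by_need = need >= np.quantile(need, 0.97)  # the corrected target

for label, flags in [("spending proxy", flag_by_cost), ("true need", flag_by_need)]:
    rate_a = flags[group == 0].mean()
    rate_b = flags[group == 1].mean()
    print(f"Flagged via {label:14s}  group A: {rate_a:.1%}  group B: {rate_b:.1%}")
```

Because spending systematically understates need for the group with less access to care, the spending-based cutoff flags far fewer of its members; ranking on need itself closes the gap, which is the same direction of correction the study reported.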
Another algorithm, designed to determine how many hours of in-home assistance should go to severely disabled state residents, was found to contain several biases that produced errors about recipients’ medical needs. As a result, the algorithm directed much-needed medical services to be cut, causing severe disruptions in many patients’ care and, in some cases, hospitalizations.
The consequences of flawed algorithms can be deadly. A recent study examined an AI-based tool designed to promote early detection of sepsis, an illness that kills about 270,000 people in the U.S. each year. The tool, deployed in more than 170 hospitals and health systems, failed to predict sepsis in 67% of patients who developed it, and it generated false sepsis alerts for thousands of others. The source of the flawed detection, researchers found, was that the tool was being used in new geographies, with patient demographics different from those it had been trained on. The conclusion: AI tools do not perform the same across geographies and demographics, because patient lifestyles, incidence of disease, and access to diagnostics and treatments vary.
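As a purely illustrative sketch, again in Python with simulated data and an invented “vital sign” feature rather than the actual sepsis tool or its inputs, the following shows how an alert model fitted to one patient population can both miss more cases and raise more false alarms when deployed, unchanged, on a population whose measurements and disease prevalence differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def simulate(n, baseline, sepsis_effect, sepsis_rate):
    """Toy cohort: one 'vital sign' feature plus a sepsis label.
    baseline and sepsis_effect differ across populations to mimic
    demographic and epidemiological differences between regions."""
    sepsis = (rng.random(n) < sepsis_rate).astype(int)
    vitals = rng.normal(baseline + sepsis_effect * sepsis, 1.0, n)
    return vitals.reshape(-1, 1), sepsis

# Fit the alert model on the population it was originally developed for.
X_train, y_train = simulate(50_000, baseline=0.0, sepsis_effect=2.0, sepsis_rate=0.05)
model = LogisticRegression().fit(X_train, y_train)

# Evaluate at home, then in a new region where patients present differently:
# higher baseline vitals and a weaker link between this feature and sepsis.
cohorts = {
    "Original population": simulate(20_000, 0.0, 2.0, 0.05),
    "New population":      simulate(20_000, 0.7, 0.8, 0.10),
}

for name, (X, y) in cohorts.items():
    alerts = model.predict(X)
    sensitivity = alerts[y == 1].mean()       # share of sepsis cases flagged
    false_alerts = int(alerts[y == 0].sum())  # alarms raised for non-septic patients
    print(f"{name}: sensitivity {sensitivity:.0%}, false alerts {false_alerts}")
```

The specific numbers are arbitrary; the point is that the same model, left untouched, degrades in both directions once the population it scores no longer resembles the one it learned from, which is why validation on each local population matters before clinical deployment.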
Particularly worrisome is that AI-powered chatbots may rely on LLMs trained on data that was never screened for accuracy. False information, bad advice to patients, and harmful medical outcomes can result.
We need to step up
Before AI transforms health care, the medical community needs to step up, insist on human oversight at each stage of development, and apply ethical standards to deployment.
Developing AI in medicine requires a comprehensive, multi-dimensional approach. It is not a task for data scientists alone. It demands deep involvement from a diverse mix of professionals–data scientists, technologists, hospital administrators, doctors, and other medical specialists from varied backgrounds and perspectives, all aware of the dangers of mismanaged AI–who together provide the oversight needed to ensure that AI becomes a positive, transformational tool for health care.
Just as a drug trial requires FDA oversight–with guiding principles and publicly shared data and evidence–AI stewardship in health care requires independent audits, evaluations, and scrutiny before an AI tool is used in clinical settings. The FDA has processes to regulate medical devices but lacks dedicated funding and clear pathways to regulate new AI-based tools. That leaves AI developers on their own to devise processes that mitigate bias–if they are even aware of the need to do so. Private industry, data scientists, and the medical community must build diversity into the teams developing and deploying AI. AI can and should be applied to medicine–its potential is monumental–but we all need to acknowledge the complexity of medicine, especially the entrenched biases in training data, and require model designs that account for them at every step of the process.
One of the first tenets I learned as a physician in medical school was the Hippocratic Oath: I pledged to “first, do no harm.” Now, as an executive and innovator, I aim to go above and beyond that pledge. Building an infrastructure that lets AI function properly in health care will move us one giant step closer to transforming health care for everyone’s benefit.
Chevon Rariy, M.D., is a Chief Health Officer and Senior Vice President of Digital Health at Oncology Care Partners, an innovative value-based oncology care network, as well as an investor and practicing endocrinologist focused on oncology. She is the co-founder of Equity in STEMM, Innovation, & AI, which collaborates with academia, industry, and policymakers to reduce barriers in healthcare and advance STEMM (Science, Technology, Engineering, Mathematics, and Medicine) in underrepresented communities. Dr. Rariy serves on various non-profit and private boards at the intersection of digital health, technology, and equity and is a JOURNEY Fellow 2023.
The opinions expressed in Fortune.com commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.