New Computational Models Using Machine Learning Can Enhance Care, Reduce Inequities and Build Clinical Trust

When Shalmali Joshi completed her PhD training in machine learning (ML) at the University of Texas, she didn’t need to look far to see the impact that discipline was already making. Recommendation systems and image recognition technology were two of the many examples of how that emerging technology had gained quick popularity.

Joshi, then considering several future paths for herself, was more interested in the impact ML had not yet fully made.

“There were many interesting challenges in clinical healthcare data where I thought computational methods, especially machine learning, could help,” Joshi said. “That shifted my attention more to machine learning in the health space, and I took the time to understand the domain deeply. I wanted to see if machine learning and AI could have any impact on how we do healthcare, and I took my time to convince myself that was the case.”

Now her time will be spent sharing her ML knowledge and collaborating within the Columbia Department of Biomedical Informatics (DBMI). Joshi joined Columbia University as assistant professor of biomedical informatics in the spring of 2023 and is currently building a lab to focus on developing new computational methods to improve clinical healthcare.

“The idea is to leverage large-scale clinical data to develop next-generation, AI-enabled learning health systems,” Joshi said. “We need to work backwards and understand the technical challenges we want to address to make this technology useful for learning health systems. A lot of my work is focused on how we can make these AI-enabled learning health systems safe, robust, and how it could inform equitable health care.”

One basic technical challenge lies in healthcare data collection. In more commercial uses of AI/ML, data tends to be relatively comprehensive, and researchers can experiment quickly to improve both the quality of the data and the model. Healthcare data is very different and, until recently, had been acquired in far more primitive ways.

“It is biased data collected on a very heterogeneous population, and is often muddled so to speak with artifacts of how people are required to interact with the healthcare system,” Joshi said. “There are also a lot of data that is missing and very hard to compensate for. We are only collecting data when the person is sick and has the means to access the healthcare system for a specific purpose. If they are healthy, we don’t know much about them. Health data is collected at very different levels of granularity, like genomic data. How do you make sense of a patient’s health from such data, while also ensuring that what we have learned aren’t the pesky statistical artifacts of how people are required to use the health system? These challenges need a lot of methodological advances. That is a huge gap we are trying to fill.”

If working backwards on this issue brings you to the data, one consideration working forward is building reliable AI models that warrant the trust of those who will use that data to make clinical decisions. When ML took off for Google searches, it was exciting and new, and few people worried about trusting its efficiency, especially since queries about finding the nearest movie theater or a particular image didn’t reach the level of a life-or-death matter.

Joshi wants to improve healthcare outcomes and save lives, but medical professionals were focused on the same goals long before this technology was a consideration.

“We have been doing really good medical research for centuries now, and I think AI has a higher bar to meet in healthcare than in other domains,” she said. “Combine that with the fact that data collected is not always of high quality, and you can understand why we have not seen that much progress with AI models. I do think we are getting very close. We are understanding that data-driven methods have to be the bread and butter of how machine learning is done, and we are going to see a lot of success with these models in healthcare.”

While improving these models can impact everybody, Joshi is especially driven to address the inequities plaguing the healthcare system. She sees possibilities in using AI to identify disparities in ways that could not previously be characterized. She also believes there are ways of connecting data sources with electronic health records (EHR) to predict potential interventions for those who may face challenges accessing healthcare or insurance.

Joshi has published several papers on her research, including “Why did the Model Fail?”: Attributing Model Performance Changes to Distribution Shifts (arXiv), What went wrong and when? Instance-wise feature importance for time-series black-box models (NeurIPS ’20), and An empirical framework for domain generalization in clinical settings (CHIL ’21: Proceedings of the Conference on Health, Inference, and Learning). After earning her PhD at the University of Texas and doing her postdoc at Harvard, she is excited to continue her research at the Columbia University Irving Medical Center.

“The biggest thing that excites me is the amazing people within the department,” she said. “DBMI is also very unique in the way it straddles academic research with operating in a close-knit manner with the hospital. You don’t find that often, and there are still a lot of barriers for a researcher who is primarily computational to be able to successfully deploy a machine learning model in another place.”

“DBMI has it all figured out,” she added. “This is an amazing place for an early-career researcher like me to focus on the research and the science and not be worried about all the other barriers that can come my way.”

The Joshi Lab is seeking a postdoctoral research scientist (who will also work with Noémie Elhadad’s lab) “with a strong background in machine learning to conduct cutting-edge research to develop new methods and a foundational understanding for generalization, transfer learning and/or domain adaptation for healthcare applications.” She is excited to build her lab at Columbia and take advantage of both DBMI’s resources and its connections in her pursuit of developing novel computational methods.

“I think my lab will be a mix of people who are very strong computationally in ML methods and AI methods, but also have a very deep interest in understanding the clinical problems, and issues of equity in clinical healthcare and translating back to identifying good computational solutions to fill existing gaps,” Joshi said. “Particularly, we will work on addressing two main things I want to do, AI-based learning health systems and health disparities at the point of care.”