Biomedical Informatics Seminar Series

The DBMI seminar series is a 1-credit course in which DBMI students hear about new research methods from speakers in both academia and industry. Enrollment is restricted to DBMI students, but anybody may attend the seminars. It is currently being offered virtually, though it is traditionally held in PH20-200.

DBMI also hosts a Special Seminar Series: Toward Diversity, Equity, and Inclusion in Informatics, Health Care, and Society. Both upcoming presentations and past recordings will be shared on our Special Seminar Series homepage.


The 2023 spring semester seminars have concluded.
Please visit this page in the late summer for a schedule of our 2023 fall semester speakers.

2023 Spring Seminars

Speakers: Krystal Tsosie and Keolu Fox

Title: #DATABACK: Indigenous Genomic Data Justice for Indigenous Peoples 

Watch This Presentation

Abstract: Despite over a decade of efforts to increase diversity in genomic datasets, Indigenous peoples still constitute less than 1% of total representation. The answer, however, is not simply to recruit more Indigenous peoples because defaulting to old, problematic norms of broad consent can recreate cycles of data exploitation and extraction that benefit Indigenous peoples last. To move forward, we need to rethink data equity approaches that center principles of Indigenous genomic data sovereignty, which means employing new techniques in blockchaining and federated learning in addition to Indigenous-led bio-databanks. Hence, Drs. Tsosie and Fox are advocating for an Indigenous data justice approach that is truly responsive to genomic medicine and precision health innovation.

Bios: Krystal Tsosie, PhD, MPH, MA is an Indigenous (Diné/Navajo Nation) geneticist-bioethicist at Arizona State University’s School of Life Sciences and Center for Biology and Society. She co-founded the Native BioData Consortium, the first US Indigenous-led biobank and 501(c)(3) nonprofit research institution. Much of her current research centers on ethical engagement with Indigenous communities in precision health through genetic epidemiology, public health, and computational approaches. She is also increasingly exploring machine learning approaches and using digital data tools to operationalize Indigenous genomic data sovereignty to foster Indigenous-led data solutions and build Tribal Nations’ capacity in technology, health, education, and local data economies.

Keolu Fox is the first Kānaka Maoli (Native Hawaiian) to receive a doctorate in genome sciences, and is an assistant professor at the University of California, San Diego, affiliated with the Department of Anthropology, the Global Health Program, the Halıcıoğlu Data Science Institute, the Climate Action Lab, the Design Lab, and the Indigenous Futures Institute. His work focuses on the connection between raw data as a resource and the emerging value of genomic health data from Indigenous communities. He has experience designing and engineering genome sequencing and editing technologies, and a decade of grassroots experience working with Indigenous partners to advance precision medicine. As an ENRICH Global Chair, Keolu will build a library for Indigenous health data in partnership with Indigenous communities. He will pilot a platform that will enable collecting and protecting Indigenous health data using Indigenous Data Sovereignty (IDS) principles, which provide a framework for allowing Indigenous communities themselves to manage and benefit from their own data. Ultimately, he hopes to create a replicable standard for Indigenous data sovereignty.

Speaker: Andrew Sellergren  

Title: Getting Started with AI for Medical Imaging: Exploring CXR Foundation from Google Health AI

Watch this presentation

Abstract: Artificial intelligence (AI) for computer vision is growing in both popularity and usefulness on a daily basis. In fields like healthcare, AI for medical imaging is poised to save millions of lives over the next few decades. But it’s still hard to know how to get started with it, particularly if you don’t have access to large computational resources or datasets. Recently, Google Health AI published research and released a tool called CXR Foundation that enables you to get started with creating machine learning models for disease detection on chest x-rays (CXRs), of which there are over a billion taken every year. With CXR Foundation, we showed that it’s possible to train models that can diagnose tuberculosis or predict the severity of COVID-19 using only a few hundred images.

How does CXR Foundation work? In this tech talk, we’ll explain the research we did to create the model and walk through code to train your own!

Bio: Andrew is a software engineer on Google Health. He studied chemistry in college, joined Google as an analyst in 2010, and transferred into software engineering in 2014. He worked on large-scale infrastructure for Google Fit and Google Surveys before joining Google Health in 2019. Since then, he has worked on deep learning for chest x-rays as well as externalizing training pipelines.

Speaker: Lauren Wilcox, Responsible AI & Human-Centered Technology in Google Research

Title: Participatory Approaches to Health AI  

Per the speaker’s request, this session was not recorded.

Abstract: Advances in computing technology continue to offer us new insights about our health. As mutually reinforcing trends make the use of wearable and mobile devices routine, we now collect personal, health-related data at an unprecedented scale. Meanwhile, the use of deep-learning-based health screening technologies changes relationships between caregivers and care recipients, with multitudinous implications for equity, privacy, safety, and trust. How can researchers take inclusive and responsible approaches to envisioning solutions, training data, and deploying ML/AI-driven solutions? Who should be involved in decisions about how to use ML/AI in digital health and well-being solutions, and even what solutions matter in the first place? 

In this talk, I will discuss participatory approaches to designing digital health and well-being technologies with impacted communities. Starting with field studies in clinics exploring how people navigated use of a deployed, diagnostic AI system, and moving onto lessons learned from an international study of how people with marginalized health needs navigate aspects of their health care, I will highlight the importance of taking participatory approaches to technology design, development, and evaluation.

Bio: Lauren Wilcox, PhD, is a Senior Staff Research Scientist and Group Manager in Responsible AI and Human-Centered Computing in Google Research. Her work builds on many years of experience conducting human-centered computing research in service of human health and well-being. Previously at Google Health, Wilcox led initiatives to align AI advancements in healthcare with the needs of clinicians, patients, and their family members. She also holds an Adjunct Associate Professor position in Georgia Tech’s School of Interactive Computing where she was a tenured associate professor. Wilcox was an inaugural member of the ACM Future of Computing Academy. She frequently serves on the organizing and technical program committees for premier conferences in the field. Wilcox received her PhD in Computer Science from Columbia University in 2013.

Speakers: Pooja Desai and Tara Anand, PhD Students, Columbia DBMI

Titles: Towards Human-Centered Informatics Tools for Nutrition Management (Desai) | Cluster DAGs for causal effect identification in high-dimensional domains (Anand)

Student sessions are not recorded.

Abstract (Desai): Recent years have seen an explosion of informatics technologies to support nutrition management. However, people often struggle to translate algorithmic insights into concrete actions. Personalized recommendation systems (RecSys) can help users identify specific actions to improve health. However, they seldom account for the key role that users’ nutrition goals and decision context have in decision-making. Informed by Human-Centered AI paradigms, we explore approaches to generate nutrition recommendations using crowdsourced free-text meal descriptions and to communicate nutrition guidance to support action. First, we developed a new approach to using meal similarity from free-text meal data to generate nutrition recommendations. Second, we explored how nutrition guidance should be conveyed to users to support action. We discuss opportunities for future work to integrate these computational and interaction components towards developing more human-centered nutrition recommendations that align with users’ health goals, existing practices, and preferences.

Abstract (Anand): Determining the effect of an intervention from observational data is a task of interest in informatics and can be accomplished using various causal inference techniques. Assumptions are necessary to perform causal inferences and are often articulated through graphical models known as causal diagrams, which represent an abstracted form of the functional models causally relating variables in a system. In causal diagrams, the nodes are observed or unobserved variables and the edges are causal relationships between these variables. Causal diagrams are sufficient to generate, in many cases, an expression of probabilities estimated from a dataset to precisely yield an unbiased causal effect, through a task known as identification. One difficulty is the significant knowledge required to articulate a causal diagram, as construction necessitates knowledge of causal relationships between all pairs of relevant variables in the dataset. In high-dimensional and complex settings such as medicine, fully specifying a causal diagram may be infeasible or impossible due to lack of clinical knowledge over a great number of variables. To address the complexity of medicine, while still allowing for knowledge to inform causal inferences, we introduce cluster directed acyclic graphs (C-DAGs), which allow for the grouping of nodes. Causal relationships between variables within a cluster can be left unspecified such that significantly less knowledge is required to inform how groups or clusters of variables are causally related. We define and characterize this novel class of graphs, describe how such a graph can be constructed from partial knowledge in a medical context, and prove the soundness and completeness of tools allowing for inference of causal effects over this graphical object. 
Specifically, we formalize the methodology for how C-DAGs can be used to support inferences of three kinds: associational, interventional, and counterfactual, with a focus on the latter two types which are causal in nature.

Title: Causal Inference and Data Fusion 

Presenter: Elias Bareinboim, Associate Professor in the Department of Computer Science/Director of the Causal Artificial Intelligence Lab, Columbia University

Per the speaker’s request, this session is not available.

Abstract: Causal inference is usually dichotomized into two categories, experimental (Fisher, Cox, Cochran) and observational (Neyman, Rubin, Robins, Dawid, Pearl) which, by and large, are studied separately. Understanding reality is more demanding. Experimental and observational studies are but two extremes of a rich spectrum of research designs that generate the bulk of the data available in practical, large-scale situations. In typical medical explorations, for example, data from multiple observations and experiments are collected, coming from distinct experimental setups, different sampling conditions, and heterogeneous populations.

In this talk, I will introduce the data-fusion problem, which is concerned with piecing together multiple datasets collected under heterogeneous conditions (to be defined) so as to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to causal analysts since the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. I will present my work on a general, non-parametric framework for handling these biases and, ultimately, a theoretical solution to the problem of fusion in causal inference tasks.

Bio: Elias Bareinboim is the director of the Causal Artificial Intelligence (CausalAI) Laboratory and an associate professor in the Department of Computer Science at Columbia University. His research focuses on causal and counterfactual inference and their applications to data-driven fields in the health and social sciences as well as artificial intelligence and machine learning. His work was the first to propose a general solution to the problem of “data-fusion,” providing practical methods for combining datasets generated under different experimental conditions and plagued with various biases. More recently, Bareinboim has been exploring the intersection of causal inference with decision-making (including reinforcement learning) and explainability (including fairness analysis). Bareinboim received his Ph.D. from the University of California, Los Angeles, where he was advised by Judea Pearl. Bareinboim was named one of “AI’s 10 to Watch” by IEEE, and is a recipient of an NSF CAREER Award, the Dan David Prize Scholarship, the 2014 AAAI Outstanding Paper Award, and the 2018 UAI Best Student Paper Award.

Title: Finding Cardiac Disease with CRADLE: the Cardiovascular and Radiologic Deep Learning Environment 

Presenter: Pierre Elias, Assistant Professor of Medicine (in Biomedical Informatics), Columbia University

Watch This Presentation

Abstract: In this talk we will discuss why and how deep learning approaches have the potential to greatly impact cardiac imaging. We will then explore use cases developed here at Columbia that have led to two of the world’s first prospective clinical trials of deep learning in cardiology. Lastly we’ll critique the limitations of current ML approaches preventing mainstream adoption in order to answer the question, “What are the big problems the field needs to be tackling now?” (and maybe even answer, “What’s an interesting research idea for me to pursue as a grad student?”)

Bio: Pierre Elias is an Assistant Professor in the Division of Cardiology and the Department of Biomedical Informatics at Columbia University Irving Medical Center, where he practices as a general cardiologist. He is also the Medical Director for Artificial Intelligence at NewYork-Presbyterian. His research lab develops machine learning technologies for medical imaging to improve the detection and management of cardiovascular disease.

Title: Designing Technologies to Support Patients as Safeguards

Speaker: Dr. Wanda Pratt, Professor and Associate Dean, Information School, University of Washington

Watch This Presentation

Abstract: Recent studies indicate that medical errors are a leading cause of death in the United States. Although this problem has received substantial national attention, little work has actively involved patients in preventing, detecting, and recovering from these errors. In this presentation, I will detail our efforts to design new technologies with patients and their caregivers to support them in safeguarding their own health in the hospital. Currently, patients receive inadequate information and support to play such a safeguarding role. Using human-centered, mixed-methods approaches, we have assessed the information needs of hospitalized patients, created new technologies, and learned insights for how to address those needs. These insights will support the health-care community in engaging patients as safeguards against medical errors and provide a vision for enhancing the overall patient experience using information technology.

Bio: Dr. Wanda Pratt is a Professor and the Associate Dean for Inclusion, Diversity, Equity, Access, and Sovereignty (IDEAS) in the Information School with an adjunct appointment in Biomedical Informatics & Medical Education in the Medical School at the University of Washington. She received her Ph.D. in Medical Informatics from Stanford University, her M.S. in Computer Science from the University of Texas, and her B.S. in Electrical Engineering from the University of Kansas. Her research focuses on both understanding the work people do to manage their health as well as designing new technologies to support that work and reduce its burden. She has worked with hospitalized patients as well as people coping with a variety of chronic diseases, such as cancer, diabetes, asthma, and heart disease. Her recent work focuses on support for people from historically marginalized or underestimated communities. Dr. Pratt has received best paper awards from the American Medical Informatics Association (AMIA), the ACM CHI Conference on Human Factors in Computing Systems, the ACM Conference on Computer-Supported Cooperative Work (CSCW), and the Journal of the American Society of Information Science & Technology (JASIS&T). Her research has been funded by the National Science Foundation, the National Institutes of Health, the Agency for Healthcare Research & Quality, the Robert Wood Johnson Foundation, Intel, Google, and Microsoft. Dr. Pratt is a fellow of the American College of Medical Informatics. She recently served two terms on the Board of Directors for AMIA and Chaired their 2016 Annual Symposium.

Title: Transforming the Health of Communities through Innovations in Social Computing

Speaker: Dr. Andrea Grimes Parker, Associate Professor at Georgia Tech 

Watch This Presentation

Abstract: Digital health research—the investigation of how technology can be designed to support wellbeing—has exploded in recent years. Much of this innovation has stemmed from advances in the fields of human-computer interaction and artificial intelligence. A growing segment of this work is examining how information and communication technologies (ICTs) can be used to achieve health equity, that is, fair opportunities for all people to live a healthy life. Such advances are sorely needed, as there exist large disparities in morbidity and mortality across population groups. These disparities are due in large part to social determinants of health, that is, social, physical, and economic conditions that disproportionately inhibit wellbeing in populations such as low-socioeconomic status and racial and ethnic minority groups. 

Despite years of digital health research and commercial innovation, profound health disparities persist. In this talk, I will argue that to reduce health disparities, ICTs must address social determinants of health. Intelligent interfaces have much to offer in this regard, and yet their affordances—such as the ability to deliver personalized health interventions—can also act as pitfalls. For example, a focus on personalized health interventions can lead to the design of interfaces that help individuals engage in behavioral change. While such innovations are important, to achieve health equity there is also a need for complementary systems that address social relationships. Social ties are a crucial point of focus for digital health research as they can provide meaningful supports for positive health, especially in populations that disproportionately experience barriers to wellbeing.

I will offer a vision for digital health equity research in which interactive and intelligent systems are designed to help people build, enrich, and engage social relationships that support wellbeing. By expanding the focus from individual to social change, there is tremendous opportunity to create disruptive interventions that catalyze and sustain population health improvements.

Bio: Andrea Grimes Parker is an Associate Professor in the School of Interactive Computing at Georgia Tech. She is also an Adjunct Associate Professor in the Rollins School of Public Health at Emory University and at Morehouse School of Medicine. Dr. Parker holds a Ph.D. in Human-Centered Computing from Georgia Tech and a B.S. in Computer Science from Northeastern University. She is the founder and director of the Wellness Technology Lab at Georgia Tech. Her interdisciplinary research spans the domains of human-computer interaction and public health, as she examines how social and interactive computing systems can be designed to address health inequities. 

Dr. Parker has published widely in the space of digital health equity and received several best paper honorable mention awards for her research. Her research has been funded through awards from the National Science Foundation, the National Institutes of Health, the Aetna Foundation, Google, and Johnson & Johnson. Additionally, she is a recipient of the 2020 Georgia Clinical & Translational Science Alliance Team Science Award. Dr. Parker has held various leadership roles, including serving as co-chair for the Workgroup on Interactive Systems in Healthcare (WISH) and as a member of the Johnson & Johnson / Morehouse School of Medicine Georgia Maternal Health Research for Action Steering Committee.

Title: Towards building trustworthy AI systems in Medicine – research and experiences in the EU context

Speaker: Riccardo Bellazzi, Professor of Bioengineering and Biomedical Informatics, University of Pavia 

Watch This Presentation

Abstract: The recent rapid advent of AI-based solutions in medicine shows the need to define a realistic roadmap for implementing “trustworthy” AI systems that are lawful, ethical, and robust. This talk will describe some European projects working in that direction and will then focus on the reliability principle as a key basis for the design and implementation of successful AI solutions.

Bio: Riccardo Bellazzi is Full Professor of Bioengineering and Biomedical Informatics at the University of Pavia. He is the Director of the Department of Electrical, Computer and Biomedical Engineering of the University of Pavia. Moreover, he leads the Laboratory of biomedical informatics at the hospital “Salvatore Maugeri” in Pavia. 

Title: AMIA Biomedical Informatics Year in Review

Speaker: James Cimino, Professor, University of Alabama at Birmingham; Adjunct Professor of Biomedical Informatics, Columbia University 

Watch This Presentation Here
View The Slidedeck

Abstract: What are the most significant and exciting scientific developments in biomedical informatics over the past year? The Working Groups of the American Medical Informatics Association (AMIA) provided papers in their respective domains (over 90 in total) representing the most influential or significant work published from September 2021 through September 2022. Summaries of these papers will be presented, with a focus on those with the greatest impact, broadest interest, and entertainment value in this 60-minute, multi-media event. This presentation will focus on clinical informatics, although some developments in bioinformatics and clinical research informatics that have much to offer to domains such as clinical medicine and public health will be included.

Bio: Dr. James Cimino is a board certified internist who completed an NLM informatics fellowship at the Massachusetts General Hospital and Harvard University and then went on to an academic position at Columbia University College of Physicians and Surgeons and the Presbyterian Hospital in New York. He spent 20 years at Columbia, carrying out clinical informatics research including desiderata for controlled terminologies, mobile and Web-based clinical information systems for clinicians and patients, and a context-aware form of clinical decision support called “infobuttons”. In 2008, he moved to the National Institutes of Health, where he was the Chief of the Laboratory for Informatics Development and a Tenured Investigator at the NIH Clinical Center and the National Library of Medicine. In 2015, he left NIH to be the inaugural Director of the Informatics Institute at the University of Alabama at Birmingham. He continues to conduct research in clinical informatics and clinical research informatics; he has been director of the NLM’s week-long Biomedical Informatics course for 16 years and teaches at Columbia University and Georgetown University as an Adjunct Professor. He is co-editor (with Edward Shortliffe) of a leading textbook on Biomedical Informatics and is an Associate Editor of the Journal of Biomedical Informatics. His honors include Fellowships of the American College of Physicians, the New York Academy of Medicine and the American College of Medical Informatics (Past President), the Priscilla Mayden Award from the University of Utah, the Donald A.B. Lindberg Award for Innovation in Informatics and the President’s Award, both from the American Medical Informatics Association (AMIA), the Medal of Honor from New York Medical College, the NIH Clinical Center Director’s Award (twice), and induction into the National Academy of Medicine (formerly the Institute of Medicine). In 2019, he received the prestigious Morris F. Collen Award of Excellence from AMIA.

Title: Big Data and Wearables for Managing Health 

Speaker: Michael Snyder, Ascherman Professor and Chair of Genetics and the Director of the Center of Genomics and Personalized Medicine, Stanford University

Watch This Presentation Here

Bio: Michael Snyder is the Stanford Ascherman Professor and Chair of Genetics and the Director of the Center of Genomics and Personalized Medicine. He received his Ph.D. training at the California Institute of Technology and carried out postdoctoral training at Stanford University. Dr. Snyder has pioneered the use of “big data” and multiomics to advance scientific discovery and transform healthcare. His laboratory has invented many technologies that are widely used in medicine and research, including methods for characterizing genomes and their products (e.g. RNA-Seq, NGS paired end sequencing, ChIP-Chip and later ChIP-Seq, protein arrays, machine learning for disease gene discovery). His application of omics and wearables technologies to perform longitudinal profiling of people when they are healthy and ill is transforming medicine and healthcare. Indeed, his laboratory’s recent work to use smartwatches and wearables to detect illness, including infectious disease such as COVID-19, prior to symptom onset is being used by many thousands of people. He has helped co-lead many large scale projects including ENCODE, HMP, HuBMAP and HTAN. He has cofounded 16 biotechnology companies, including Personalis, Qbio, January AI, Filtricine and RTHM.

Title: Disability accessibility and fairness in Artificial Intelligence (AI) 

Speaker: Cynthia Bennett, PhD, Senior Research Scientist at Google’s People + AI Research Group

Watch this presentation

Abstract: Artificial intelligence (AI) promises to automate and scale solutions to perennial accessibility challenges (e.g., generating image descriptions for blind users). However, research shows that AI-bias disproportionately impacts people already marginalized based on their race, gender, or disabilities, raising questions about potential impacts in addition to AI’s promise. In this talk I will overview broad concerns at the intersection of AI, disability, and accessibility. I will then share details about one project in this research space that led to guidance on human and AI-generated image descriptions that account for subjective and potentially sensitive descriptors around race, gender, and disability of people in images. 

Bio: Dr. Cynthia Bennett is a Senior Research Scientist in Google’s Responsible AI and Human-Centered Technology organization. Her research concerns the intersection of AI ethics and disability. Bennett is regularly invited to speak; recent hosts include Stanford and Apple. Previously, Bennett has worked at Carnegie Mellon University, Apple, and the University of Washington. Her work has received grant funding from Microsoft Research and the National Science Foundation, and eight of her peer reviewed publications have received awards. Bennett is a disabled woman scholar working in the tech and academic sectors, and she does raising participation service. Bennett’s website is, and her Twitter handle is @clb5590.

Title: Leveraging human brain connectomes to derive quantitative biomarkers for mood and anxiety disorders: methodological advances within the Human Connectome Project for Disordered Emotional States 

Presenter: Dr. Leonardo Tozzi, MD, PhD, Research Engineer at Stanford University

Watch this presentation

Abstract: Mood and anxiety disorders affect over 400 million people globally and are the leading cause of disability worldwide. The goal of the Human Connectome Project for Disordered Emotional States is to study the structure and function of large scale human brain circuits underpinning these disorders. Our study is grounded in the Research Domain Criteria (RDoC) framework developed by the National Institute of Mental Health, which hypothesizes relations among neural circuits, behavior and self-reported symptoms. In our project, we focus particularly on deriving “human connectomes” from whole-brain magnetic resonance imaging recordings, i.e. representations of the functional connections between all regions of the human brain. During this talk, I will introduce the rationale and protocol of our Human Connectome Project for Disordered Emotional States and then present the results of two methodological studies conducted within it. In the first study, we identified the portions of the human connectome that can be measured most reliably and we determined how analysis choices impact human connectome reliability. In the second study, we developed a new algorithm to link human connectomes and symptoms of disordered emotional states, named “group regularized canonical correlation analysis”. Our algorithm can handle thousands of features efficiently and take into account the correlational structure of human connectomes, thus outperforming existing tools for this application.

Bio: Leonardo Tozzi, M.D., Ph.D., graduated as a Medical Doctor from Pisa University and Sant’Anna School of Advanced Studies in 2013. In 2018, he was awarded his Ph.D. from Trinity College Dublin for his research on the impact of genetics, epigenetics and environmental stressors on structural and functional brain changes related to depression. Leonardo joined Stanford University in 2018 as the post-doctoral lead of the Human Connectome Project for Disordered Emotional States. Since 2022, he has led the Computational Neuroscience & Neuroimaging Program at the Stanford Center for Precision Mental Health and Wellness. The goal of Leonardo’s research is to develop quantitative biomarkers for mood disorders that are reliable, interpretable and can be used to guide treatment selection and estimate treatment response. To this end, he integrates large scale recordings of brain structure and function with behavioral measures and symptoms as well as other biological markers.

2022 Fall Seminars

Title: Use of Recommended Evaluation for Surgery in Patients with Drug-Resistant Epilepsy

Per the presenter’s request, this session was not recorded.

Abstract: Surgery is a vastly underutilized treatment option for patients with drug-resistant epilepsy. Limited data suggest underutilization of surgery is due to physician and patient misperceptions, cost and complexity of the presurgical evaluation, and disparities in access to care. However, there are few longitudinal, population-based studies characterizing barriers to evaluation and few interventions have successfully modified practice patterns. Using the Observational Medical Outcomes Partnership Common Data Model, we developed computable phenotypes to identify patients who meet clinical criteria for drug-resistant epilepsy. We then determined the rate of surgical evaluation among patients with drug-resistant epilepsy in multiple observational databases and assessed the association of demographic and clinical factors with evaluation. Findings will provide new information about addressable barriers to epilepsy surgery, support the user-centered design of clinical decision support interventions, and provide a roadmap to promote best practices and reduce disparities for other complex and refractory conditions. This work will also establish methodology for future multi-institutional studies of epilepsy and drug-resistance using observational data.

Bio: Dr. Brett Youngerman is an Assistant Professor of Neurological Surgery at Columbia University Irving Medical Center / New York-Presbyterian Hospital specializing in epilepsy, movement disorders, and neuro-oncology. His research activities focus on the use of information technology to measure surgical treatment variability and outcomes, and promote best practices through multi-center research, care pathways and clinical decision support. His current focus is studying variable treatment pathways around epilepsy surgery with the goal of better understanding its underutilization and developing informatics-based interventions. He completed a Master of Science in Patient Oriented Research, is a KL2 Award recipient, and received a Young Investigator Award from the American Epilepsy Society.

Title: Standardizing the Unstandardizable: The Case of Sex and Gender 

Watch This Presentation

Abstract: In 2015, notice number NOT-OD-15-102 was released by the National Institutes of Health. The notice specified “consideration of sex as a biological variable” (SABV), requiring submission of information regarding this new construct from 2016 onward. However, despite this imperative explicitly citing enhancement of reproducibility, it did not lay out any conceptualization of what SABV meant, in non-human animal or human contexts, and it relied heavily on binarist and gender essentialist assumptions, which have ultimately confused the situation further. This confusion has led to SABV being co-opted by transphobic and intersexphobic organizations and individuals, while not necessarily impacting reproducibility. Why are sex and gender such complicated variables to consider? How did these constructs come to exist within the purview of scientific analysis? And what work is being done to untangle the current situation? This talk will aim to discuss these questions, while also considering the deeper ideologies underlying current scientific research and sociopolitical agendas, and how they affect effective modeling of sex and gender constructs in informatics and beyond.

Bio: Clair Kronk (she/her) is a postdoctoral fellow at the transitioning Yale Center for Medical Informatics (YCMI). She is the creator and sole author of the first LGBTQIA+ controlled vocabulary for usage in health care settings, the Gender, Sex, and Sexual Orientation (GSSO) ontology, which contains information on over 15,000 terms. Dr. Kronk has provided valuable input on GSSO standards for a number of organizations, including the Health Level 7 (HL7) Gender Harmony Project (GHP), the Systematized Nomenclature of Medicine (SNOMED), Canada Health Infoway (CHI), the International Organization for Standardization (ISO), Queensland Health, the National Academies of Sciences, Engineering, and Medicine (NASEM), the United States Core Data for Interoperability (USCDI), the World Health Organization (WHO), the Trans Metadata Collective (TMDC), the Homosaurus, Wikidata, and the American Medical Informatics Association (AMIA) Diversity, Equity, and Inclusion (DEI) Task Force.

Title: Algorithmic bias and data platforms  

Watch This Presentation

Abstract: We’re increasingly aware of the many ways that algorithms can encode and scale up racial bias. When designed with careful attention to label choice, algorithms can also be used to counter biases present in the health care system and ingrained in medical knowledge. To do so effectively, researchers and product developers must have access to platforms on which they can access health data for the benefit of patients and society. 

Bio: Ziad trained as an emergency doctor – and he still gets away as often as he can, to a hospital in rural Arizona, to work in the ER. But these days, he spends most of his time on research and teaching at Berkeley. Inspired by his clinical practice, he builds machine learning algorithms that help doctors make better decisions. He also studies where algorithms can go wrong, and how to fix them: his work on algorithmic bias has been highly influential both in public debate about algorithms, and in regulatory oversight and civil investigations. He is a Chan Zuckerberg Biohub Investigator, a Faculty Research Fellow at the National Bureau of Economic Research, and has been named an emerging leader by the National Academy of Medicine. His work has won numerous awards, and appeared in a wide range of journals (Science, Nature Medicine, the New England Journal of Medicine, leading computer science conferences). He is a co-founder of Nightingale Open Science, a non-profit that makes massive new medical imaging datasets available for research, and Dandelion, a platform for AI innovation in health. Before coming to Berkeley, he was an Assistant Professor at Harvard Medical School and a consultant at McKinsey & Co.

Title: Demonstrating reliability of real-world evidence: Validation of OHDSI’s LEGEND Hypertension study 

Watch This Presentation

Abstract: Randomized clinical trials are the mainstay for the evidence that drives hypertension clinical guidelines that recommend pharmacologic treatment based on comparative safety and effectiveness. However, for most drug ingredients, direct head-to-head RCT evidence vs. alternative treatments does not exist, thereby requiring indirect comparisons of trials or expert opinion to form the basis for clinical decision-making. Real-world evidence generated from retrospective analysis of observational data captured during routine clinical care, such as insurance claims and electronic health records, offers the potential to supplement RCTs and fill evidence gaps where no such trials exist, but concerns about the validity of observational research have limited its adoption for guideline development. We conducted the LEGEND study to produce real-world evidence about the comparative safety and effectiveness of the 29 recommended first-line antihypertensive drug ingredients and 28 potential secondary agents listed in the ACC clinical guideline. We analyzed a network of observational databases in the US, Europe, and Asia to produce relative risk estimates for cardiovascular benefits and known adverse events for each pairwise comparison. In this talk, we will discuss validation of the LEGEND real-world evidence base and how comparisons with RCTs can increase confidence in results and create opportunities for real-world evidence to meaningfully inform clinical care.

Bio: Patrick Ryan, PhD is Vice President, Observational Health Data Analytics at Janssen Research and Development, where he is leading efforts to develop and apply analysis methods to better understand the real-world effects of medical products. He is an original collaborator in Observational Health Data Sciences and Informatics (OHDSI), a multi-stakeholder, interdisciplinary collaborative to create open-source solutions that bring out the value of observational health data through large-scale analytics. He served as a principal investigator of the Observational Medical Outcomes Partnership (OMOP), a public-private partnership chaired by the Food and Drug Administration, where he led methodological research to assess the appropriate use of observational health care data to identify and evaluate drug safety issues. Patrick received his undergraduate degrees in Computer Science and Operations Research at Cornell University, his Master of Engineering in Operations Research and Industrial Engineering at Cornell, and his PhD in Pharmaceutical Outcomes and Policy from University of North Carolina at Chapel Hill. Patrick has worked in various positions within the pharmaceutical industry at Pfizer and GlaxoSmithKline, and also in academia at the University of Arizona Arthritis Center.

Title: Advancing Health Equity through the use of Data 

At the presenter’s request, this session was not recorded.

Bio: Julia Iyasere, M.D., is the Executive Director of the Dalio Center for Health Justice at NewYork-Presbyterian. In this role, she leads the Center’s efforts to address longstanding health inequities due to race, socio-economic differences, limited access to care, and other complex factors that impact the wellbeing of our communities. Dr. Iyasere attended Yale University for her B.S. in Biology and Columbia University for her M.D./M.B.A. After completing her residency in Internal Medicine at Columbia, Dr. Iyasere joined the Division of General Medicine at Columbia in 2012. Prior to her current role, Dr. Iyasere was the Associate Chief Medical Officer for Service Lines and the Co-Director of the Care Team Office at NYP. An Assistant Professor of Medicine, Dr. Iyasere continues to see patients as an internist in the Section for Hospital Medicine at Columbia.

Title: Data-aware modeling and integration in genomics and biomedicine

Watch The Presentation Here

Abstract: Data integration has become crucial to understanding diseases, given the large-scale efforts to collect different measurements in genomics and biomedicine. For example, researchers have identified many factors governing gene expression and collected various related datasets. However, how these “parts” are pieced together to function as a whole remains unclear. Answering these questions requires effective data integration frameworks that explicitly model the underlying structures and relationships in the data. Our research aims to develop and apply data-aware deep learning models to genomics and clinical datasets to put together the pieces from the data effectively. In this talk, I will first cover our work using graph-based deep learning architecture that captures the underlying 3D organization of the DNA to integrate different genomic signals and connect them to gene expression via the prediction task. We also interpret the prediction results and tie them back to contributing factors to develop potential hypotheses related to gene regulation. Next, I will present our attention-based deep learning framework that learns the connections between different clinical information (genetic screenings, MRIs, patient data) from Alzheimer’s patients to predict the diagnoses accurately. This talk aims to motivate the need for data-aware integration strategies that can improve predictions and our ability to gain insights from the data in genomics and clinical domains.

Bio: Ritambhara Singh is an Assistant Professor in the Computer Science department and a faculty member of the Center for Computational Molecular Biology at Brown University. Her research lab works at the intersection of machine learning, biology, and health. Prior to joining Brown, Singh was a post-doctoral researcher in the Noble Lab at the University of Washington. She completed her Ph.D. in 2018 from the University of Virginia with Dr. Yanjun Qi as her advisor. Her research has involved developing machine learning algorithms for the analysis of biological data as well as applying deep learning models to novel biological and biomedical applications. She received the 2021 NHGRI Genomic Innovator Award for developing deep learning methods to integrate and model genomics datasets. Lab website:

Title: The Data Analysis and Real World Interrogation Network (DARWIN EU). Leveraging the OMOP CDM to leverage Real World Data for Regulatory Purposes in Europe.

Watch This Presentation

Abstract: The European Medicines Agency (EMA) has recently granted a 5-year tender to the Erasmus Medical Centre University Medical Informatics department to set up the DARWIN EU Coordination Centre. DARWIN EU will conduct hundreds of studies including up to 40 data sources from all over Europe mapped to the OMOP Common Data Model. We will discuss the governance, set up, current status, and plans for the generation of actionable real-world evidence (RWE) by DARWIN EU in the coming years.

Bio: Professor Dani Prieto-Alhambra is the Section Head of Health Data Sciences at the Botnar Research Centre, University of Oxford, and Professor of Real World Evidence at Erasmus Medical Centre Rotterdam. He is the Research Coordinator for the EHDEN project, and Deputy Director for the DARWIN EU Coordination Centre. Dani has published over 320 PubMed-indexed manuscripts, including in Lancet, BMJ, JAMA, and JAMA Int Med. He has an h-index of 57.

Title: Multimodal deep learning for protein engineering

Watch This Presentation

Abstract: Engineered proteins play increasingly essential roles in industries and applications spanning pharmaceuticals, agriculture, specialty chemicals, and fuel. Machine learning could enable an unprecedented level of control in protein engineering for therapeutic and industrial applications. Large self-supervised models pretrained on millions of protein sequences have recently gained popularity in generating embeddings of protein sequences for protein property prediction. However, protein datasets contain information in addition to sequence that can improve model performance. This talk will cover pretrained models that use sequences, structures, and annotations to predict protein function or to generate functional protein sequences.

Bio: Kevin Yang is a senior researcher at Microsoft Research in Cambridge, MA who works on problems at the intersection of machine learning and biology. He did his PhD at Caltech with Frances Arnold on applying machine learning to protein engineering. Before joining MSR, he was a machine learning scientist at Generate Biomedicines, where he used machine learning to optimize proteins. Before graduate school, Kevin taught math and physics for three years at a high school in Inglewood, California through Teach for America.

Title: Using Machine Learning to Increase Equity in Healthcare and Public Health

Watch This Presentation

Abstract: Our society remains profoundly unequal. Worse, there is abundant evidence that algorithms can, improperly applied, exacerbate inequality in healthcare and other domains. This talk pursues a more optimistic counterpoint — that data science and machine learning can also be used to illuminate and reduce inequality in healthcare and public health — by presenting vignettes about women’s health, COVID-19, and pain.

Bio: Emma Pierson is an assistant professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech and the Technion, and a computer science field member at Cornell University. She holds a secondary joint appointment as an Assistant Professor of Population Health Sciences at Weill Cornell Medical College. She develops data science and machine learning methods to study inequality and healthcare. Her work has been recognized by best paper, poster, and talk awards, an NSF CAREER award, a Rhodes Scholarship, Hertz Fellowship, Rising Star in EECS, MIT Technology Review 35 Innovators Under 35, and Forbes 30 Under 30 in Science. Her research has been published at venues including ICML, KDD, WWW, Nature, and Nature Medicine, and she has also written for The New York Times, FiveThirtyEight, Wired, and various other publications.

Title: Does Social Media Support or Worsen Mental Well-Being? Well, It Depends 

Watch This Presentation

Abstract: Social media platforms continue to shape our identities, accruing important roles in our lives as they pertain to connecting with loved ones, finding like-minded peers, or finding an outlet to vent and broadcast small and big happenings around us. Much has been written in the media about these uses and, importantly, about the impacts of social media on a variety of outcomes, ranging from issues of political polarization to social justice. Is social media good or bad when it comes to mental well-being? This talk will present some critical evidence towards answering this question through a series of interlinked studies. First, a large-scale observational study will situate how social support received online can help to reduce suicidal thoughts. Turning to negative impacts, a second study, using a computational causal approach, will describe the alarming ways misinformation on social media can aggravate stress and anxiety. Beyond these examples, I will discuss how, in many cases, the answer to this question simply depends on the context. Specifically, anchoring on two studies that adopt a human-centered mixed methods approach, I will highlight the potential benefits and risks of social media use related to substance misuse disclosures, and to patients’ social reintegration efforts following a major psychiatric episode. Ultimately, regardless of the specific platforms, online social technologies are here to stay, and I will conclude by reflecting on possible implications that harness the positive uses and those that seek to mitigate the harmful effects of social media on mental well-being.

Bio: Munmun De Choudhury is an Associate Professor of Interactive Computing at Georgia Tech. Dr. De Choudhury is best known for laying the foundation of a new line of research that develops computational techniques towards understanding and improving mental health outcomes, through ethical analysis of social media data. To do this work, she adopts a highly interdisciplinary approach, combining social computing, machine learning, and natural language analysis with insights and theories from the social, behavioral, and health sciences. Dr. De Choudhury has been recognized with the Web Science Trust’s 2022 Test of Time Award, 2021 ACM-W Rising Star Award, 2019 Complex Systems Society – Junior Scientific Award, numerous best paper and honorable mention awards from the ACM and AAAI, and features and coverage in popular press like the New York Times, NPR, and the BBC. Dr. De Choudhury currently serves on the Board of Directors of the International Society for Computational Social Science and on the Steering Committee of the International Conference on Web and Social Media, the leading conference on interdisciplinary studies of social media. Earlier, Dr. De Choudhury was a postdoc at Microsoft Research and obtained her PhD in Computer Science from Arizona State University.

Title: An algorithmic safety view of learning in health 

This session was not recorded.

Abstract: Machine Learning advances have revolutionized many domains such as machine translation, complex game playing, and scientific discovery. On the other hand, ML has only enjoyed modest successes in health. To improve the utility, reliability, and robustness of Machine Learning (ML) models in health and medicine, we need to address several foundational challenges. In this talk, I will demonstrate how an algorithmic-safety perspective can motivate specific technical challenges for learning in healthcare. Specifically, I will discuss the need to improve the utility of ML-robustness, explainability with an emphasis on decision-making, and post-hoc algorithmic safety to prevent harm. I will discuss my contributions on i) aiding safe decision-making in non-IID settings using time-series explainability intended to address clinicians’ requirements, ii) novel learning algorithms to optimize for safety in sequential decision-making settings, and iii) methods to improve causal robustness of ML methods designed for practical generative settings. I will conclude with an overview of a research vision on novel safety-based objectives in ML for health, expanding ML-based solutions to practical generative settings, and outlining novel ways of validating ML models targeting safety-based objectives.

Bio: Shalmali Joshi is a Postdoctoral Fellow at the Center for Research on Computation and Society at Harvard University, and an incoming assistant professor at Columbia DBMI. Previously, she was a Postdoctoral Fellow at the Vector Institute. She received her Ph.D. from the University of Texas at Austin (UT Austin). Her research is on the algorithmic safety of Machine Learning for human-centered domains. Shalmali has contributed to the field of explainability, robustness, and novel algorithms for ML safety with an emphasis on practical generative settings and impact on decision-making. Shalmali has published in ML and inter-disciplinary venues in healthcare such as NeurIPS, FAccT, CHIL, MLHC, PMLR, and perspectives in JAMIA, LDH, and Nature Medicine. She co-founded the Fair ML for Health NeurIPS workshop and has served as General Chair for ML4H 2022 and Program Chair for MLHC 2022.

2022 Spring Seminars

Title: The Electronic Medical Records and Genomics (eMERGE) Genomic Risk Assessment and Management Network – Challenges and Opportunities

Speaker: Cong Liu, PhD  – Associate Research Scientist, Department of Biomedical Informatics, Columbia University

Watch the presentation here

Abstract: eMERGE is a national consortium, organized by the NHGRI, that conducts discovery and clinical implementation research in genomics and genomic medicine at research institutions across the country. Established in 2007, eMERGE research combines DNA biorepositories with electronic health record (EHR) systems for large-scale, high-throughput genetic studies. In this talk, I will introduce the resources and infrastructure that have been established for the eMERGE network as well as potential research opportunities. During the past phases, the network has generated and maintained the clinical and genetic data for ~135,000 unique participants, which includes electronic phenotypes, genotyping array, exome sequencing, whole genome sequencing, pharmacogenomics, and an ACMG 59 emphasized custom panel. During the current phase, the network is charged with developing Genome Informed Risk Assessments (GIRA) for common complex diseases such as breast cancer and chronic kidney disease. GIRA is designed to combine genotyping for polygenic risk score (PRS), sequencing of monogenic genes, family health history, and clinical data. The network will validate the accuracy and the utility of GIRA by conducting a prospective study with a plan to recruit ~25,000 individuals focused on underrepresented populations, across a wide range of ages. The network will also explore how to integrate GIRA into the EHR and return the risk assessment along with care recommendations to both participants and their providers.

Bio: Dr. Cong Liu is an Associate Research Scientist at the Department of Biomedical Informatics at Columbia University. Dr. Liu’s research resides in the areas of genomics and informatics tools innovation. His research focuses on developing and applying novel informatics methods for genetic disorders diagnosis and risk prediction, as well as facilitating the implementation of genomic medicine using electronic health record systems. Dr. Liu received his B.S. in Biological Science from Fudan University, M.S. in Mathematics from University of Illinois at Chicago, and Ph.D. in Bioinformatics from University of Illinois at Chicago. He later joined Columbia University and completed his Post-Doctoral training at the Department of Biomedical Informatics.

Speaker: Tal Korem, PhD – Assistant Professor, Departments of Systems Biology and Obstetrics & Gynecology, Columbia University

Title: The vaginal microbiome and metabolome in spontaneous preterm birth

Seminar is not posted at request of the presenter

Abstract: The paired analysis of the microbiome and metabolome is revolutionizing our mechanistic understanding of microbial ecosystems. Analyzing vaginal microbial and metabolite data from samples collected early in pregnancy, we identified novel interactions with preterm birth. We propose that several preterm-birth-associated metabolites may be exogenous, and investigate the sources of another using metabolic models. We further show that the metabolome can accurately predict the risk for preterm delivery. Altogether, our results demonstrate the potential of vaginal metabolites as early biomarkers of spontaneous preterm birth (sPTB) and highlight exogenous exposures as potential risk factors for prematurity.

Bio: Tal Korem’s research program focuses on computational methods that identify and interpret host-microbiome interactions in various clinical settings, and specifically those related to women’s health. He has developed several new approaches for microbiome data analysis, inferring microbial growth rates, structural variants, and microbiome-metabolite interactions; and has applied these methods in diverse clinical and biological investigations, most notably for personalization of dietary treatment for normalizing glycemic responses. He is an Assistant Professor in the Departments of Systems Biology and Obstetrics & Gynecology at Columbia University.

Title: Achieving TechQuity 

Speaker: Cheryl Clark MD, ScD – Associate Chief for Equity Research & Strategic Partnerships, Division of General Medicine and Primary Care, Brigham and Women’s Hospital; Assistant Professor of Medicine, Harvard Medical School

Seminar not recorded at request of presenter

Abstract: Open discussions of social justice and health inequities may be an uncommon focus within information technology science, business, and health care delivery partnerships. However, the COVID-19 pandemic, which disproportionately affected Black, Indigenous, and people of color, has reinforced the need to examine and define roles that technology partners should play to lead anti-racism efforts through our work. In this hour, we will discuss the imperative to prioritize TechQuity and to address social contexts in the implementation of AI and other technologies.

Bio: Cheryl Clark MD, ScD, is an Assistant Professor of Medicine at Harvard Medical School and a Hospitalist, social epidemiologist and Associate Chief in the Brigham and Women’s Hospital Division of General Medicine and Primary Care for Equity Research & Strategic Partnerships. Dr. Clark’s research focuses on social determinants of cardiometabolic health in diverse and aging populations. She is principal investigator for community engagement in the New England hub of the National Institutes of Health All of Us Research Program and chaired the social determinants of health (SDOH) Task Force that developed the SDOH participant provided information survey for All of Us.  Dr. Clark serves on the Mass General Brigham Predictive Analytics committee to provide equity review of algorithms considered for clinical implementation. Dr. Clark chaired the COVID-19 equity response team during the early phase of the COVID-19 pandemic in 2020.  She is the inaugural recipient of the Equity, Social Justice and Advocacy Award from Harvard Medical School and Harvard School of Dental Medicine.

Title: Racial and Ethnic Differences in Genetic Testing Uptake and Results among Young Breast Cancer Survivors: Looking Ahead at Future Work  

Speaker: Tarsha Jones, Assistant Professor of Nursing, Florida Atlantic University

Seminar not recorded at request of presenter

Abstract: Genetic testing for hereditary breast and ovarian cancer (HBOC) syndrome (e.g., BRCA1/2 genes) is recommended for all young women diagnosed with breast cancer at ≤ age 45, yet there is an underutilization of this critical test among this population. In this presentation, I will provide an overview of the current landscape of genetic testing and discuss my program of research that focuses on racial and ethnic differences in genetic testing uptake and results among young breast cancer survivors (YBCS). In addition, I will provide an overview of my current and future work including our innovative web-based decision aid intervention, RealRisks, that we are adapting for racially/ethnically diverse young breast cancer survivors in order to increase access to genetic testing and family risk communication. A special emphasis is placed on promoting health equity and reducing cancer health disparities.

Bio: Dr. Jones is an Assistant Professor of Nursing at the Christine E. Lynn College of Nursing at Florida Atlantic University. She obtained a Bachelor of Science in Nursing degree from Seton Hall University and a Master of Science in Nursing degree from the Catholic University of America with a specialization in community/public health nursing and the care of immigrants, refugees, and global health. She holds a certification as an advanced public health nurse (PHNA-BC). She obtained a Doctor of Philosophy (PhD) in Nursing degree from Duquesne University and completed a post-doctoral research fellowship at Dana-Farber Cancer Institute and Harvard Medical School.
Her research focuses on cancer prevention and control, risk-communication, and risk-reduction. Her current work focuses on improving uptake of genetic testing for breast cancer risk (i.e., BRCA1/2 genes and multigene panel testing) through culturally appropriate interventions, to facilitate informed decision-making for cancer risk-reducing strategies, and to promote family risk communication among young breast cancer survivors and their at-risk family members, with a particular emphasis on Black and Hispanic women. Her research is supported by the National Institutes of Health (NIH) and the DAISY Foundation.

Speaker: Lena Mamykina, PhD – Associate Professor of Biomedical Informatics

Title: Do People Engage Cognitively with AI? Impact of AI Assistance on Incidental Learning

Abstract: Introduction of AI-powered systems in many domains of human life often rests on the assumption that humans can use their common sense, domain knowledge and experience, and critical thinking to examine AI output and to decide whether to act on it or to dismiss it. This is particularly the case in such critical domains as health and medicine. But is this assumption really justified and do people in fact critically examine AI-generated output? In this talk I will describe results of several experiments conducted on Lab in the Wild, a popular online platform for psychological and behavioral experiments, that specifically examined individuals’ cognitive engagement with AI-powered decision support and the role of explanations in facilitating this engagement. We consider learning gains as evidence of cognitive engagement and show that explanations can indeed lead to a deeper engagement with AI. However, the design of decision support and placement of explanations within the decision making process play a critical role in their impact. I conclude with analysis of implications for future AI-powered decision support tools.

Bio: Dr. Lena Mamykina is an Associate Professor of Biomedical Informatics at the Department of Biomedical Informatics at Columbia University. Dr. Mamykina’s research resides in the areas of Biomedical Informatics, Human-Computer Interaction, Ubiquitous and Pervasive Computing, and Computer-Supported Collaborative Work. Her research focuses on the design of innovative interactive systems in health that incorporate machine learning and AI. Dr. Mamykina received her B.S. in Computer Science from the Ukrainian State University of Maritime Technology, M.S. in Human Computer Interaction from the Georgia Institute of Technology, Ph.D. in Human-Centered Computing from the Georgia Institute of Technology, and M.A. in Biomedical Informatics from Columbia University. Prior to joining DBMI as a faculty member, she completed a National Library of Medicine Post-Doctoral Fellowship at the department.

Speaker: Undina Gisladottir, Ph.D. Student – Dr. Nicholas Tatonetti’s Lab

Title: Propensity Scores Improve the Performance of Self Controlled Case Series Studies using Electronic Health Records

Abstract: Randomized controlled trials are the gold standard for determining the safety and efficacy of a drug. However, the strict exclusion criteria for such trials can lead to unforeseen adverse drug events (ADEs) when the drug is released to the general public. For this reason, post-market surveillance is essential to ensure physicians can make informed decisions when prescribing. A self-controlled case series (SCCS) study using observational data, such as electronic health records (EHR), is an effective approach to identifying ADEs, as it controls for time-invariant confounders such as sex and race/ethnicity. However, ascertainment bias in EHR leads to inherent differences between the ‘risk’ and ‘baseline’ periods, which results in more false positives. Some groups use negative controls to adjust the relative risk but this can be time-consuming and requires expert knowledge. In this study, we propose using interval-specific propensity scores to adjust for the bias between risk and baseline periods. We applied our method to an ADE prediction task with 370 known drug-event pairs from a reference ADE set, using data from NYP CUIMC hospital (~16K patients), and validated it in MarketScan’s Medicare dataset (~1.5M patients). We found that using the interval-specific propensity score significantly increased coverage and decreased bias. Our results show that propensity scores may reduce the effect of ascertainment bias in SCCS studies using observational data, enabling more reliable drug safety estimates.

Bio: Undina Gisladottir is a third-year Ph.D. student in Dr. Nicholas Tatonetti’s lab. Her current research uses electronic health records to further our understanding of drug effects and adverse drug events. Prior to joining DBMI, Undina completed her bachelor’s in biomedical engineering at Boston University and her master’s in biomedical informatics at HMS where she conducted research with Dr. Nils Gehlenborg and Dr. Chirag Patel.

Speaker: Harry Reyes Nieva, Ph.D. Student – Dr. Noémie Elhadad’s Lab

Title: Mining the Health Disparities and Minority Health Bibliome

Abstract: The lack of a large-scale survey of the health disparities and minority health (HDMH) literature leaves the field potentially vulnerable to focusing disproportionately on specific populations or emphasizing certain conditions, curtailing our ability to fully advance health equity and improve our understanding of the health of minoritized communities. We propose using scalable methods to characterize trends and isolate potential gaps and blind spots in HDMH research. To support investigators in navigating the HDMH bibliome, we are also actively developing HDMH Monitor, an interactive dashboard and article repository.

Using a pre-validated MEDLINE/PubMed search strategy, we extracted HDMH articles (~250K in total) and their meta-data via the open-source MEDLINE API. We employed a three-pronged approach scalable to the entire corpus. To characterize HDMH literature, we identified: (1) studied populations and study designs using Medical Subject Headings (MeSH); (2) conditions mentioned in abstracts and titles using clinical named-entity recognition (CNER); and (3) emerging topics of study through probabilistic topic modeling (i.e., latent Dirichlet allocation). To characterize the HDMH bibliome further, we compared trends in studied conditions to relative condition prevalence in large claims datasets (42+ million Americans). 

Large-scale analysis yields insights about trends in HDMH research: half (50%) of all HDMH articles concerned just three International Classification of Diseases (ICD) chapters (cancer, mental health, endocrine/metabolic disorders); disease prevalence in the general population was not necessarily indicative of HDMH research foci; and disease coverage in the literature was highly variable among minoritized populations. Notable temporal trends among topics include increased focus on community-based research; decreased focus on economic policy and medical education; and emergence of nascent topics like sexual and gender minority health. Our approach employs scalable methods for processing, characterizing, and monitoring an ever-increasing body of literature systematically. Leveraging ontologies and CNER enables top-down assessment of studied conditions and, by extension, those not well represented across populations, while topic modeling allows for a bottom-up identification of emerging themes. Common terminology (ICD) allows for direct comparison across data sources. 

Bio: Harry Reyes Nieva is a third-year Ph.D. student in Dr. Noémie Elhadad’s lab. His current research primarily aims to use and expand the vast toolbox that computational methods offer to better understand, improve, and facilitate the study of health in underserved communities and advance health equity. Harry received his B.A. from Yale University and Master of Applied Science from the Johns Hopkins Bloomberg School of Public Health. Prior to starting his Ph.D., Harry was a member of the MTERMS lab led by Dr. Li Zhou at Harvard Medical School/Mass General Brigham and the Strategic Information division of the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR) at Harvard, which aimed to rapidly expand treatment and care programs for people living with HIV/AIDS in Botswana, Nigeria, and Tanzania.

• Dr. Ashley Beecy, Assistant Professor at Weill Cornell Medicine and NYP Hospital 
• Dr. Salvatore Crusco, Clinical Informatics Fellow at Columbia University Hospital
• Jennifer Beirne, MHA, MA, CPHIMS, Director, People & Organization Development team at Columbia University

Dr. Ashley Beecy is an Assistant Professor of Medicine in the Department of Medicine, Division of Cardiology at Weill Cornell Medicine. She serves as the Clinical Lead for IT Transformation and Advanced Analytics at NewYork-Presbyterian. Her research is focused on digital health, including the implementation of artificial intelligence (AI) and the use of AI to study cardiovascular imaging.

Dr. Salvatore Crusco is a second-year clinical informatics fellow at NYP/DBMI with a keen interest in clinical decision support (CDS). Sal has worked with the CDS workgroup to develop a sub-committee, the CDS Optimization Workgroup, which meets weekly to discuss optimization efforts for alerts that are non-intuitive, untimely, interruptive, non-actionable, and continually re-firing. Most of these efforts are geared toward reducing alert fatigue for users while prioritizing patient care. 

Jennifer Beirne oversees the Optimization track for the People & Organization Development team at ColumbiaDoctors. She and her team work with stakeholders across the institution to apply a structured approach to improving workflows and user proficiency within the EHR. Prior to joining her current team, she helped support CUIMC’s Epic implementation as part of ColumbiaDoctors’ Office of the CMIO. Jennifer completed DBMI’s Certification of Professional Achievement in HIT in 2017.

Speaker: Adrienne Pichon, PhD Student – Dr. Noémie Elhadad’s Lab

Title: Informing the Design of Individualized Self-management Regimens from the Human, Data, and Algorithmic Perspectives 

Abstract: Self-management is critical to care of chronic illness, but developing a personalized self-management regimen that works for an individual often requires a lengthy and frustrating trial-and-error process. Personal health informatics solutions could augment this experimentation process by leveraging artificial intelligence, specifically reinforcement learning (RL). This talk presents a mixed-methods study that addresses both technical and human challenges that remain in translating promising computational methods to a complex, real-world setting.

We use “in the wild” self-tracking data from the Phendo app alongside conversations with users to assess the feasibility of a tool in the context of endometriosis. Data from 10,463 users, detailing their personal experience of illness (e.g., symptoms) and self-management (e.g., physical activity), are used to characterize the breadth and patterns of self-management strategies used in practice and to quantify population and individual effects. Qualitative analysis of transcripts from prior focus groups (10 groups, n=48) and follow-up interviews (n=3) represents the end-user perspective. We integrate results across methods to map the boundaries and constraints at the intersection of computational and human viewpoints.

Findings suggest that user engagement patterns and data availability are sufficient for RL requirements. Users confirm that they want this type of support and are willing to experiment with a broad range of strategies. Both data and human perspectives affirm that personally tailored solutions are necessary, despite substantial heterogeneity. Design recommendations include promoting control and autonomy, incorporating context, and enabling explainability.

Bio: Adrienne Pichon is a third-year PhD student in Dr. Noémie Elhadad’s lab. Her current research focuses on supporting the needs of patients and their care teams in complex and uncertain chronic illness contexts. Adrienne received her MPH from Columbia University’s Mailman School of Public Health, and contributed to research both at Mailman and the School of Nursing before coming to DBMI.


Speaker: Yiwei Sun, PhD Student – Dr. Harris Wang’s Lab

Title: Discovery of pathogen-inhibitory commensal gut microbiota by high-throughput culturomics.

Abstract: Vancomycin-resistant Enterococcus (VRE) can densely colonize intestines and cause bloodstream infections in people who have received antibiotic-mediated treatments and consequently suffer from the loss of commensal microbiota. Fecal Matter Transplant (FMT) has been shown to efficiently clear VRE from the gut, but it remains unclear which species in particular play a role in the clearance of VRE. Herein, we demonstrate that key bacterial strains can directly inhibit VRE growth and clear VRE from mouse intestines. By implementing a high-throughput strain isolation and cultivation system, we obtained >2300 isolates from ICU patients as well as healthy human individuals and screened them for inhibitory effects against VRE in vitro. Candidate strains were shown to inhibit VRE growth in vitro and eliminate VRE in mouse infection models. Furthermore, we discovered key metabolites produced by these strains that explain the mechanism of VRE growth inhibition. These findings suggest that probiotic therapy using the candidate strains may reduce VRE-related inter-patient transmission and promote recovery of native commensal microbiota.

Bio: Yiwei Sun is a third-year PhD student in Dr. Harris Wang’s lab. Her current research focuses on examining the relationship between the gut microbiome and intestinal diseases. Prior to her PhD, she received her B.S. in Microbiology, Immunology, and Molecular Genetics from UCLA, where she conducted research with Dr. Grace Xiao.

Speaker: Katie Brown, PhD student – Dr. Nicholas Tatonetti’s lab

Title: Estimating the heritability of SARS-CoV-2 susceptibility and COVID-19 severity

Abstract: Over 340 million people have been infected with SARS-CoV-2 since its discovery in 2019. Pharmaceutical companies continue to search for effective therapeutics to counter COVID-19. While genetic studies have the potential to highlight relevant biological pathways and drug targets, understanding the overall heritability of SARS-CoV-2 susceptibility and COVID-19 severity is important for contextualizing their results and prioritizing future studies.  To date, associated loci are estimated to explain <1% of variation in patient susceptibility and severity.  In this talk, I will discuss our approach to estimating the importance of shared environment and genetics to SARS-CoV-2 susceptibility and COVID-19 severity.


Speaker: Michael Zietz, PhD student – Dr. Nicholas Tatonetti’s lab

Title: Estimated genetic liability as a proxy phenotype for GWAS

Abstract: Deciphering the genetic architecture of complex disease is a major challenge in biomedical research and one that would simplify the search for new preventions, treatments, and cures. The genetic contributions to complex traits and diseases arise from thousands of genetic variants, most of which have only small effects. While major biobank projects have enabled the estimation of many small effects through the collection of very large cohorts, statistical power remains a challenge for variant effect estimation. Many complex traits and diseases have shared genetic contributions, manifesting in both genetic and phenotypic correlations. Various traits, therefore, contain predictive information about a patient’s genetic risk for a trait of interest. We developed a method to estimate patient-level genetic liabilities for a trait of interest using a deeply phenotyped cohort and summary information such as trait heritabilities and trait genetic correlations. Preliminary results suggest that using the estimated genetic liability of a trait as a proxy in a genome-wide association study leads to greater power to detect variant effects. We are currently expanding our use of the new method to larger sets of traits in order to better evaluate its strengths and limitations. Our goal is to produce a method that can provide a better understanding of complex trait architecture using fewer samples than existing methods.

2021 Fall Seminars

Title: Prediction-driven surge planning with applications in the emergency department

Watch The Presentation Here

Abstract: Optimizing emergency department (ED) nurse staffing decisions to balance the quality of service and staffing cost can be extremely challenging, especially when there is a high level of uncertainty in patient demand. Increasing data availability and continuing advancements in predictive analytics provide an opportunity to mitigate demand-rate uncertainty by utilizing demand forecasts. In this work, we study a two-stage prediction framework that is synchronized with the base (made months in advance) and surge (made in near real time) staffing decisions in the ED. We quantify the benefit of the more expensive surge staffing. We also propose a near-optimal two-stage staffing policy that is straightforward to interpret and implement. Lastly, we develop a unified framework that combines parameter estimation, real-time demand forecasts, and staffing in the ED. High-fidelity ED simulation experiments demonstrate that the proposed framework can reduce staffing costs by 8% – 17% while guaranteeing timely access to care. Joint work with Jing Dong and Yue Hu.

Bio: Carri W. Chan is a Professor of Business in the Decision, Risk and Operations Division and the Faculty Director of the Healthcare and Pharmaceutical Management Program at Columbia Business School. Her research is in the area of healthcare operations management. Her primary focus is in data-driven modeling of complex stochastic systems, efficient algorithmic design for queuing systems, dynamic control of stochastic processing systems, and econometric analysis of healthcare systems. Her research combines empirical and stochastic modeling to develop evidence-based approaches to improve patient flow through hospitals. She has worked with clinicians and administrators in numerous hospital systems including Northern California Kaiser Permanente, New York Presbyterian, and Montefiore Medical Center. She is the recipient of a 2014 National Science Foundation (NSF) Faculty Early Career Development Program (CAREER) award, the 2016 Production and Operations Management Society (POMS) Wickham Skinner Early Career Award, and the 2019 MSOM Young Scholar Prize. She currently serves as a co-Department Editor for the Healthcare Management Department at Management Science. She received her BS in Electrical Engineering from MIT and MS and PhD in Electrical Engineering from Stanford University.

Title: Are phenotyping algorithms fair for underrepresented minorities within older adults?

Watch The Presentation Here 

Abstract: The widespread adoption of machine learning (ML) algorithms for risk-stratification has unearthed plenty of cases of racial/ethnic biases within algorithms. When built without careful weighting and bias-proofing, ML algorithms can give wrong recommendations, thereby worsening health disparities faced by communities of color. Biases within electronic phenotyping algorithms are largely unexplored. In this work, we look at probabilistic phenotyping algorithms for clinical conditions common in vulnerable older adults: dementia, frailty, mild cognitive impairment, Alzheimer’s disease, and Parkinson’s disease. We created an experimental framework to explore racial/ethnic biases within a single healthcare system, Stanford Health Care, to fully evaluate the performance of such algorithms under different ethnicity distributions, allowing us to identify which algorithms may be biased and under what conditions. We demonstrate that these algorithms have performance (precision, recall, accuracy) variations anywhere from 3% to 30% across ethnic populations, even when not using ethnicity as an input variable. In over 1,200 model evaluations, we have identified patterns that indicate which phenotype algorithms are more susceptible to exhibiting bias for certain ethnic groups. Lastly, we present recommendations for how to discover and potentially fix these biases in the context of the five phenotypes selected for this assessment.
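
As a rough illustration of the kind of comparison the abstract describes, the sketch below computes precision, recall, and accuracy separately for each group and reports the spread between groups. It is not the speaker's framework: the function names (`per_group_metrics`, `metric_gap`), the group labels, and the toy labels and predictions are all hypothetical, chosen only to show the arithmetic.

```python
from collections import defaultdict


def per_group_metrics(y_true, y_pred, groups):
    """Compute precision, recall, and accuracy separately for each group label."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            key = "tp" if truth == 1 else "fp"
        else:
            key = "fn" if truth == 1 else "tn"
        counts[group][key] += 1

    metrics = {}
    for group, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        total = sum(c.values())
        metrics[group] = {
            # Undefined ratios are reported as NaN rather than silently as 0.
            "precision": c["tp"] / predicted_pos if predicted_pos else float("nan"),
            "recall": c["tp"] / actual_pos if actual_pos else float("nan"),
            "accuracy": (c["tp"] + c["tn"]) / total,
        }
    return metrics


def metric_gap(metrics, name):
    """Difference between the best and worst group on one metric."""
    values = [m[name] for m in metrics.values()]
    return max(values) - min(values)


# Hypothetical toy data: 1 = has the phenotype, 0 = does not.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

metrics = per_group_metrics(y_true, y_pred, groups)
for group, values in sorted(metrics.items()):
    print(group, values)
print("recall gap:", metric_gap(metrics, "recall"))
```

In this toy example both groups have the same accuracy, yet recall differs by 50 percentage points, which is why a single aggregate metric can hide group-level differences.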

Bio: Dr. Juan M. Banda, at his GSU lab, Panacea Lab, works on building machine learning and NLP methods that help to generate insights from multi-modal large-scale data sources, with applications to precision medicine, medical informatics, as well as other domains. His research interests are not limited to structured data; he is also well-versed in extracting terms and clinical concepts from millions of unstructured electronic health records and using them to build predictive models (electronic phenotyping) and mine for potential multi-drug interactions (drug safety). Dr. Banda has published over 70 peer-reviewed conference and journal papers and serves as an editorial board member of the Journal of the American Medical Informatics Association and Frontiers in Medicine – Translational Medicine, and as a reviewer for JBI, Nature Digital Medicine, Nature Scientific Data, Nature Protocols, PLOS One, and several other leading journals. Prior to being an assistant professor of Computer Science at Georgia State University, Dr. Banda was a postdoctoral scholar, then a research scientist at Stanford’s Center of Biomedical Informatics. He is an active collaborator in Observational Health Data Sciences and Informatics (OHDSI), and his work has been funded by the Department of Veterans Affairs, the National Institute on Aging, as well as NASA, NSF, and NIH. He also serves as a PC member and chair for several conferences and workshops including ICML, NeurIPS, FLAIRS, and IEEE Big Data, among others.

Title: Exploring Gender Differences in Time to Diagnosis

Abstract: Sex differences and gender disparities play a significant role in the initial diagnosis and treatment of disease, often leading to differential healthcare outcomes between women and men. We examine differences in disease prevalence and time-to-diagnosis across databases and populations, with a particular emphasis on identifying metrics for systematically characterizing these differences in OMOP. From there, we further examine how algorithms trained on these data might reproduce existing disparities and explore how gender concordance might impact the disease diagnosis process. Last, we examine how fairness metrics can be used to roughly assess the fairness of phenotypes.

Speaker: Linying Zhang, PhD Student

Title: Algorithmic fairness in medicine: A case study in glomerular filtration rate (GFR) prediction

Abstract: The appropriate use and the implications of using variables that attempt to encode a patient’s race in medical predictive algorithms remain unclear. One example of an algorithm that includes a race variable is the equation for estimating glomerular filtration rate (GFR), an indicator of kidney function used to classify the severity of chronic kidney disease (CKD). However, the observed difference between Black and non-Black participants lacks biologically substantiated evidence. A recent study showed that removing race as a variable from the estimated GFR equation could have a significant impact on recommended care for Black patients (e.g., increasing CKD diagnoses among Black adults could improve access to specialist care and kidney transplantation). However, that study did not examine whether removing the race modifier leads to more accurate GFR predictions for Black patients. Recently, many algorithmic fairness definitions have been proposed and studied in domains such as education, economics and criminal justice, but their applicability to medical predictive algorithms has not been well explored. We examined the appropriateness of various algorithmic fairness definitions in the context of understanding the impact of race on GFR prediction in terms of model performance and fairness. We consider the use case of drug dosing, in which the difference between the true GFR and the calculated GFR will be relevant.

Title: Predictive modeling for self-tracking apps: a case study in menstrual health 

Watch The Presentation Here 

Abstract: Self-tracking apps provide a rich source of health observations that hold the promise to characterize underlying physiological state and disease trajectories, as well as to support users in self-managing their health. But these data streams can also be unreliable since they hinge on user adherence to the app. In this talk, I will focus on menstrual trackers, a highly popular type of self-tracking technology. I will present our ongoing work on characterizing variability in menstrual cycle within and across individuals and building models that predict next cycle date all the while accounting for skipped tracking data.

Bio: Noémie Elhadad is an Associate Professor of Biomedical Informatics, affiliated with Computer Science and the Data Science Institute at Columbia University. She serves as Biomedical Informatics Vice Chair of Research and Graduate Program Director. Her research is at the intersection of machine learning, technology, and medicine. 

Title: Multimorbidity Patterns Across Race/Ethnicity Stratified by Age and Obesity: A Cross-sectional Study of a National US Sample


Objectives: The objective of our study is to assess differences in prevalence of multimorbidity by race. 

Methods: We applied the FP-growth algorithm on middle-aged and elderly cohorts stratified by race, age, and obesity level. We used 2016-2017 data from the Cerner HealthFacts® Electronic Health Record data warehouse.  We identified disease combinations that are shared by all races/ethnicities, those shared by some, and those that are unique to one group for each age/obesity level. 

Results: Our findings demonstrate that even after controlling for age and obesity, there are differences in multimorbidity prevalence across races. There are multimorbidity combinations distinct to some racial groups—many of which are understudied. Some multimorbidities are shared by some but not all races. African Americans presented with the most distinct multimorbidities at an earlier age. 

Discussion: The identification of prevalent multimorbidity combinations amongst subpopulations provides information specific to their unique clinical needs. 
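
The comparison described in the Methods can be sketched in a few lines. The sketch below is only illustrative and makes several assumptions: it enumerates condition combinations by brute force rather than implementing FP-growth (the two yield the same frequent combinations up to the size limit, but brute force does not scale to a national data warehouse), and the group names, conditions, support threshold, and function names (`frequent_combinations`, `shared_and_unique`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_combinations(patients, min_support, max_size=3):
    """Return {combination: support} for condition sets of size 2..max_size.

    Brute-force stand-in for FP-growth: every combination present in each
    patient's record is counted directly.
    """
    counts = Counter()
    for conditions in patients:
        unique = sorted(set(conditions))
        for size in range(2, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    n = len(patients)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


def shared_and_unique(frequent_by_group):
    """Split frequent combinations into those shared by all groups and those unique to one."""
    sets = {group: set(freq) for group, freq in frequent_by_group.items()}
    shared = set.intersection(*sets.values())
    unique = {}
    for group, combos in sets.items():
        others = set().union(*(s for g, s in sets.items() if g != group))
        unique[group] = combos - others
    return shared, unique


# Hypothetical toy cohorts for a single age/obesity stratum.
cohorts = {
    "group_1": [
        ["hypertension", "diabetes"],
        ["hypertension", "diabetes", "ckd"],
        ["hypertension", "ckd"],
        ["diabetes"],
    ],
    "group_2": [
        ["hypertension", "diabetes"],
        ["hypertension", "diabetes"],
        ["asthma", "obesity"],
        ["asthma", "obesity"],
    ],
}

frequent_by_group = {
    group: frequent_combinations(patients, min_support=0.5)
    for group, patients in cohorts.items()
}
shared, unique = shared_and_unique(frequent_by_group)
print("shared:", [sorted(c) for c in shared])
for group in sorted(unique):
    print(group, "only:", [sorted(c) for c in unique[group]])
```

Running the same comparison once per age and obesity stratum would mirror the stratified design in the abstract.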

Title: AI Tools for Design and Innovation
Abstract: How can computational tools and AI help people be better at innovation and creative problem-solving? When solving a problem, people have the tendency to fixate on one problem or solution. If that one idea doesn’t work, they get stuck. To avoid getting stuck, the design process encourages people to have multiple ideas, and explore the space of possibilities before deciding on a problem or a solution. Although this works, it’s highly complex, requiring people to follow many threads at once. We show how AI and other computational tools can help simplify and speed up the most cognitively taxing aspects of the design process:
  1. Collecting multiple partial solutions
  2. Synthesizing partial solutions into multiple prototypes
  3. Quickly iterating on prototypes to produce an MVP 
Bio: Lydia Chilton is an Assistant Professor in the Computer Science Department at Columbia University. Her research is in computational design – how computation and AI can help people with design, innovation, and creative problem-solving. Applications include: creating graphics for journalism, developing technology for public libraries, improving risk communication during hurricanes, and helping scientists explain their work on Twitter.

Title: Towards High-Quality Structured Data from Clinical Notes

Abstract: The real-world evidence found in electronic health records contains the scale of data required for more personalized medicine, from heterogeneous treatment effect estimation to disease progression modeling. Unfortunately, many of the variables needed for such research (treatment information, comorbidities, disease stage) are found not in structured data, but trapped within clinical notes. Due to the messiness of free-text notes and the sparsity of labels, clinical information extraction can be challenging in practice; tasks as fundamental as clinical concept normalization remain largely unsolved. In this talk, I will present machine learning solutions that can operate with minimal labeled data by leveraging unlabeled data and humans-in-the-loop. However, ultimately, it would be ideal if clinical notes were easier to parse to begin with. I will describe our efforts, in collaboration with and piloted at Beth Israel Deaconess Medical Center, to reimagine the process of clinical documentation to facilitate and incentivize the creation of high-quality data at the point-of-care.

Bio: Monica Agrawal is a 4th year PhD student at MIT CSAIL in the Clinical Machine Learning Group, advised by David Sontag. Her research revolves around synthesis of longitudinal clinical notes and the creation of smarter electronic health records. She previously received a BS/MS from Stanford University in computer science. She is supported by a Takeda fellowship.

Title: What the CONCERN Study Has Taught Us About Racial Bias in Nursing Workflow

Watch The Presentation Here

Abstract: Early detection of patient deterioration in the hospital is a clinically significant issue. Our team has built a clinical decision system called CONCERN (Communicating Narrative Concerns Entered by RNs). The CONCERN study leverages big data analytic techniques to increase interdisciplinary shared situational awareness for patients at risk of decompensation using clinically relevant information that may otherwise be missed by the care team. CONCERN uses nursing surveillance patterns to risk-stratify patients for deterioration to support clinical decision-making. This multi-site (Columbia University and Brigham and Women’s Hospital) project is currently evaluating the relationship between CONCERN use and patient outcomes, inpatient mortality, and length of stay, using a clustered randomized controlled trial. CONCERN is the first NIH (National Institutes of Health) funded study to evaluate a nurse-driven machine learning-based clinical decision support system with a randomized clinical trial. My presentation will provide an overview of our project, the infrastructure of our intervention, lessons learned about racial bias in these data, and proposed future work.

Bio: Kenrick Cato, PhD, RN, CPHIMS, FAAN, is an Assistant Professor at Columbia University School of Nursing and in the Department of Emergency Medicine at Columbia University Vagelos College of Physicians and Surgeons. Dr. Cato has a varied background. He worked at the NewYork-Presbyterian health system as a surgical and medical oncology staff nurse and as an analyst in the information technology department, working on projects to improve patient safety through the use of clinical decision support. In the analyst position, he focused on projects to improve patient safety through the optimization of the hospital’s electronic systems. Dr. Cato’s program of research focuses on the mining of electronic patient data to support clinical decision making. His previous work includes National Institutes of Health-funded research in health communication via mobile health platforms, shared decision making in primary care settings, and data mining of electronic patient records. His current projects include automated data mining of electronic patient records to discover patient characteristics that are often missed and the development of predictive models for inpatient clinical deterioration.

Title: Machine Learning Applications in Cardiology 

Watch The Full Presentation Here

Abstract: In this talk we will discuss why and how deep learning approaches have the potential to greatly impact cardiac imaging. We will then explore use cases developed here at Columbia that have led to two of the world’s first prospective clinical trials of deep learning in cardiology. Lastly we’ll critique the limitations of current ML approaches preventing mainstream adoption in order to answer the question, “What are the big problems the field needs to be tackling now?” (and maybe even answer, “What’s a really good idea for me to do research on as a grad student?”)

Bio: Pierre Elias, MD is a cardiology fellow at Columbia University Irving Medical Center who recently completed a two-year postdoc in the Perotte Lab at DBMI.

Title: Addressing the challenges of the “fourth paradigm” in biology and medicine

Abstract: Recent advances in biotechnology and medicine allow us to collect an immense amount of physiological, contextual, and biological data at the personalized and population level. This surge in data gives rise to a paradigm shift in biology and medicine towards data-intensive discoveries. While this provides the perfect opportunity to study human biology and disease, it also presents daunting challenges in data analysis, privacy and sharing at scale. In this talk, first, I will discuss the scalable tools I have developed to overcome privacy concerns associated with sharing functional genomics and genomics data. Second, I will review the computational tools I have developed to address the challenge of high-throughput functional genomics data analysis. I will end my talk by describing the vision of my future lab. This will include developing methods to address questions related to (1) biomedical data privacy for sharing data in research and clinical settings and (2) multi-omics data integration to understand the relationship between genotypes and phenotypes.

Title: Towards a unified systems theory of mental disorders

Abstract: Understanding the biology of psychiatric disorders requires analyses on multiple levels of hierarchical organization: on the level of genes, cellular networks, neuron types, brain circuits, and patient phenotypes. Over the last decade, our lab has pioneered advances on all these organizational levels, for disorders such as autism and schizophrenia. We believe that the emerging data now allow us to make an informed generalization about the etiology of major psychiatric disorders. Using examples primarily from autism spectrum disorder (ASD), I will discuss our recent work on understanding brain circuits that are likely perturbed across disorders. We have recently developed an approach to integrate genetic data with high-resolution spatial gene expression and a brain-wide mesoscale connectome. The application of the approach to autism demonstrates that ASD mutations perturb widely distributed sets of brain circuits with interrelated biological functions and structures from the cortex, striatum, amygdala, thalamus and hippocampus. The identified circuits are generally responsible for the integration of sensory and emotional information as well as context-dependent learning and decision-making based on this information. Our preliminary analyses show that similar circuits are also affected in schizophrenia and likely in many other mental disorders. We have also discovered that each ASD gene can be characterized by a parameter, phenotype dosage sensitivity (PDS), which quantifies the relationship between changes in a gene’s dosage and changes in each disorder phenotype. We believe that the relationship characterized by PDS is likely to generalize to other disorders and human phenotypes. Finally, I will discuss how the emerging picture puts us on the path towards explaining the common genetic risk factor underlying multiple psychiatric disorders (p-factor) and how specific phenotypes may arise in each disorder.

2021 Spring Seminars

Speaker: Rafael Irizarry, PhD, Professor and Chair of the Department of Data Sciences at the Dana-Farber Cancer Institute; Professor of Biostatistics at Harvard T.H. Chan School of Public Health

Title: Probabilistic Gene Expression Signatures for Single Cell RNA-seq Data 

Watch The Presentation Here

Abstract:  In this talk Prof. Irizarry will describe his general approach to developing statistical solutions to problems in high throughput biology. He will focus on an example related to predicting cell types from single cell RNA-seq data. He will discuss challenges such as batch effects and sparse data and describe statistical solutions for these. Finally, he will show recent results from a collaboration involving spatial transcriptomics.

Biography: Rafael Irizarry received his Bachelor’s in Mathematics in 1993 from the University of Puerto Rico and went on to receive a Ph.D. in Statistics in 1998 from the University of California, Berkeley. His thesis work was on Statistical Models for Music Sound Signals. He joined the faculty of the Johns Hopkins Department of Biostatistics in 1998 and was promoted to Professor in 2007. He is now Professor and Chair of the Department of Data Sciences at the Dana-Farber Cancer Institute and a Professor of Biostatistics at Harvard T.H. Chan School of Public Health.

Professor Irizarry’s work has focused on applications in genomics. In particular, he has worked on the analysis and signal processing of high-throughput data. He has distinguished himself by disseminating his statistical methodology as open source software shared through the Bioconductor Project, a leading open source and open development software project for the analysis of high-throughput genomic data. His widely downloaded software tools have helped him become one of the most highly cited scientists in his field. Although Professor Irizarry’s focus has been in genomics, he is an applied statistician generally interested in real-world problems. During his career he has co-authored papers on a variety of topics including musical sound signals, infectious diseases, circadian patterns in health, fetal health monitoring, and estimating the effects of Hurricane María in Puerto Rico.

Professor Irizarry’s dedication to education is best demonstrated by the success of the numerous trainees he has mentored. He has also developed several HarvardX online courses on data analysis, which have been completed by thousands of students. These courses are divided into three series: Professional Certificate in Data Science, Data Analysis for the Life Sciences, and Genomics Data Analysis. He shares the material for these courses through textbooks that are freely available online and reproducible code through GitHub. Professor Irizarry also dedicates his time to providing service to the profession. Examples of this work include serving as the chair of the National Institutes of Health (NIH) Genomics, Computational Biology and Technology (GCAT) Study Section, and serving on the search committee for the National Library of Medicine director, the National Academy of Sciences Gulf War and Health Committee, and the National Advisory Council for Human Genome Research.

Professor Irizarry has received several awards honoring the work described above. In 2009, the Committee of Presidents of Statistical Societies (COPSS) named him the Presidents’ Award winner. The Presidents’ Award is arguably the most prestigious award in Statistics. That year he was also named a fellow of the American Statistical Association. In 2017 he was chosen as the laureate of the Benjamin Franklin Award in the Life Sciences. In 2020 he became an ISCB Fellow. He has also received the 2019 Research Parasite Award for outstanding contributions to the rigorous secondary analysis of data, the 2009 Mortimer Spiegelman Award, which honors an outstanding public health statistician under age 40, the ASA Youden Award in Interlaboratory Testing, the 2004 American Statistical Association (ASA) Outstanding Statistical Application Award, and the 2001 American Statistical Association Noether Young Scholar Award for a researcher younger than 35 years of age who has significant research accomplishments in nonparametric statistics.

Title: Identifying and Leveraging Public Data Sources with Social Determinants of Health Information for Population Health Informatics Research 

Speaker: Irene Dankwa-Mullan MD MPH, Chief Health Equity Officer, IBM Watson Health, IBM Corporation

Watch The Full Presentation Here

Abstract: Social determinants of health (SDOH) account for many health inequities. Data sources traditionally used in informatics research often lack SDOH, and, when available, SDOH may be difficult to leverage given their lack of specificity and structured information. In this presentation, I will share the initial phases of our work on leveraging SDOH data for health equity research, addressing some of the informatics challenges of using social determinants of health data to inform population health or health services research. I will discuss a case study using a machine learning clustering algorithm to uncover region-specific sociodemographic features and disease-risk prevalence correlated with COVID-19 mortality during the early accelerated phase of community spread.

Bio: Irene Dankwa-Mullan is a nationally and internationally recognized physician and expert scientist working at the intersection of healthcare, health equity, public health, informatics, data science and applied artificial intelligence, with over 60 peer-reviewed publications. She serves as the Chief Health Equity Officer and Deputy Chief Health Officer for research and evaluation at IBM Watson Health. As Chief Health Equity Officer, she works across business market segments to promote a culture of equity, ethical AI, diversity and inclusion. Her responsibilities as Deputy Chief Health Officer include leadership for evaluation research and implementation science and promoting opportunities to advance the science of AI and advanced analytics. Dr. Dankwa-Mullan attended Barnard College where she majored in Biochemistry. She received her medical degree from Dartmouth Medical School, and a Master’s degree in Infectious Disease Epidemiology and Biostatistics from the Yale School of Public Health in a joint MD/MPH program. She completed residency training in Internal Medicine at the Johns Hopkins Hospital’s Bayview medical campus.

Speaker: Dr. Aarti Sathyanarayana, PhD – Harvard T.H. Chan School of Public Health

Title: Digital Phenotyping: Quantifying human health with low, medium and high frequency data streams

Watch The Presentation Here

Abstract: Digital health data is notoriously enigmatic. However, smartphones, wearables, and EEGs have the potential to provide enormous insight into human health and wellbeing. Making sense of these complex data streams requires new computational approaches that combine the best of signal processing and machine learning to find pragmatic solutions. Dr. Sathyanarayana will discuss challenges and solutions for translating low, medium and high frequency data into actionable insights for health, wellness, and performance.

Bio: Dr. Aarti Sathyanarayana is a postdoctoral research fellow in the department of biostatistics at the Harvard T.H. Chan School of Public Health. She also holds an appointment in the clinical data animation center at Massachusetts General Hospital and Harvard Medical School. Her research interests are in time variant health data analysis, signal processing, and machine learning. She strives to translate enigmatic health data into actionable insights, with an emphasis on digital phenotyping and digital biomarker discovery. Her recent work has focused on developing new methodologies to better understand smartphone, wearables, and EEG data in the context of human health and wellness. Prior to joining Harvard, Aarti received her PhD in computer science from the University of Minnesota, where her dissertation was selected for the university’s doctoral dissertation award. Since then, her work has won multiple junior investigator awards from the National Center of Women and Information Technology, the American Medical Informatics Association, the American Epilepsy Society, and the American Clinical Neurophysiology Society. Her expertise has also led her to hold positions at Apple, Intel, the Mayo Clinic, and Boston Children’s Hospital.

Speaker: Carlos Bustamante, PhD

Title: Why doing the right thing and diversifying clinical trials can unleash innovation in biopharma pipelines

Watch The Full Presentation Here

Abstract: Clinical Genetics Lacks Standard Definitions and Protocols for the Collection and Use of Diversity Measures.

Short bio: For the past 18 years, I have led a multidisciplinary team working on problems at the interface of computational and biological sciences. Much of our research has focused on genomics technology and its application in medicine, agriculture, and evolutionary biology. My first academic appointment was at Cornell University’s College of Agriculture and Life Sciences. There, much of our work focused on population genetics and agricultural genomics motivated by a desire to improve the foods we eat and the lives of the animals upon which we depend. I moved to Stanford in 2010 to focus on enabling clinical and medical genomics on a global scale. I have been focused on reducing health disparities in genomics by: (1) calling attention to the problem raised by >95% of participants in large scale studies being of European descent; and (2) broadening representation of understudied groups in large NIH funded consortia, particularly minority groups from the U.S., the Americas, and Africa. My work has empowered decision-makers to utilize genomics and data science in the service of improving human health and wellbeing. In the next phase of my career, I will focus on opportunities for bringing these technologies to consumers and patients, directly, where this work can have the greatest impact. I have a strong interest in building new academic units, non-profits, and companies. I was the Inaugural Chair of the Department of Biomedical Data Science—the first new department that Stanford has started in 14 years—and I was Founding Director (with Marc Feldman) of the Center for Computational, Evolutionary, and Human Genomics. I serve as an advisor to the US federal government, private companies, startups, and non-profits in the areas of computational genomics, population and medical genetics, veterinary and plant genomics, and business strategy.

Speaker: Megan Threats, PhD, MSLIS

Title: Toward health justice in informatics: a community-based, intersectional approach to HIV informatics intervention development 
Abstract: June 2021 will mark 40 years since the first cases of what would later become known as acquired immunodeficiency syndrome (AIDS) were reported in the United States. Despite groundbreaking biomedical advancements in HIV prevention and treatment, the HIV/AIDS epidemic continues to disproportionately affect sexual and gender minority communities of color. In this talk, I will discuss the development of an HIV informatics intervention aimed at reducing inequities in linkage and retention in HIV prevention and care among sexual minority Black men in the South. I will present strategies for leveraging informatics to achieve health justice in the fight to end AIDS. 
Bio: Dr. Megan Threats is an Assistant Professor in the Department of Library and Information Science at the School of Communication and Information at Rutgers University – New Brunswick. She is also Visiting Research Faculty at the Yale School of Public Health.

Speaker: Trevor Cohen, MBChB, PhD, FACMI

Title: Using Neural Language Representations to Detect Linguistic Anomalies in Neurodegenerative and Psychiatric Disease 

Watch The Full Presentation Here

Abstract: Language is uniquely positioned in mental health as both a focus of observation for clinical signs and symptoms, and a medium through which some forms of therapy are delivered.  Alzheimer’s Disease and other forms of dementia can also affect language production, for example by limiting access to more specific terms that describe the world in detail. In both cases, data from speech and text are increasingly available on account of the use of digital devices to mediate research and healthcare delivery. Neural language representations such as word embeddings, recurrent neural network language models, and contemporary transformer architectures have become a predominant point of focus in computational linguistics research. The models from which these representations are derived are typically trained on large amounts of unlabeled text, with training tasks involving predicting held-out terms that occur in proximity to observed ones. During the course of such training, much information about the typical use of language is learned. This information is of potential value for the detection of the atypical usage that may characterize certain clinical conditions. In this talk I will discuss our recent work in this area, with a focus on two areas of application: (1) a study of the responsiveness of deep neural networks that distinguish between responses to cognitive tasks from participants with and without Alzheimer’s Disease to known deficiencies in language production in this condition; and (2) the application of neural word embeddings to model language coherence in order to detect the disorganized thinking characteristic of episodes of psychosis in schizophrenia and other conditions. I will also more briefly touch on a range of related ongoing work involving efforts to model constructs that are of diagnostic or therapeutic importance in mental health.   


Background: Dr. Cohen trained and practiced as a physician in South Africa, before obtaining his PhD in 2007 in Medical Informatics at Columbia University. His doctoral work focused on an approach to enhancing clinical comprehension in the domain of psychiatry, leveraging distributed representations of psychiatric clinical text. Upon graduation, he joined the faculty at Arizona State University’s nascent Department of Biomedical Informatics, where he contributed to the development of curriculum for informatics students, as well as for medical students at the University of Arizona’s Phoenix campus. In 2009 he joined the faculty at the University of Texas School of Biomedical Informatics, where (amongst other things) he developed an NLM-funded research program concerned with leveraging knowledge extracted from the biomedical literature for information retrieval and pharmacovigilance, and contributed toward large-scale national projects such as the Office of the National Coordinator’s SHARP-C initiative, which supported a range of research projects that aimed at improving the usability and comprehensibility of electronic health record interfaces.

Research: Dr. Cohen’s research focuses on the development and application of methods of distributional semantics – methods that learn to represent the meaning of terms and concepts from the ways in which they are distributed in large volumes of electronic text. The resulting distributed representations (concept or word embeddings) can be applied to a broad range of biomedical problems, such as: (1) using literature-derived models to find plausible drug/side-effect relationships; (2) finding new therapeutic applications for known drugs (drug repurposing); (3) modeling the exchanges between users of health-related online social media platforms; and (4) identifying phrases within psychiatric narrative that are pertinent to particular diagnostic constructs (such as psychosis). An area of current interest involves the application of neural language models to detect linguistic manifestations of neurological and psychiatric conditions. More broadly, he is interested in clinical cognition – the thought processes through which physicians interpret clinical findings – and ways to facilitate these processes using automated methods.

Speaker: Tian Kang, MA, MPhil (PhD Student) – Dr. Chunhua Weng’s Lab 

Title: Exploring the Synergy of Neural and Symbolic Methods for Understanding Free-text Medical Evidence

Abstract: Recent state-of-the-art results in NLP have been achieved predominantly by deep neural networks. However, their reasoning capabilities are still rather limited compared to symbolic AI when facing reading comprehension tasks. I propose Medical evidence Dependency (MD)-informed Attention, a Neuro-Symbolic model for understanding free-text medical evidence, such as clinical trial publications. One head in the Multi-Head Self-Attention model is trained to attend to Medical evidence Dependencies (MD) and pass linguistic and domain knowledge onto later layers (MD-informed). We integrated MD-informed Attention into BioBERT and evaluated it on two public machine reading comprehension benchmarks for clinical trial publications. The integration of the MD-informed Attention head improves BioBERT substantially on both benchmarks, with an increase of as much as 30% in the F1 score, and achieves new state-of-the-art performance. MD-informed Attention empowers neural reading comprehension models with interpretability and generalizability via reusable domain knowledge. Its compositionality can benefit any Transformer-based NLP model for reading comprehension of free-text medical evidence.

Speaker: Victor Rodriguez, MA, MPhil (MD/PhD Student) – Dr. Adler Perotte’s Lab

Title: Training Deep Generative Models with Partially Observed Data

Abstract: Most deep generative models (DGMs) require fully observed data to train. Yet, data routinely contain missing values. This incompatibility motivates the development of inference algorithms which assume only partially observed data at training time. In this talk, I will present on-going work developing such algorithms for DGMs (specifically, Variational Autoencoders) and discuss preliminary results using data for which the missingness mechanism is ignorable. I also propose extensions to a) handle non-ignorable missingness mechanisms, which are common in clinical data sets and b) model labels for supervised disease phenotyping tasks.

Speaker: Elliot G. Mitchell, MA, MPhil (PhD Student) – Dr. Lena Mamykina’s Lab

Title: Automated Conversational Health Coaching: Work in Progress

Abstract: There is a need for automated health coaching solutions to support individuals living with chronic conditions in making everyday nutrition decisions. My research explores methods to enable automated health coaching via conversational interactions, like chatbots. In this presentation I describe work in progress towards the necessary components of a health coaching chatbot including the need to assess users’ goal attainment automatically, to offer feedback to users on goal attainment, as well as to provide suggestions when users do not meet their goals. I propose a set of computational methods to achieve these aims including crowdsourcing, active sensing, attention, and clustering. This approach can lead to the development of an automated health coach with the potential to help individuals achieve their health goals over time.

Speaker: Eugene Lucas, MD (Fellow) – Dr. Bruce Forman’s Lab

Title: Life as a Clinical Informatics Fellow

Abstract: Dr. Lucas will present an introduction to the Clinical Informatics fellowship and provide an overview of several projects he has led and worked on including: [1] leading the integration of a 3rd party application with the EHR, [2] identifying and managing Living Status discrepancies in the EHR, and [3] the development/kick off of the “25 By 5: Symposium to Reduce Documentation Burden on U.S. Clinicians by 75% by 2025.”

Speaker: Dr. Manuel Rivas, DPhil – Stanford University

Title: Genomic prediction and inference from population-scale datasets 

Watch The Full Presentation Here

Abstract: Clinical laboratory tests are a critical component of the continuum of care and provide a means for rapid diagnosis and monitoring of chronic disease. In this study, we systematically evaluated the genetic basis of 35 blood and urine laboratory tests measured in 358,072 participants in the UK Biobank and identified 1,857 independent loci associated with at least one laboratory test, including 488 large-effect protein truncating, missense, and copy-number variants. We then causally linked the biomarkers to medically relevant phenotypes through genetic correlation and Mendelian Randomization. Finally, we developed polygenic risk scores (PRS) for each biomarker and built multi-PRS models using all 35 PRSs simultaneously. We assessed sex-specific genetic effects and found striking patterns for testosterone with marked improvements in prediction when training a sex-specific model. We found substantially improved prediction of incidence in FinnGen (n=135,500) with the multi-PRS relative to single-disease PRSs for renal failure, myocardial infarction, type 2 diabetes, gout, and alcoholic cirrhosis. Together, our results show the genetic basis of these biomarkers, which tissues contribute to the biomarker function, the causal influences of the biomarkers, and how we can use this to predict disease.

Bio:  Dr. Rivas is an Assistant Professor in the Department of Biomedical Data Science at Stanford University in Stanford, California. He has a Bachelor of Science in Mathematics from the Massachusetts Institute of Technology and a Doctor of Philosophy in Human Genetics from the Nuffield Department of Clinical Medicine at Oxford University where he was a Clarendon Scholar.  He also did additional training at the Broad Institute in Cambridge, Massachusetts where he led the Helmsley Inflammatory Bowel Disease Exome Sequencing Program to understand the genetic factors that contribute to ulcerative colitis and Crohn’s disease risk.


Speaker: Dr. Terika McCall, PhD, MPH, MBA – Yale University 

Title: mHealth for Mental Health: User-Centered Design and Usability Testing of a Mental Health Application to Support Management of Anxiety and Depression in African American Women

Abstract: African American women experience rates of mental illness comparable to the general population (20.6% vs. 19.1%); however, they significantly underutilize mental health services compared to their white counterparts (10.2% vs. 27.2%). Past studies exploring the use of smartphone mental health interventions to reduce anxiety or depressive symptoms revealed that participants experienced significant reduction in anxiety or depressive symptoms post-intervention. Since African American women are comfortable with participating in mHealth research and interventions, and 80% of African American women own smartphones, there is great potential to remedy the disparities in mental health service utilization by leveraging use of smartphones for information dissemination, and delivery of mental health services and resources. My talk will focus on user-centered recommendations for content and features that should be included in a smartphone application culturally-tailored to support management of anxiety and depression in African American women. I will also discuss the results of usability testing of an initial prototype of the app.

Bio: Dr. McCall is a National Library of Medicine Biomedical Informatics and Data Science Postdoctoral Fellow at Yale Center for Medical Informatics. Her research focuses on reducing disparities in mental health service utilization through use of technology. Dr. McCall’s research is interdisciplinary and focuses on issues related to the acceptance, design, development, and use of mHealth applications for mental wellness.

2020 Fall Seminars

Speaker: Tony Y. Sun, MA (PhD Student) – Dr. Noémie Elhadad’s Lab

Title: Systematically quantifying and analyzing the impact of time-to-diagnosis disparities on the diagnostic process

Brief Abstract: In recent healthcare literature, a number of studies have illuminated how sex and gender-based healthcare disparities contribute to differences in health outcomes [e.g. ten year mortality for women after the WISE study]. In this talk, I’ll be focusing on how we systematically quantified time-to-diagnosis disparities across phenotypes, and how we analyzed the impact of these disparities on the diagnostic process. Our quantification of time-to-diagnosis disparities showed that, for patients that would go on to enter the same phenotype at CUMC, women are consistently diagnosed later than men for the majority of the same presenting symptoms. To analyze the impact of these disparities on the diagnostic process, we trained gender-agnostic classifiers for each disease using patients’ presenting symptoms. We assessed how the fairness gap changes with incrementally changed amounts of data. Despite our earlier finding that women present with symptoms earlier than men, the majority of these gender-agnostic classifiers paradoxically performed better for men than for women.

Speaker: Linying Zhang, MS, MA (PhD Student) – Dr. George Hripcsak’s Lab

Title: Adjusting for Unobserved Confounding Using Large-Scale Propensity Score

Brief Abstract: Even though observational data can now contain an enormous number of covariates, the existence of unobserved confounders still cannot be excluded and remains a major barrier to drawing causal inference from observational data. Recently, analyses using large-scale propensity score (LSPS) adjustment have demonstrated examples of adjusting for unobserved confounding by including hundreds of thousands of available covariates. In this paper, we present the conditions under which LSPS can reduce bias due to unobserved confounders. In addition, we show that LSPS does not adjust for various unwanted variables (e.g., M-bias colliders, instruments). We demonstrate the performance of LSPS on bias reduction using both simulations and real medical data.

Speaker 1: Amanda J. Moy, MPH, MA (PhD student) – Dr. Sarah Collins Rossetti’s (OPTACIMM) Lab

Title: Measuring clinical documentation burden among physicians and nurses: a review of the literature 

Abstract: Rapid adoption of electronic health records (EHRs) following the passage of the HITECH Act has led to advances in both individual- and population-level health. Still largely in their infancy, EHRs have also resulted in unintended consequences for clinical practice and healthcare systems, including significant increases in clinician documentation time. Extended work hours, time constraints, clerical workload, and disruptions to the patient-provider encounter have led to a rise in discontent with existing documentation methods in EHR systems. This documentation burden (hereinafter referred to as “burden”) has been linked to increases in medical errors, threats to patient safety, inferior documentation quality, and ultimately, burnout among nurses and physicians. Few empirically based, readily available solutions to reduce burden exist, and to the best of our knowledge, there is no consensus on the best approaches to measure burden. Furthermore, the concept of burden has been ill-defined and poorly operationalized. Achieving the three primary goals (cited in the 21st Century Cures Act) to reduce EHR-related clinician burdens that influence care will necessitate standardized, quantitative measurements to evaluate impact. The purpose of this scoping review is to assess the state of science, identify gaps in knowledge, and synthesize characteristics of burden measurement among physicians and nurses using EHRs.

Speaker 2: James Rogers, MS, MA, MPhil (PhD student) – Dr. Chunhua Weng’s Lab

Title: Comparison of trial participants and non-participants using electronic health record data

Abstract: Clinical trials are medical research studies in which participants are assigned to receive one or more interventions so that researchers can evaluate the interventions’ effects. They are quintessential for the development of medical evidence, but are susceptible to a variety of challenges. One such challenge is generalizability, which refers to the ability to apply the conclusions of a study to a different set of relevant patients outside the context of that study. Assessing generalizability of clinical trials is important because differences in underlying clinical characteristics can impact the estimated effect of the interventions, ultimately impacting their clinical meaningfulness. However, most contemporary assessments provide minimal granularity on clinical comparisons. In this presentation, I will explore an alternative approach that combines electronic health record (EHR) data with enrollment data from prior clinical trials, while also highlighting potential implications that emerge from the results of this study.

Speaker: Dr. Danielle Belgrave – Microsoft Research

Title: Machine learning for mental healthcare: a human-centered approach

Abstract: Machine learning advances are opening new routes to more precise healthcare, from the discovery of disease subtypes for stratified interventions to the development of personalized interactions supporting self-care between clinic visits. This offers an exciting opportunity for machine learning techniques to impact healthcare in a meaningful way. Within the healthcare domain, machine learning for mental healthcare is an under-investigated area and yet a potentially highly impactful area of research. In this talk, I will present recent work on probabilistic graphical modeling to enable a more personalized approach to mental healthcare, whereby information can be aggregated from multiple sources within a unified modeling framework. We present a human-centered approach to mental healthcare which is aimed at increasing the effectiveness of psychological wellbeing practitioners.

Bio: Dr. Danielle Belgrave is a Principal Research Manager at Microsoft Research in Cambridge (UK), in the Health Intelligence group, where she leads Project Talia. She is particularly interested in integrating medical domain knowledge into probabilistic graphical models to develop personalized treatment strategies in health. Originally from Trinidad and Tobago, she received her BSc in Mathematics and Statistics from London School of Economics, an MSc in Statistics from University College London and her PhD in Machine Learning and Statistics for Healthcare from The University of Manchester where she was a Microsoft Research PhD scholar. Prior to joining Microsoft Research, she had a tenured faculty position at Imperial College London.

Saba Akbar
Australian Institute of Health Innovation
Macquarie University

Title: Effects of automation on risk identification and nurses’ decision making

Watch The Recording Here 

Abstract: Electronic Decision Support Systems (DSS) can facilitate the five steps of the nursing care process (NCP): assessment, problem identification, planning, intervention, and evaluation. At each of these steps, nurses are required to process information and make complex decisions. DSS also present opportunities to support human information processing, which can be broken down into four distinct functions: information acquisition, information analysis, decision selection and action implementation. For instance, to assess problem risks, nurses need to acquire information about a patient’s history and physical health, analyze risk status, decide, and implement suitable management strategies. While current DSS have the capacity to automate information analysis and decision selection, they require nurses to manually perform other tasks. In this project, we reviewed evidence on the effects of automation in DSS on patient outcomes, care delivery and nurses’ decision making. Next, we interviewed nurses to explore their perceptions about existing DSS for risk assessments of falls and pressure injuries, which are among the top hospital-acquired complications in Australia. Finally, we designed a simulated DSS that automates these risk assessments.

Due to the 2020 AMIA Conference, there was no seminar on Nov. 16.

Trey Ideker

Professor, Department of Medicine; Adjunct Professor, Departments of Bioengineering and Computer Science; Co-Director, Bioinformatics and Systems Biology PhD Program

University of California San Diego

Title: Interpreting the cancer genome through physical and functional models of the cancer cell

Abstract: Recently we and other laboratories have launched the Cancer Cell Map Initiative (CCMI) and have been building momentum. The goal of the CCMI is to produce a complete map of the gene and protein wiring diagram of a cancer cell. We and others believe this map, currently missing, will be a critical component of any future system to decode a patient’s cancer genome. I will describe efforts along several lines: 1. Coalition building. We have made notable progress in building a coalition of institutions to generate the data, as well as to develop the computational methodology required to build and use the maps. 2. Development of technology for mapping gene-gene interactions rapidly using the CRISPR system. 3. Causal network maps connecting DNA mutations (somatic and germline, coding and noncoding) to the cancer events they induce downstream. 4. Development of software and database technology to visualize and store cancer cell maps. 5. A machine learning system for integrating the above data to create multi-scale models of cancer cells. In a recent paper by Ma et al., we have shown how a hierarchical map of cell structure can be embedded with a deep neural network, so that the model is able to accurately simulate the effect of mutations in genotype on the cellular phenotype.

Dr. Ideker Bio: Dr. Ideker is a Professor in the Departments of Medicine, Bioengineering and Computer Science at UC San Diego. Additionally, he is the Director or Co-Director of the National Resource for Network Biology (NRNB), the Cancer Cell Map Initiative (CCMI), the Psychiatric Cell Map Initiative (PCMI), and the UCSD Bioinformatics PhD Program, and former Chief of Genetics in the Department of Medicine. He is a pioneer in using genome-scale measurements to construct network models of cellular processes and disease. The Ideker Laboratory seeks to create artificially intelligent models of cancer and other diseases for the translation of patient data to precision diagnosis and treatment. 

Due to Election Day, there was no seminar on Nov. 2.

Daniel Prieto-Alhambra

Prof. of Pharmaco- and Device Epidemiology, University of Oxford

Watch The Recording Here

Title: OHDSI-EHDEN Joint COVID-19 Collaboration: Global Real-World Data to Fight COVID-19 

Due to Columbia’s involvement with the 2020 OHDSI Symposium, there will be no seminar Oct. 19.

DBMI Student Town Hall

Steve Labkoff  

Watch The Recording Here

Title:  Real-world Informatics Challenges in Building a Real-World Oncology Registry: The Multiple Myeloma Research Foundation’s CureCloud Experience

Abstract: One of the biggest impediments to personalized medicine is having enough data about a given disease process in order to explore that disease from multiple perspectives – such as genomics, EHR and immunologics. In 2017, the Multiple Myeloma Research Foundation, building on the previous successes of its CoMMpass Clinical Trial, sought to build a registry with five times the number of participants it had in CoMMpass. It took on a number of tenets that proved exceptionally challenging for this work, including the desire to work directly with patients, return clinical genomic data to patients and their clinicians, and aggregate data from a large array of data sources. In July 2020, the CureCloud Direct-to-Patient Registry opened for patient recruitment. After just 2 months, the registry has over 250 registrants. The effort to open this registry for recruitment demonstrates the numerous challenges in working across the US with “all comers”, the vast array of EHR vendors, standing up a new CLIA-validated bioinformatics pipeline, and getting the data ultimately returned to patients. This talk will discuss the many real-world challenges and solutions put into place in standing up this program from an informatics, regulatory, legal and clinical perspective.

Vimla Patel 

Watch The Recording Here

Title: Medical Expertise: Why and when is explanation needed?

Abstract: Since medical practice is a human endeavor, rapid technologic advances create a need to bridge disciplines to enable clinicians to benefit from them. In turn, this necessitates a broadening of disciplinary boundaries to consider cognitive and social factors related to the design and use of technology in the medical context. My awareness of these issues began when I started investigating the development of models of medical expertise and the symbolic representation of medical knowledge in the late 1980s. The last 30 years of multidisciplinary research on medical cognition in my laboratory have shown the remarkable importance of cognitive factors that determine how health professionals comprehend information, solve problems, and make decisions. These investigations into the process of medical reasoning have made significant contributions to the design of clinical AI systems. These systems offer great potential for progress to improve people’s health and well-being, but their adoption in clinical practice is still limited. A lack of transparency in these systems is identified as one of the main barriers to their acceptance. My talk will elaborate on what we have learned about how medical practitioners acquire, understand, explain, and utilize expertise, focusing on cognitive-psychological methods and frameworks. It will also discuss how such work elucidates key lessons and challenges for the development of usable, useful, and safe decision-support systems to augment human intelligence in the clinical world.

Bio: Read more about Vimla here. Her web site is here

2020 Spring Seminars

Dr. Melanie Wall

Title: Predicting service use and functioning for people with first episode psychosis in coordinated specialty care (due to technology error, this video isn’t available, though Dr. Wall’s presentation slides are available here)

Abstract: A key initiative in research focused on treatment for first episode psychosis (FEP) is improving the implementation of evidence-based coordinated specialty care (CSC). One area of improvement is expected to come from improved data analytics facilitated by linking different clinical sites through common data elements and a unified informatics approach for aggregating and analyzing patient level data. The present study examines to what extent predictive modeling of patient-level outcomes based on background variables collected at intake and throughout care can be used to differentiate individuals in a way that is useful. Using data from 600 FEP patients from 15 different CSC sites, we will develop and compare several machine learning models for predicting multivariate, correlated outcomes across one year of care. Presentation of results will focus on interpretability of differential prediction across sites and usefulness for facilitating service decisions.

Bio: Melanie Wall is Professor of Biostatistics and Director of Mental Health Data Science (MHDS) in the New York State Psychiatric Institute (NYSPI) and Columbia University psychiatry department. MHDS is made up of a team of 15 biostatisticians collaborating on predominantly NIH (NIMH/NIH/NIAAA/NIDA) funded research projects related to psychiatry. She has worked extensively with modeling complex multilevel and multimodal data on a wide array of psychosocial public health and psychiatric research questions in both clinical studies and large epidemiologic studies (over 300 total journal publications). She is an expert in longitudinal data analysis and latent variable modeling, including structural equation modeling focused on mediating and moderating (interaction) effects where she has made many methodological contributions. She has a long track record as a biostatistical mentor for Ph.D. students and NIH K awardees and regularly teaches graduate level courses in the Department of Biostatistics in the Mailman School of Public Health attended by clinical Masters students, Ph.D. students, post-docs, and psychiatry fellows. Her current research mission is improving the accessibility and application of state-of-the-art and reproducible statistical methods across different areas of psychiatric research.

Oliver Bear Don’t Walk

TITLE: Comparing the Impact of Transfer Learning Between Clinical Care Institutions on Clinical Note Classification Tasks

ABSTRACT: Performing transfer learning with neural networks such as BERT, ELMo and GPT has led to state-of-the-art results in the clinical domain on many natural language processing applications. Performing transfer learning with these kinds of models often includes task agnostic pre-training and then fine-tuning on a specific downstream task. However, previous work has found that pre-training at one institution and fine-tuning on a downstream task at another can lead to decreased performance on the downstream task. Differences between clinical institutions (e.g. patient population, documentation practices, clinical specialties, provider roles) can affect clinical corpus qualities and lead to intra-domain variation between institutions. Intra-domain variation could be a contributing factor to downstream task performance degradation when performing transfer learning across institutions. To the best of our knowledge, we present the first experiments focused on performing transfer learning with BERT models between two institutions and compare performance differences on downstream tasks at each institution. We confirm the previous finding that BERT performs better on downstream tasks at institutions it was most recently pre-trained at, which holds true for both institutions in our experiments. We also found that consecutive pre-training on clinical corpora further improves downstream task performance if the most recent pre-training corpus and downstream task corpus are from the same institution. This performance increase is at the expense of decreased performance on the previous institution’s downstream task corpus, a phenomenon known as catastrophic forgetting.

Shreyas Bhave

TITLE: Deep Survival Analysis: Regularization and Missingness with Non Parametric Survival Distributions

ABSTRACT: Survival analysis methods have long been used to effectively model time-to-event data. In the healthcare setting, the Framingham risk score is a salient use case in which 10-year risk of cardiovascular disease is estimated using a narrow set of clinical features. In order to use a more expanded set of clinical features from the EHR for survival analysis, a number of challenges must be addressed: (1) there is a high degree of missingness in EHR data (2) there is no natural event to align all the data (3) many nonlinear relationships likely exist between clinical features. Deep survival analysis (DSA) is an approach for addressing these issues by leveraging a deep conditional model of failure time. However, questions about how different levels and kinds of missingness affect out-of-sample prediction remain largely unexplored. Furthermore, the best approach for regularizing a model with such high capacity is empirically untested. We leverage extensions to this model which relax the distributional assumptions to fit a non-parametric survival distribution. Using this model, we run experiments on different methods of regularization and explore the effects of censorship as well as different types of missingness on model robustness. Initial results show promise with DSA outperforming baseline methods such as Cox regression. In the future, we hope to explore alternative methods of non parametric modeling (e.g. normalizing flows), simulate more clinically realistic scenarios of missingness and apply the model to EHR data from Columbia and NYU.

Dr. Jun Kong

Title: Multi-Dimensional Histopathology Image Analysis for Cancer Research

Abstract: In biomedical research, the availability of an increasing array of high-throughput and high-resolution instruments has given rise to large datasets of imaging data. These datasets provide highly detailed views of tissue structures at the cellular level and present a strong potential to revolutionize biomedical translational research. However, traditional human-based tissue review is not feasible to obtain this wealth of imaging information due to the overwhelming data scale and unacceptable inter- and intra-observer variability. In this talk, I will first describe how to efficiently process Two-Dimension (2D) digital microscopy images for highly discriminating phenotypic information with development of microscopy image analysis algorithms and Computer-Aided Diagnosis (CAD) systems for processing and managing massive in-situ micro-anatomical imaging features with high performance computing. Additionally, I will present novel algorithms to support Three-Dimension (3D), molecular, and time-lapse microscopy image analysis with HPC. Specifically, I will demonstrate an on-demand registration method within a dynamic multi-resolution transformation mapping and an iterative transformation propagation framework. This will allow us to efficiently scrutinize volumes of interest on-demand in a single 3D space. For segmentation, I will present a scalable segmentation framework for histopathological structures with two steps: 1) initialization with joint information drawn from spatial connectivity, edge map, and shape analysis, and 2) variational level-set based contour deformation with data-driven sparse shape priors. For 3D reconstruction, I will present a novel cross section association method leveraging Integer Programming, Markov chain based posterior probability modelling and Bayesian Maximum A Posteriori (MAP) estimation for 3D vessel reconstruction.
I will also present new methods for multi-stain image registration, biomarker detection, and 3D spatial density estimation for molecular imaging data integration. For time-lapse microscopy images, I will present a new 3D cell segmentation method with gradient partitioning and local structure enhancement by eigenvalue analysis with the Hessian matrix. A derived tracking method will also be presented that combines Bayesian filters with a sequential Monte Carlo method with joint use of location, velocity, 3D morphology features, and intensity profile signatures. Our proposed methods, featuring 2D, 3D, molecular, and time-lapse microscopy image analysis, will help researchers and clinicians extract accurate histopathology features, integrate spatially mapped pathophysiological biomarkers, and model disease progression dynamics at high cellular resolution. Therefore, they are essential for improving clinical decisions, enhancing prognostic predictions, inspiring new research hypotheses, and realizing personalized medicine.

Bio: Dr. Kong is Associate Professor in the Department of Mathematics and Statistics and the Department of Computer Science at Georgia State University, and adjunct faculty in the Department of Biomedical Informatics, the Department of Computer Science, and Winship Cancer Institute at Emory University. Dr. Kong’s research interests focus on big imaging data analytics for modeling cancer diseases, multi-modal biomedical image analysis, computer-aided diagnosis, machine learning, computational biology, and large-scale translational bioinformatics with heterogeneous data integration and mining. His long-term research goal is to establish an interdisciplinary research program engaged with mathematicians, biostatisticians, computer scientists, biologists, pathologists, and oncologists, among other domain experts, for computational disease characterization, accurate modeling analysis, and granular-resolution understanding of diseases with large-scale, multi-modal, and multi-scale biomedical data.

Watch the presentation here

Dr. Olga Troyanskaya

Professor of Computer Science and the Lewis-Sigler Institute for Integrative Genomics, Princeton University

Title: The quest for deep knowledge – decoding the human genome with deep learning models 

Abstract:  A key challenge in medicine and biology is to develop a complete understanding of the genomic architecture of disease. Yet the increasingly wide availability of ‘omics’ and clinical data, including whole genome sequencing, has far outpaced our ability to analyze these datasets. Challenges include interpreting the 98% of the genome that is noncoding to identify variants that are functional and may lead to disease, detangling genomic signals regulating tissue-specific gene expression, mapping the resulting genetic circuits and networks in disease-relevant tissues and cell types, and, finally, integrating the vast body of biological knowledge from model organisms with observations in humans. I will discuss methods that address these challenges, and highlight their applications to neurodevelopment and neurodegenerative diseases.

Lisa Grossman

Title: Interventions to Increase Patient Portal Use in Vulnerable Populations: A Systematic Review

Abstract: Background: More than 100 studies document disparities in patient portal use among vulnerable populations. Developing and testing strategies to reduce disparities in use is essential to ensure portals benefit all populations.

Objective: To systematically review the impact of interventions designed to (1) increase portal use or predictors of use in vulnerable patient populations, or (2) reduce disparities in use.

Methods: A librarian searched Ovid MEDLINE, EMBASE, CINAHL, and Cochrane Reviews for studies published before September 1st, 2018. Two reviewers independently selected English-language research articles that evaluated any interventions designed to impact an eligible outcome. One reviewer extracted data and categorized interventions, and another assessed accuracy. Two reviewers independently assessed risk of bias.

Results: Out of 18 included studies, 15 (83%) assessed an intervention’s impact on portal use, 7 (39%) on predictors of use, and 1 (6%) on disparities in use. Most interventions studied focused on the individual (13 out of 26, 50%), as opposed to facilitating conditions, such as the tool, task, environment, or organization (SEIPS model). Twelve studies (67%) reported a statistically significant increase in portal use or predictors of use, or reduced disparities. Five studies (28%) had high or unclear risk of bias.

Conclusion: Individually-focused interventions have the most evidence for increasing portal use in vulnerable populations. Interventions affecting other system elements (tool, task, environment, organization) have not been sufficiently studied to draw conclusions. Given the well-established evidence for disparities in use and the limited research on effective interventions, research should move beyond identifying disparities to systematically addressing them at multiple levels.

Anna Ostropolets 

Title: The Data Consult Service: an opportunity to bring new evidence to the bedside.

Abstract: Evidence-based medicine facilitates clinical care standardization, reduces medical care misuse and overuse and eventually leads to health care cost reduction and improvement in effectiveness and quality of care. However, current evidence has been reported to be inadequate or missing for specific clinical cases. Randomized clinical trials, which are the gold standard of clinical evidence, are often not generalizable to real-world patients and fail to include patients with multiple co-morbidities, patients who are pregnant, the elderly, and other vulnerable populations. On the other hand, a growing body of observational data, along with the continuing accumulation of practice-based evidence, has made new approaches to evidence generation available. We will present our first steps in developing a Data Consult Service – a clinical decision support tool that uses observational data to answer clinicians’ questions in real time. We will discuss our work on discovering potential areas of use and target groups for this tool as well as first answered questions and future work.

Fall 2019 Seminars

TITLE: Using Genetics to Address the Challenges of 21st Century Drug Development

BIO: Michael N. Cantor, MD, MA is Executive Director, Clinical Informatics, at the Regeneron Genetics Center. Currently his work focuses on developing and optimizing phenotypes from EHR and cohort data and linking them with genetic data to help discover new drug targets. Prior to Regeneron, he was Director of Clinical Research Informatics at New York University School of Medicine. As Director of Clinical Research Informatics, he was also the clinical director for NYULH’s DataCore, where his work focused on data management for clinical trials, using data from clinical systems for research, and advanced analytics. His research interests include integrating and standardizing social determinants of health-related data into the EHR, optimizing informatics tools for frontline clinicians, and providing self-service data access tools for researchers. During his previous tenure at NYU, Dr. Cantor was the Chief Medical Information Officer for the South Manhattan Healthcare Network of the New York City Health and Hospitals Corporation, based at Bellevue, and saw patients and precepted at the medical clinic there. Dr. Cantor completed his residency in internal medicine and informatics training at Columbia, has an M.D. from Emory University, and an A.B. from Princeton, and is an Associate Professor in the Department of Medicine at NYU School of Medicine. He currently sees patients weekly at Bellevue’s medicine clinic.

Speaker:  Jonathan Elias, MD, Clinical Informatics Fellow

Title:  A Day in the Life of a Clinical Informatics Fellow: CI Fellowship, Epic Together’s Mobile Messaging and Provider Team Project and the Epic Together Pre- & Post-Implementation Study

Abstract:  Per AMIA, Clinical Informatics (CI) is the application of informatics and information technology to deliver healthcare services. The CI Fellowship is a two-year ACGME accredited fellowship now being offered to one candidate a year through NYP CUMC, after completion of a medical residency. During this seminar, the fellowship structure and goals with example projects and research will be discussed.

A large area of focus of the fellowship is operational CI projects and academic research. Currently, Columbia University Medical Center (CUMC), NewYork-Presbyterian (NYP) and Weill Cornell Medical Center (WCM) are preparing to implement an enterprise-wide clinical information system, the EpicCare© Electronic Health Record (EHR). With the implementation of the EpicCare© EHR, there is an opportunity to improve, streamline and standardize role delineation, clinical communications and patient assignment across the EHR and secure mobile messaging platforms. The goals and processes associated with this project will be discussed.

Finally, a brief overview and update of the Epic Pre- & Post-Implementation Study will be presented. The overall purpose of this study is to evaluate clinical workflows, process efficiencies, EHR utilization, data quality and overall perceived system usability after implementation of Epic at NYP/CUMC/WCM compared to systems in place prior to Epic implementation. This project comprises three specific aims, outlined below, with associated high-level approach and metrics. Aim 1: Conduct a pre-post time motion study focused on the inpatient and outpatient settings (including the emergency department) to identify documentation workflow and time changes after Epic EHR implementation. Aim 2: Conduct log-file analyses to measure process efficiencies, EHR utilization (e.g., documentation time), and EHR data quality metrics. Aim 3: Administer a survey to measure and compare health professionals’ perceived usability and satisfaction pre- and post-Epic implementation in the context of functionality to enhance the delivery of continuity of care and adaptation to new health information technology (HIT).


Speaker:  Jiayao Wang, PhD Student, Dr. Dennis Vitkup’s Lab

Title:  Contribution of recessive genotypes and common variants to autism spectrum disorder

Abstract: Autism spectrum disorder (ASD) is a genetically heterogeneous condition, caused by a combination of rare de novo and inherited variants as well as common variants in at least several hundred genes. However, significantly larger sample sizes are needed to identify the complete set of genetic risk factors. Also, the contribution from inherited variants needs to be further investigated. Here we present results for SPARK, a cohort of ~9K families with ASD, all consented online. Whole exome sequencing (WES) and genotyping data were generated for each family using DNA from saliva. With exome sequencing data and a simple statistical framework, we show a weak contribution from recessive genotypes, as well as several significant recessive genes that lead to autism, such as EIF3F and RELN. With genotype array data, we performed GWAS with the transmission disequilibrium test and calculated polygenic risk scores for SPARK families. We show that autism probands have a significantly higher polygenic risk compared to their siblings and that the risk is spread all over the genome rather than coming only from significant loci. The contribution from recessive genotypes and common variants, together with rare inherited variants and de novo mutations from the SPARK project, will complete our understanding of the genetics of autism.

There was no seminar on Nov. 25.

No seminar due to the AMIA Symposium.

Video: Watch the presentation here

Title: Oops! I’m on the wrong patient: Evaluating System-Level Interventions for Preventing Wrong-Patient Electronic Orders

Bio: Dr. Adelman’s Patient Safety Research Program began with the development of the Wrong-Patient Retract-and-Reorder (RAR) Measure—a valid and reliable method of quantifying the frequency of wrong-patient orders placed in electronic ordering systems. The Wrong-Patient RAR measure was the first automated measure of medical errors and the first Health IT Safety Measure endorsed by the National Quality Forum. The RAR method identifies thousands of near-miss, wrong-patient errors per year in large health systems, enabling researchers to test interventions to prevent this type of error.

The Wrong-Patient RAR measure has been used to evaluate the effectiveness of patient safety interventions in several studies conducted in different electronic health record systems and clinical settings, including in the neonatal intensive care unit (NICU). The measure is the primary outcome measure for studies supported by the Agency for Healthcare Research and Quality (R21HS023704, R01HS024945) and the National Institute for Child Health and Human Development (R01HD094793). Additional research is underway to extend the RAR methodology to other types of errors, such as wrong-drug errors, and develop new health IT safety measures (R01HS024538).

Results of Dr. Adelman’s research led to national patient safety guidance, including a recommendation issued by the Office of the National Coordinator for Health Information Technology that healthcare organizations use the Wrong-Patient RAR measure to monitor the frequency of wrong-patient orders. Effective 2019, The Joint Commission will require that hospitals adopt a distinct newborn naming convention that incorporates the mother’s first name, based on studies by Adelman and colleagues.

Due to the Election Day holiday on Tuesday, there is no Seminar today.

This is a DBMI Student Town Hall.

Speaker: Alex Kitaygorodsky, PhD Student, Dr. Yufeng Shen’s Lab

Title: Identification of disease-causing genetic mutations based on machine learning and large genomic data sets

Abstract: More than 3% of young children are born with developmental disorders such as congenital heart disease (CHD), congenital diaphragmatic hernia (CDH), and autism spectrum disorder (ASD). Understanding the genetic causes of these conditions is critical to improve health care for these children and to push forward human developmental biology and neuroscience. Recently, high-throughput sequencing technologies have enabled generation of large-scale genomic data in genetic studies of these conditions. However, translating human data to knowledge is challenging due to an incomplete understanding of biology and a lack of sufficiently powerful analytical methods. My work aims to develop new computational methods based on powerful machine learning techniques to interpret genome sequencing data and identify disease-causing genetic variations. In this talk, I will focus specifically on the role of regulatory non-protein coding mutations in CHD, where we have found a substantial role of variants disrupting RNA binding protein (RBP) binding sites. RBPs oversee normal regulation of gene expression, at both the transcriptional and especially post-transcriptional stages, and so their disruption via mutation represents an important but under-studied noncoding action mechanism. To better understand the observed enrichment in these sites, we first modeled RNA binding protein processes with a robust convolutional neural network. Then, we designed a gradient boosting super-model to integrate predicted RBP binding scores with multimodal genomic data, allowing us to predict pathogenic RBP and gene regulation disruption caused by individual mutations. Finally, we applied our model back to Whole Genome Sequencing data of autism and CHD to find new disease risk genes and improve genetic diagnosis. 
In summary, we leveraged large genomic datasets with a sophisticated machine learning approach to better analyze sequencing data, advance genomic medicine, and aid our understanding of developmental disorder genetics.


Speaker: Sylvia Cho, PhD Candidate, Dr. Karthik Natarajan’s Lab

Title: Identifying data quality dimensions for wearable device data

Abstract: Patient-generated health data (PGHD) is an emerging type of biomedical data that is captured and recorded by patients outside clinical encounters. One of the major factors that facilitates the documentation of PGHD is the proliferating use of health tracking technologies. Among the different health tracking technologies, wearable devices are unique in that individuals can continuously and objectively self-track their health in free-living conditions. As a byproduct of using wearable devices for self-tracking, the large volume of accumulated data and diverse data types have led to interest in reusing these data for research purposes. However, there are concerns about the quality of device-generated data due to various reasons such as technical and human limitations. Therefore, assessing the quality of wearable data is essential before reusing the data for research. Data quality dimensions are an important feature of data quality assessment, as they provide guidance on what aspect of data quality should be assessed for the research task. While there are abundant studies on data quality dimensions for traditional clinical data such as electronic health record data, there is a lack of understanding of the important data quality dimensions for wearable device data. In this study, we aim to identify the data quality dimensions considered to be important by researchers when analyzing wearable data, and to verify if an existing data quality framework can be applied to this type of data or if it needs to be modified. In this talk, I will discuss the methods we used to identify the dimensions and present preliminary results of the study.

Video: Watch the presentation here

Title: Applications of Data Science and Machine Learning in Radiology and Cardiology

Abstract: The overall goal of our group is to leverage data-driven approaches to help improve patient outcomes. This talk will demonstrate examples of how we are working toward this goal by leveraging large clinical datasets, data science and machine learning. Specific examples include: 1) using 46,583 clinically-acquired 3D computed tomography images of the brain to develop and implement a deep learning model to efficiently reprioritize radiology worklists for quicker diagnosis of intracranial hemorrhage; 2) using deep learning to analyze 723,754 echocardiographic videos of the heart to accurately predict patient mortality; 3) analyzing 2 million 12-lead electrocardiographic tracings from the heart to predict clinically relevant future events; and 4) optimizing evidence-based care delivery for a population of >10,000 patients with heart failure using machine learning.

Bio: Dr. Fornwalt attended the University of South Carolina as an undergraduate in mathematics and marine science. He then worked in a free medical clinic for a year before starting an MD/PhD program at Emory and Georgia Tech. After finishing his degrees in 2010, he completed an internship in pediatrics at Boston Children’s Hospital before becoming an Assistant Professor at the University of Kentucky.

After four years on faculty in Kentucky, Dr. Fornwalt moved to Geisinger where he completed his diagnostic radiology residency and founded Geisinger’s Department of Imaging Science and Innovation, which focuses on data-driven approaches to improving patient outcomes. Dr. Fornwalt is also a practicing thoraco-abdominal radiologist and an active member of Geisinger’s Heart Institute.

Video: Watch the presentation here

Title: Integrative Analysis of Multi-view Data for Dimension Reduction and Prediction

Abstract: Multi-view data are data collected on the same set of samples but from different views/sources. They are becoming increasingly common in modern biomedical studies. In this talk, I’ll introduce some recent developments in the integrative analysis of multi-view data, and present a new multivariate predictive model with application to a longitudinal study of aging.

Bio: Dr. Gen Li is devoted to developing new statistical learning methods for analyzing high dimensional biomedical data. He focuses on analyzing complex data with heterogeneous types that are collected from multiple sources. His methodological research interests include dimension reduction, predictive modeling, association analysis, and functional data analysis. He is also interested in genetics and bioinformatics. He is a consortium member of the NIH Common Fund program Genotype-Tissue Expression (GTEx) project, and contributes to the development of statistical methods for expression quantitative trait loci analysis in multiple tissues. He also has research interests in scientific domains including melanoma, microbiome, and urology research.

Video: Watch the presentation here

Title: Machine Learning in Healthcare

Abstract: In March of 2016, the AlphaGo computer program beat world champion (and human) Lee Sedol at the board game Go. The program’s success reflected the significant progress that machine learning research has made in recent years. However, AlphaGo was just one example of what can be achieved with machine learning. This talk will provide an overview of some of the techniques that are being used in machine learning today, as well as some recent and ongoing work by Google’s research teams to advance the applications of machine learning, particularly its role in biomedical research.  The talk will also discuss some of the unique challenges around applications in healthcare.  

Bio: Ming Jack Po, MD, PhD, is a product manager in Google Health, leading a number of its machine learning research projects as well as health care product teams. Prior to joining Google, Jack spent a decade working in different capacities in areas related to medical devices and healthcare delivery. Jack is currently a trustee of the Austen Riggs Center, a board member of El Camino Health Systems, a member of the National Library of Medicine Lister Hill’s Board of Scientific Counselors and a member of the ONC’s Interoperability Standards Priorities Task Force. Jack received his MD and PhD from Columbia University, and his bachelor’s degree in Biomedical Engineering and master’s degree in Mathematics from Johns Hopkins University.

Speaker: Alexander Hsieh, PhD student

Title: Detection of mosaic single nucleotide variants in exome sequencing data and implications for congenital heart disease

Abstract: The contribution of somatic mosaicism, or genetic mutations arising after oocyte fertilization, to congenital heart disease (CHD) is not well understood. Further, the relationship between mosaicism in blood and cardiovascular tissue has not been determined. We developed a computational method, Expectation-Maximization-based detection of Mosaicism (EM-mosaic), to analyze mosaicism in exome sequences of 2530 CHD proband-parent trios. EM-mosaic detected 326 mosaic mutations in blood and/or cardiac tissue DNA. Of the 309 detected in blood DNA, 85/94 (90%) tested were independently confirmed. Twenty-five mosaic variants altered CHD-risk genes, affecting 1% of our cohort. Of these 25, 22/22 candidates tested were confirmed. Variants predicted as damaging had higher variant allele fraction than benign variants, suggesting a role in CHD. The frequency of mosaic variants above 10% mosaicism was 0.13/person in blood and 0.14/person in cardiac tissue. Analysis of 66 individuals with matched cardiac tissue available revealed both tissue-specific and shared mosaicism, with shared mosaics generally having higher allele fraction. We estimate that ~1% of CHD probands have a mosaic variant detectable in blood that could contribute to cardiac malformations, particularly those damaging variants expressed at higher allele fraction compared to benign variants. Although blood is a readily available DNA source, cardiac tissues analyzed contributed ~5% of somatic mosaic variants identified, indicating the value of tissue mosaicism analyses.


Speaker: Michelle Chau, PhD student

Title: Developing a user-centered, machine learning approach to identify preferences for inspirational social media health-related images for young populations

Abstract: Nutrition interventions for adolescents and young adults (AYAs) increasingly rely on mobile platforms and social media. Most assume nutritional decisions are rational, targeting intentions such as goal setting and self-monitoring. However, in the absence of motivation and time, nutrition choices are often automatic and based on heuristics. The use of images is a simple way to deliver heuristic messaging. My preliminary research, showing AYAs’ frequent use of social media for inspiration, further suggests that health-related images may be suitable for nutrition interventions with these groups. Previous studies have explored inspirational social media content using qualitative and manual methods. However, there is an active area of research in computational visual analysis that explores preferences and prediction for image retrieval and recommendation tasks. The application of these techniques within health, and specifically how to translate human preferences into the technical requirements needed to identify inspirational images for nutrition and young populations, is underexplored. In this talk, I will discuss a study to identify image features that are relevant for inspiring healthy eating in health-related social media content. Further, I will discuss future directions for exploring how these features may be incorporated into machine learning models.