The DBMI seminar series is a 1-credit course for DBMI students who can benefit from hearing new methods of research from speakers from both academia and industry. Enrollment is restricted to DBMI students, but anybody may attend the seminars. It is currently being offered virtually, though it is traditionally held in PH20-200.
DBMI also hosts a Special Seminar Series: Toward Diversity, Equity, and Inclusion in Informatics, Health Care, and Society. Both upcoming presentations and past recordings will be shared on our Special Seminar Series homepage.
2022 Fall Seminars
Title: Using Machine Learning to Increase Equity in Healthcare and Public Health
To join, please use this log-in
Meeting ID: 981 0245 9573 Passcode: 495614
Abstract: Our society remains profoundly unequal. Worse, there is abundant evidence that algorithms can, improperly applied, exacerbate inequality in healthcare and other domains. This talk pursues a more optimistic counterpoint — that data science and machine learning can also be used to illuminate and reduce inequality in healthcare and public health — by presenting vignettes about women’s health, COVID-19, and pain.
Bio: Emma Pierson is an assistant professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech and the Technion, and a computer science field member at Cornell University. She holds a secondary joint appointment as an Assistant Professor of Population Health Sciences at Weill Cornell Medical College. She develops data science and machine learning methods to study inequality and healthcare. Her work has been recognized by best paper, poster, and talk awards, an NSF CAREER award, a Rhodes Scholarship, Hertz Fellowship, Rising Star in EECS, MIT Technology Review 35 Innovators Under 35, and Forbes 30 Under 30 in Science. Her research has been published at venues including ICML, KDD, WWW, Nature, and Nature Medicine, and she has also written for The New York Times, FiveThirtyEight, Wired, and various other publications.
Title: Multimodal deep learning for protein engineering
To join, please use this log-in
Meeting ID: 981 0245 9573 Passcode: 495614
Abstract: Engineered proteins play increasingly essential roles in industries and applications spanning pharmaceuticals, agriculture, specialty chemicals, and fuel. Machine learning could enable an unprecedented level of control in protein engineering for therapeutic and industrial applications. Large self-supervised models pretrained on millions of protein sequences have recently gained popularity in generating embeddings of protein sequences for protein property prediction. However, protein datasets contain information in addition to sequence that can improve model performance. This talk will cover pretrained models that use sequences, structures, and annotations to predict protein function or to generate functional protein sequences.
Bio: Kevin Yang is a senior researcher at Microsoft Research in Cambridge, MA who works on problems at the intersection of machine learning and biology. He did his PhD at Caltech with Frances Arnold on applying machine learning to protein engineering. Before joining MSR, he was a machine learning scientist at Generate Biomedicines, where he used machine learning to optimize proteins. Before graduate school, Kevin taught math and physics for three years at a high school in Inglewood, California through Teach for America.
Specific information about this session will be available closer to the seminar.
To join, please use this log-in
Meeting ID: 981 0245 9573
Previous 2022 Fall Seminars
Title: Does Social Media Support or Worsen Mental Well-Being? Well, It Depends
Abstract: Social media platforms continue to shape our identities, accruing important roles in our lives as they pertain to connecting with loved ones, finding like-minded peers, or finding an outlet to vent and broadcast small and big happenings around us. Much has been written in the media about these uses, but importantly, about the impacts of social media on a variety of outcomes, ranging from issues of political polarization to social justice. Is social media good or bad when it comes to mental well-being? This talk will present some critical evidence towards answering this question through a series of interlinked studies. In a first study, a large-scale observational study will situate how social support received online can help to reduce suicidal thoughts. Turning to negative impacts, a second study, using a computational causal approach, will describe the alarming ways misinformation on social media can aggravate stress and anxiety. Beyond these examples, finally, I will discuss how, eventually, in many cases, the answer to this question simply depends on the context. Specifically, anchoring on two studies that adopt a human-centered mixed methods approach, I will highlight the potential benefits and risks of social media use related to substance misuse disclosures, and to patients’ social reintegration efforts following a major psychiatric episode. Ultimately, regardless of the specific platforms, online social technologies are here to stay, and I will conclude by reflecting on possible implications that harness the positive uses and those that seek to mitigate the harmful effects of social media on mental well-being.
Bio: Munmun De Choudhury is an Associate Professor of Interactive Computing at Georgia Tech. Dr. De Choudhury is best known for laying the foundation of a new line of research that develops computational techniques towards understanding and improving mental health outcomes, through ethical analysis of social media data. To do this work, she adopts a highly interdisciplinary approach, combining social computing, machine learning, and natural language analysis with insights and theories from the social, behavioral, and health sciences. Dr. De Choudhury has been recognized with the Web Science Trust’s 2022 Test of Time Award, 2021 ACM-W Rising Star Award, 2019 Complex Systems Society – Junior Scientific Award, numerous best paper and honorable mention awards from the ACM and AAAI, and features and coverage in popular press like the New York Times, NPR, and the BBC. Dr. De Choudhury currently serves on the Board of Directors of the International Society for Computational Social Science and on the Steering Committee of the International Conference on Web and Social Media, the leading conference on interdisciplinary studies of social media. Earlier, Dr. De Choudhury was a postdoc at Microsoft Research and obtained her PhD in Computer Science from Arizona State University.
Title: An algorithmic safety view of learning in health
This session was not recorded.
Abstract: Machine Learning advances have revolutionized many domains such as machine translation, complex game playing, and scientific discovery. On the other hand, ML has only enjoyed modest successes in health. To improve the utility, reliability, and robustness of Machine Learning (ML) models in health and medicine, we need to address several foundational challenges. In this talk, I will demonstrate how an algorithmic-safety perspective can motivate specific technical challenges for learning in healthcare. Specifically, I will discuss the need to improve the utility of ML-robustness, explainability with an emphasis on decision-making, and post-hoc algorithmic safety to prevent harm. I will discuss my contributions on i) aiding safe decision-making in non-IID settings using time-series explainability intended to address clinicians’ requirements, ii) novel learning algorithms to optimize for safety in sequential decision-making settings, and iii) methods to improve causal robustness of ML methods designed for practical generative settings. I will conclude with an overview of a research vision on novel safety-based objectives in ML for health, expanding ML-based solutions to practical generative settings, and outlining novel ways of validating ML models targeting safety-based objectives.
Bio: Shalmali Joshi is a Postdoctoral Fellow at the Center for Research on Computation and Society at Harvard University, and an incoming assistant professor at Columbia DBMI. Previously, she was a Postdoctoral Fellow at the Vector Institute. She received her Ph.D. from the University of Texas at Austin (UT Austin). Her research is on the algorithmic safety of Machine Learning for human-centered domains. Shalmali has contributed to the field of explainability, robustness, and novel algorithms for ML safety with an emphasis on practical generative settings and impact on decision-making. Shalmali has published in ML and inter-disciplinary venues in healthcare such as NeurIPS, FAccT, CHIL, MLHC, PMLR, and perspectives in JAMIA, LDH, and Nature Medicine. She co-founded the Fair ML for Health NeurIPS workshop and is General Chair for ML4H 2022 and Program Chair for MLHC 2022.
2022 Spring Seminars
Title: The Electronic Medical Records and Genomics (eMERGE) Genomic Risk Assessment and Management Network – Challenges and Opportunities
Speaker: Cong Liu, PhD – Associate Research Scientist, Department of Biomedical Informatics, Columbia University
Abstract: eMERGE is a national consortium, organized by the NHGRI, that conducts discovery and clinical implementation research in genomics and genomic medicine at research institutions across the country. Established in 2007, eMERGE research combines DNA biorepositories with electronic health record (EHR) systems for large-scale, high-throughput genetic studies. In this talk, I will introduce the resources and infrastructure that have been established for the eMERGE network as well as potential research opportunities. During the past phases, the network has generated and maintained the clinical and genetic data for ~135,000 unique participants, which includes electronic phenotypes, genotyping array, exome sequencing, whole genome sequencing, pharmacogenomics, and an ACMG 59 emphasized custom panel. During the current phase, the network is charged with developing Genome Informed Risk Assessments (GIRA) for common complex diseases such as breast cancer and chronic kidney disease. GIRA is designed to combine genotyping for polygenic risk score (PRS), sequencing of monogenic genes, family health history, and clinical data. The network will validate the accuracy and the utility of GIRA by conducting a prospective study with a plan to recruit ~25,000 individuals focused on underrepresented populations, across a wide range of ages. The network will also explore how to integrate GIRA into the EHR and return the risk assessment along with care recommendations to both participants and their providers.
Bio: Dr. Cong Liu is an Associate Research Scientist at the Department of Biomedical Informatics at Columbia University. Dr. Liu’s research resides in the areas of genomics and informatics tools innovation. His research focuses on developing and applying novel informatics methods for genetic disorders diagnosis and risk prediction, as well as facilitating the implementation of genomic medicine using the electronic health record systems. Dr. Liu received his B.S. in Biological Science from Fudan University, M.S. in Mathematics from University of Illinois at Chicago, and Ph.D. in Bioinformatics from University of Illinois at Chicago. He later joined Columbia University and completed his Post-Doctoral training at the Department of Biomedical Informatics.
Speaker: Tal Korem, PhD – Assistant Professor, Departments of Systems Biology and Obstetrics & Gynecology, Columbia University
Title: The vaginal microbiome and metabolome in spontaneous preterm birth
Seminar is not posted at request of the presenter
Abstract: The paired analysis of the microbiome and metabolome is revolutionizing our mechanistic understanding of microbial ecosystems. Analyzing vaginal microbial and metabolite data from samples collected early in pregnancy, we identified novel interactions with preterm birth. We propose that several preterm-birth-associated metabolites may be exogenous, and investigate the sources of another using metabolic models. We further show that the metabolome can accurately predict the risk for preterm delivery. Altogether, our results demonstrate the potential of vaginal metabolites as early biomarkers of sPTB and highlight exogenous exposures as potential risk factors for prematurity.
Bio: Tal Korem’s research program focuses on computational methods that identify and interpret host-microbiome interactions in various clinical settings, and specifically those related to women’s health. He has developed several new approaches for microbiome data analysis, inferring microbial growth rates, structural variants, and microbiome-metabolite interactions; and has applied these methods in diverse clinical and biological investigations, most notably for personalization of dietary treatment for normalizing glycemic responses. He is an Assistant Professor in the Departments of Systems Biology and Obstetrics & Gynecology at Columbia University.
Title: Achieving TechQuity
Speaker: Cheryl Clark MD, ScD – Associate Chief for Equity Research & Strategic Partnerships, Division of General Medicine and Primary Care, Brigham and Women’s Hospital; Assistant Professor of Medicine, Harvard Medical School
Seminar not recorded at request of presenter
Abstract: Open discussions of social justice and health inequities may be an uncommon focus within information technology science, business, and health care delivery partnerships. However, the COVID-19 pandemic, which disproportionately affected Black, indigenous, and people of color, has reinforced the need to examine and define roles that technology partners should play to lead anti-racism efforts through our work. In this hour, we will discuss the imperative to prioritize TechQuity and to address social contexts in the implementation of AI and other technologies.
Bio: Cheryl Clark MD, ScD, is an Assistant Professor of Medicine at Harvard Medical School and a Hospitalist, social epidemiologist and Associate Chief in the Brigham and Women’s Hospital Division of General Medicine and Primary Care for Equity Research & Strategic Partnerships. Dr. Clark’s research focuses on social determinants of cardiometabolic health in diverse and aging populations. She is principal investigator for community engagement in the New England hub of the National Institutes of Health All of Us Research Program and chaired the social determinants of health (SDOH) Task Force that developed the SDOH participant provided information survey for All of Us. Dr. Clark serves on the Mass General Brigham Predictive Analytics committee to provide equity review of algorithms considered for clinical implementation. Dr. Clark chaired the COVID-19 equity response team during the early phase of the COVID-19 pandemic in 2020. She is the inaugural recipient of the Equity, Social Justice and Advocacy Award from Harvard Medical School and Harvard School of Dental Medicine.
Title: Racial and Ethnic Differences in Genetic Testing Uptake and Results among Young Breast Cancer Survivors: Looking Ahead at Future Work
Speaker: Tarsha Jones, Assistant Professor of Nursing, Florida Atlantic University
Seminar not recorded at request of presenter
Abstract: Genetic testing for hereditary breast and ovarian cancer (HBOC) syndrome (e.g., BRCA1/2 genes) is recommended for all young women diagnosed with breast cancer at ≤ age 45, yet there is an underutilization of this critical test among this population. In this presentation, I will provide an overview of the current landscape of genetic testing and discuss my program of research that focuses on racial and ethnic differences in genetic testing uptake and results among young breast cancer survivors (YBCS). In addition, I will provide an overview of my current and future work including our innovative web-based decision aid intervention, RealRisks, that we are adapting for racially/ethnically diverse young breast cancer survivors in order to increase access to genetic testing and family risk communication. A special emphasis is placed on promoting health equity and reducing cancer health disparities.
Bio: Dr. Jones is an Assistant Professor of Nursing at the Christine E. Lynn College of Nursing at Florida Atlantic University. She obtained a Bachelor of Science in Nursing degree from Seton Hall University and a Master of Science in Nursing degree from the Catholic University of America with a specialization in community/public health nursing and the care of immigrants, refugees, and global health. She holds a certification as an advanced public health nurse (PHNA-BC). She obtained a Doctor of Philosophy (PhD) in Nursing degree from Duquesne University and completed a post-doctoral research fellowship at Dana Farber Cancer Institute and Harvard Medical School.
Her research focuses on cancer prevention and control, risk-communication, and risk-reduction. Her current work focuses on improving uptake of genetic testing for breast cancer risk (i.e., BRCA1/2 genes and multigene panel testing) through culturally appropriate interventions, to facilitate informed decision-making for cancer risk-reducing strategies, and to promote family risk communication among young breast cancer survivors and their at-risk family members, with a particular emphasis on Black and Hispanic women. Her research is supported by the National Institutes of Health (NIH) and the DAISY Foundation.
Speaker: Lena Mamykina, PhD – Associate Professor of Biomedical Informatics
Title: Do People Engage Cognitively with AI? Impact of AI Assistance on Incidental Learning
Abstract: Introduction of AI-powered systems in many domains of human life often rests on the assumption that humans can use their common sense, domain knowledge and experience, and critical thinking to examine AI output and to decide whether to act on it or to dismiss it. This is particularly the case in such critical domains as health and medicine. But is this assumption really justified and do people in fact critically examine AI-generated output? In this talk I will describe results of several experiments conducted on Lab in the Wild, a popular online platform for psychological and behavioral experiments, that specifically examined individuals’ cognitive engagement with AI-powered decision support and the role of explanations in facilitating this engagement. We consider learning gains as evidence of cognitive engagement and show that explanations can indeed lead to a deeper engagement with AI. However, the design of decision support and placement of explanations within the decision making process play a critical role in their impact. I conclude with analysis of implications for future AI-powered decision support tools.
Bio: Dr. Lena Mamykina is an Associate Professor of Biomedical Informatics at the Department of Biomedical Informatics at Columbia University. Dr. Mamykina’s research resides in the areas of Biomedical Informatics, Human-Computer Interaction, Ubiquitous and Pervasive Computing, and Computer-Supported Collaborative Work. Her research focuses on the design of innovative interactive systems in health that incorporate machine learning and AI. Dr. Mamykina received her B.S. in Computer Science from the Ukrainian State University of Maritime Technology, M.S. in Human Computer Interaction from the Georgia Institute of Technology, Ph.D. in Human-Centered Computing from the Georgia Institute of Technology, and M.A. in Biomedical Informatics from Columbia University. Prior to joining DBMI as a faculty member, she completed a National Library of Medicine Post-Doctoral Fellowship at the department.
Speaker: Undina Gisladottir, Ph.D. Student – Dr. Nicholas Tatonetti’s Lab
Title: Propensity Scores Improve the Performance of Self Controlled Case Series Studies using Electronic Health Records
Abstract: Randomized controlled trials are the gold standard for determining the safety and efficacy of a drug. However, the strict exclusion criteria for such trials can lead to unforeseen adverse drug events (ADEs) when the drug is released to the general public. For this reason, post-market surveillance is essential to ensure physicians can make informed decisions when prescribing. A self-controlled case series (SCCS) study using observational data, such as electronic health records (EHR), is an effective approach to identifying ADEs, as it controls for time-invariant confounders such as sex and race/ethnicity. However, ascertainment bias in EHR leads to inherent differences between the ‘risk’ and ‘baseline’ periods, which results in more false positives. Some groups use negative controls to adjust the relative risk, but this can be time-consuming and requires expert knowledge. In this study, we propose using interval-specific propensity scores to adjust for the bias between risk and baseline periods. We applied our method to an ADE prediction task using 370 known drug-event pairs from a reference ADE set using NYP CUIMC hospital (~16K patients) and validated in MarketScan’s Medicare dataset (~1.5M patients). We found that using the interval-specific propensity score significantly increased coverage and decreased bias. Our results show that propensity scores may reduce the effect of ascertainment bias in SCCS studies using observational data, enabling more reliable drug safety estimates.
Bio: Undina Gisladottir is a third-year Ph.D. student in Dr. Nicholas Tatonetti’s lab. Her current research uses electronic health records to further our understanding of drug effects and adverse drug events. Prior to joining DBMI, Undina completed her bachelor’s in biomedical engineering at Boston University and her master’s in biomedical informatics at HMS where she conducted research with Dr. Nils Gehlenborg and Dr. Chirag Patel.
Speaker: Harry Reyes Nieva, Ph.D. Student – Dr. Noémie Elhadad’s Lab
Title: Mining the Health Disparities and Minority Health Bibliome
Abstract: Lack of a large-scale survey of the health disparities and minority health (HDMH) literature leaves the field potentially vulnerable to disproportionately focus on specific populations or emphasize certain conditions, curtailing our ability to fully advance health equity and improve our understanding of the health of minoritized communities. We propose using scalable methods to characterize trends and isolate potential gaps and blind spots in HDMH research. To support investigators in navigating the HDMH bibliome, we are also actively developing HDMH Monitor, an interactive dashboard and article repository.
Using a pre-validated MEDLINE/PubMed search strategy, we extracted HDMH articles (~250K in total) and their meta-data via the open-source MEDLINE API. We employed a three-pronged approach scalable to the entire corpus. To characterize HDMH literature, we identified: (1) studied populations and study designs using Medical Subject Headings (MeSH); (2) conditions mentioned in abstracts and titles using clinical named-entity recognition (CNER); and (3) emerging topics of study through probabilistic topic modeling (i.e., latent Dirichlet allocation). To characterize the HDMH bibliome further, we compared trends in studied conditions to relative condition prevalence in large claims datasets (42+ million Americans).
Large-scale analysis yields insights about trends in HDMH research: half (50%) of all HDMH articles concerned just three International Classification of Diseases (ICD) chapters (cancer, mental health, endocrine/metabolic disorders); disease prevalence in the general population was not necessarily indicative of HDMH research foci; and disease coverage in the literature was highly variable among minoritized populations. Notable temporal trends among topics include increased focus on community-based research; decreased focus on economic policy and medical education; and emergence of nascent topics like sexual and gender minority health. Our approach employs scalable methods for processing, characterizing, and monitoring an ever-increasing body of literature systematically. Leveraging ontologies and CNER enables top-down assessment of studied conditions and, by extension, those not well represented across populations, while topic modeling allows for a bottom-up identification of emerging themes. Common terminology (ICD) allows for direct comparison across data sources.
Bio: Harry Reyes Nieva is a third year Ph.D. student in Dr. Noémie Elhadad’s lab. His current research primarily aims to use and expand the vast toolbox that computational methods offer to better understand, improve, and facilitate the study of health in underserved communities and advance health equity. Harry received his B.A. from Yale University and Master of Applied Science from the Johns Hopkins Bloomberg School of Public Health. Prior to starting his Ph.D., Harry was a member of the MTERMS lab led by Dr. Li Zhou at Harvard Medical School/Mass General Brigham and the Strategic Information division of the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR) at Harvard, which aimed to rapidly expand treatment and care programs for people living with HIV/AIDS in Botswana, Nigeria, and Tanzania.
• Dr. Ashley Beecy, Assistant Professor at Weill Cornell Medicine and NYP Hospital
• Dr. Salvatore Crusco, Clinical Informatics Fellow at Columbia University Hospital
• Jennifer Beirne, MHA, MA CPHIMS, Director at People & Organization Development team at Columbia University
Dr. Ashley Beecy is an Assistant Professor of Medicine in the Department of Medicine, Division of Cardiology at Weill Cornell Medicine. She serves as the Clinical Lead for IT Transformation and Advanced Analytics at NewYork Presbyterian. Her research is focused on digital health including the implementation of artificial intelligence (AI) and the use of AI to study cardiovascular imaging.
Dr. Salvatore Crusco is a second-year clinical informatics fellow at NYP/DBMI with a keen interest in clinical decision support (CDS). Sal has worked with the CDS workgroup to develop a sub-committee, the CDS Optimization Workgroup, which meets weekly to discuss optimization efforts for alerts that are non-intuitive, untimely, interruptive, non-actionable, and continually re-firing. Most of these efforts are geared toward reducing alert fatigue for users while prioritizing patient care.
Jennifer Beirne oversees the Optimization track for the People & Organization Development team at ColumbiaDoctors. She and her team work with stakeholders across the institution to apply a structured approach to improving workflows and user proficiency within the EHR. Prior to joining her current team, she helped support CUIMC’s Epic implementation as part of ColumbiaDoctors’ Office of the CMIO. Jennifer completed DBMI’s Certification of Professional Achievement in HIT in 2017.
Speaker: Adrienne Pichon, PhD Student – Dr. Noémie Elhadad’s Lab
Title: Informing the Design of Individualized Self-management Regimens from the Human, Data, and Algorithmic Perspectives
Abstract: Self-management is critical to care of chronic illness, but developing a personalized self-management regimen that works for an individual often requires a lengthy and frustrating trial-and-error process. Personal health informatics solutions could augment this experimentation process by leveraging artificial intelligence, specifically reinforcement learning (RL). This talk presents a mixed-methods study that addresses both technical and human challenges that remain in translating promising computational methods to a complex, real-world setting.
We use “in the wild” self-tracking data from the Phendo app alongside conversations with users to assess the feasibility of a tool in the context of endometriosis. Data from 10,463 users, detailing their personal experience of illness (eg, symptoms) and self-management (eg, physical activity), are used to characterize the breadth and patterns of self-management strategies used in practice and to quantify population and individual effects. Qualitative analysis of transcripts from prior focus groups (10 groups, n=48) and follow-up interviews (n=3) represents the end-user perspective. We integrate results across methods to map the boundaries and constraints at the intersection of computational and human viewpoints.
Findings suggest that user engagement patterns and data availability are sufficient for RL requirements. Users confirm that they want this type of support and are willing to experiment with a broad range of strategies. Both data and human perspectives affirm that personally tailored solutions are necessary, despite substantial heterogeneity. Design recommendations include promoting control and autonomy, incorporating context, and enabling explainability.
Bio: Adrienne Pichon is a third year PhD student in Dr. Noémie Elhadad’s lab. Her current research focuses on supporting the needs of patients and their care teams in complex and uncertain chronic illness contexts. Adrienne received her MPH from Columbia University’s Mailman School of Public Health, and contributed to research both at Mailman and the School of Nursing before coming to DBMI.
Speaker: Yiwei Sun, PhD Student – Dr. Harris Wang’s Lab
Title: Discovery of pathogen-inhibitory commensal gut microbiota by high-throughput culturomics
Abstract: Vancomycin-resistant Enterococcus (VRE) can densely colonize intestines and cause bloodstream infections in people who have received antibiotic-mediated treatments and consequently suffer from the loss of commensal microbiota. Fecal Matter Transplant (FMT) has been shown to be able to efficiently clear VRE from the gut, but it remains unclear which species in particular play a role in clearance of VRE. Herein, we demonstrated that key bacterial strains can directly inhibit VRE growth and clear VRE from mouse intestines. By implementing a high-throughput strain isolation and culturation system, we isolated >2300 isolates from ICU patients as well as healthy human individuals and screened for inhibitory effects against VRE in vitro. Candidate strains were shown to inhibit VRE growth in vitro and eliminate VRE in mouse infection models. Furthermore, we discovered key metabolites produced by these strains that explain the mechanism of VRE growth inhibition. These findings suggest that probiotic therapy using the candidate strain may reduce VRE-related inter-patient transmission and promote recovery of native commensal microbiota.
Bio: Yiwei Sun is a third year PhD student in Dr. Harris Wang’s lab. Her current research focuses on examining the relationship between gut microbiome and intestinal diseases. Prior to her PhD, she received her B.S. in Microbiology, Immunology, and Molecular Genetics from UCLA where she conducted research with Dr. Grace Xiao.
Speaker: Katie Brown, PhD student – Dr. Nicholas Tatonetti’s lab
Title: Estimating the heritability of SARS-CoV-2 susceptibility and COVID-19 severity
Abstract: Over 340 million people have been infected with SARS-CoV-2 since its discovery in 2019. Pharmaceutical companies continue to search for effective therapeutics to counter COVID-19. While genetic studies have the potential to highlight relevant biological pathways and drug targets, understanding the overall heritability of SARS-CoV-2 susceptibility and COVID-19 severity is important for contextualizing their results and prioritizing future studies. To date, associated loci are estimated to explain <1% of variation in patient susceptibility and severity. In this talk, I will discuss our approach to estimating the importance of shared environment and genetics to SARS-CoV-2 susceptibility and COVID-19 severity.
Speaker: Michael Zietz, PhD student – Dr. Nicholas Tatonetti’s lab
Title: Estimated genetic liability as a proxy phenotype for GWAS
Abstract: Deciphering the genetic architecture of complex disease is a major challenge in biomedical research and one that would simplify the search for new preventions, treatments, and cures. The genetic contributions to complex traits and diseases arise from thousands of genetic variants, most of which have only small effects. While major biobank projects have enabled the estimation of many small effects through the collection of very large cohorts, statistical power remains a challenge for variant effect estimation. Many complex traits and diseases have shared genetic contributions, manifesting in both genetic and phenotypic correlations. Various traits, therefore, contain predictive information about a patient’s genetic risk for a trait of interest. We developed a method to estimate patient-level genetic liabilities for a trait of interest using a deeply phenotyped cohort and summary information such as trait heritabilities and trait genetic correlations. Preliminary results suggest that using the estimated genetic liability of a trait as a proxy in a genome-wide association study leads to greater power to detect variant effects. We are currently expanding our use of the new method to larger sets of traits, in order to better evaluate its strengths and limitations. Our goal is to produce a method which can provide a better understanding of complex trait architecture using fewer samples than existing methods.
2021 Fall Seminars
Title: Prediction-driven surge planning with applications in the emergency department
Abstract: Optimizing emergency department (ED) nurse staffing decisions to balance the quality of service and staffing cost can be extremely challenging, especially when there is a high level of uncertainty in patient demand. Increasing data availability and continuing advancements in predictive analytics provide an opportunity to mitigate demand-rate uncertainty by utilizing demand forecasts. In this work, we study a two-stage prediction framework that is synchronized with the base (made months in advance) and surge (made nearly real-time) staffing decisions in the ED. We quantify the benefit of the more expensive surge staffing. We also propose a near-optimal two-stage staffing policy that is straightforward to interpret and implement. Lastly, we develop a unified framework that combines parameter estimation, real-time demand forecasts, and staffing in the ED. High-fidelity ED simulation experiments demonstrate that the proposed framework can reduce staffing costs by 8% – 17% while guaranteeing timely access to care. Joint work with Jing Dong and Yue Hu.
Bio: Carri W. Chan is a Professor of Business in the Decision, Risk and Operations Division and the Faculty Director of the Healthcare and Pharmaceutical Management Program at Columbia Business School. Her research is in the area of healthcare operations management. Her primary focus is in data-driven modeling of complex stochastic systems, efficient algorithmic design for queuing systems, dynamic control of stochastic processing systems, and econometric analysis of healthcare systems. Her research combines empirical and stochastic modeling to develop evidence-based approaches to improve patient flow through hospitals. She has worked with clinicians and administrators in numerous hospital systems including Northern California Kaiser Permanente, New York Presbyterian, and Montefiore Medical Center. She is the recipient of a 2014 National Science Foundation (NSF) Faculty Early Career Development Program (CAREER) award, the 2016 Production and Operations Management Society (POMS) Wickham Skinner Early Career Award, and the 2019 MSOM Young Scholar Prize. She currently serves as a co-Department Editor for the Healthcare Management Department at Management Science. She received her BS in Electrical Engineering from MIT and MS and PhD in Electrical Engineering from Stanford University.
Talk title: Are phenotyping algorithms fair for underrepresented minorities within older adults?
Abstract: The widespread adoption of machine learning (ML) algorithms for risk-stratification has unearthed plenty of cases of racial/ethnic biases within algorithms. When built without careful weighting and bias-proofing, ML algorithms can give wrong recommendations, thereby worsening health disparities faced by communities of color. Biases within electronic phenotyping algorithms are largely unexplored. In this work, we look at probabilistic phenotyping algorithms for clinical conditions common in vulnerable older adults: dementia, frailty, mild cognitive impairment, Alzheimer’s disease, and Parkinson’s disease. We created an experimental framework to explore racial/ethnic biases within a single healthcare system, Stanford Health Care, to fully evaluate the performance of such algorithms under different ethnicity distributions, allowing us to identify which algorithms may be biased and under what conditions. We demonstrate that these algorithms have performance (precision, recall, accuracy) variations of between 3% and 30% across ethnic populations, even when not using ethnicity as an input variable. In over 1,200 model evaluations, we have identified patterns that indicate which phenotype algorithms are more susceptible to exhibiting bias for certain ethnic groups. Lastly, we present recommendations for how to discover and potentially fix these biases in the context of the five phenotypes selected for this assessment.
Bio: Dr. Juan M. Banda at his GSU lab, Panacea Lab, works on building machine learning and NLP methods that help to generate insights from multi-modal large-scale data sources, with applications to precision medicine, medical informatics, as well as other domains. His research interests are not limited to structured data; he is also well-versed in extracting terms and clinical concepts from millions of unstructured electronic health records and using them to build predictive models (electronic phenotyping) and mine for potential multi-drug interactions (drug safety). Dr. Banda has published over 70 peer-reviewed conference and journal papers and serves as an editorial board member of the Journal of the American Medical Informatics Association and Frontiers in Medicine – Translational Medicine, and a reviewer for JBI, Nature Digital Medicine, Nature Scientific Data, Nature Protocols, PLOS One, and several other leading journals. Prior to being an assistant professor of Computer Science at Georgia State University, Dr. Banda was a postdoctoral scholar, then a research scientist at Stanford’s Center of Biomedical Informatics. He is an active collaborator of the Observational Health Data Sciences and Informatics, and his work has been funded by the Department of Veterans Affairs, the National Institute on Aging, as well as NASA, NSF and NIH. He also serves as a PC member and chair for several conferences and workshops including ICML, NeurIPS, FLAIRS, and IEEE Big Data, among others.
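As a rough illustration of the kind of per-group comparison the abstract describes, the sketch below computes precision, recall, and accuracy separately for each group and reports the largest gap between groups. It is not the speaker's framework: the function names `group_metrics` and `metric_gap`, the group labels, and the toy data are all assumptions made up for this example.

```python
from collections import defaultdict


def group_metrics(y_true, y_pred, groups):
    """Compute precision, recall, and accuracy separately for each group.

    Labels are binary (1 = has the phenotype). A metric is None when its
    denominator is zero for that group.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        # "t"/"f" = prediction correct or not, "p"/"n" = predicted class.
        outcome = ("t" if truth == pred else "f") + ("p" if pred == 1 else "n")
        counts[group][outcome] += 1

    def ratio(num, den):
        return num / den if den else None

    metrics = {}
    for group, c in counts.items():
        total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
        metrics[group] = {
            "precision": ratio(c["tp"], c["tp"] + c["fp"]),
            "recall": ratio(c["tp"], c["tp"] + c["fn"]),
            "accuracy": ratio(c["tp"] + c["tn"], total),
        }
    return metrics


def metric_gap(metrics, name):
    """Largest difference in one metric between any two groups."""
    values = [m[name] for m in metrics.values() if m[name] is not None]
    return max(values) - min(values)


# Toy, made-up labels for two hypothetical groups "A" and "B".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

metrics = group_metrics(y_true, y_pred, groups)
for name in ("precision", "recall", "accuracy"):
    print(name, metric_gap(metrics, name))
```

Note that the group label is used only to split the evaluation, not as a model input, which matches the abstract's point that gaps can appear even when ethnicity is not a feature.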
Speaker: Linying Zhang, PhD Student
Title: Algorithmic fairness in medicine: A case study in glomerular filtration rate (GFR) prediction
Abstract: The appropriate use and the implications of using variables that attempt to encode a patient’s race in medical predictive algorithms remain unclear. One example of an algorithm that includes a race variable is the equation for estimating glomerular filtration rate (GFR), an indicator of kidney function used to classify the severity of chronic kidney disease (CKD). However, the observed difference between Black and non-Black participants lacks biologically substantiated evidence. A recent study showed that removing race as a variable from the estimated GFR equation could have a significant impact on recommended care for Black patients (e.g., increasing CKD diagnoses among Black adults could improve access to specialist care and kidney transplantation). However, they did not study whether removing the race modifier leads to more accurate GFR predictions for Black patients. Recently, many algorithmic fairness definitions have been proposed and studied in domains such as education, economics and criminal justice, but their applicability to medical predictive algorithms has not been well explored. We examined the appropriateness of various algorithmic fairness definitions in the context of understanding the impact of race on GFR prediction in terms of model performance and fairness. We consider the use case of drug dosing, in which the difference between the true GFR and the calculated GFR will be relevant.
Title: Predictive modeling for self-tracking apps: a case study in menstrual health
Abstract: Self-tracking apps provide a rich source of health observations that hold the promise to characterize underlying physiological state and disease trajectories, as well as to support users in self-managing their health. But these data streams can also be unreliable since they hinge on user adherence to the app. In this talk, I will focus on menstrual trackers, a highly popular type of self-tracking technology. I will present our ongoing work on characterizing variability in menstrual cycle within and across individuals and building models that predict next cycle date all the while accounting for skipped tracking data.
Bio: Noémie Elhadad is an Associate Professor of Biomedical Informatics, affiliated with Computer Science and the Data Science Institute at Columbia University. She serves as Biomedical Informatics Vice Chair of Research and Graduate Program Director. Her research is at the intersection of machine learning, technology, and medicine.
Title: Multimorbidity Patterns Across Race/Ethnicity Stratified by Age and Obesity: A Cross-sectional Study of a National US Sample
Objectives: The objective of our study is to assess differences in prevalence of multimorbidity by race.
Methods: We applied the FP-growth algorithm on middle-aged and elderly cohorts stratified by race, age, and obesity level. We used 2016-2017 data from the Cerner HealthFacts® Electronic Health Record data warehouse. We identified disease combinations that are shared by all races/ethnicities, those shared by some, and those that are unique to one group for each age/obesity level.
Results: Our findings demonstrate that even after controlling for age and obesity, there are differences in multimorbidity prevalence across races. There are multimorbidity combinations distinct to some racial groups—many of which are understudied. Some multimorbidities are shared by some but not all races. African Americans presented with the most distinct multimorbidities at an earlier age.
Discussion: The identification of prevalent multimorbidity combinations amongst subpopulations provides information specific to their unique clinical needs.
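The stratified comparison described in the Methods can be sketched on toy data as follows. This is a minimal illustration, not the study's pipeline: instead of FP-growth (which uses a prefix tree to avoid enumerating candidates) it counts combinations by brute force, which finds the same frequent combinations on small inputs. The function name `frequent_combinations`, the support threshold, the cohort names, and the condition names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_combinations(patients, min_support, max_size=3):
    """Return {combination: support} for co-occurring conditions.

    patients is a list of sets of condition names. Only combinations of two
    or more conditions are counted, since multimorbidity means co-occurrence.
    Brute-force enumeration; a stand-in for FP-growth on small data.
    """
    counts = Counter()
    for conditions in patients:
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(conditions), size):
                counts[frozenset(combo)] += 1
    n = len(patients)
    return {c: k / n for c, k in counts.items() if k / n >= min_support}


# Made-up strata; in the study each stratum would be one race/age/obesity cell.
cohort_1 = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "ckd"},
    {"hypertension"},
    {"diabetes", "hypertension"},
]
cohort_2 = [
    {"hypertension", "diabetes"},
    {"asthma", "depression"},
    {"asthma", "depression"},
    {"hypertension", "diabetes"},
]

freq_1 = frequent_combinations(cohort_1, min_support=0.5)
freq_2 = frequent_combinations(cohort_2, min_support=0.5)

shared = set(freq_1) & set(freq_2)          # frequent in both strata
unique_to_2 = set(freq_2) - set(freq_1)     # frequent only in cohort_2
print(sorted(sorted(c) for c in shared))
print(sorted(sorted(c) for c in unique_to_2))
```

Comparing the key sets across strata gives the three categories named in the Methods: combinations shared by all groups, shared by some, and unique to one.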
- Collecting multiple partial solutions
- Synthesizing partial solutions into multiple prototypes
- Quickly iterating on prototypes to produce an MVP
Title: Towards High-Quality Structured Data from Clinical Notes
Abstract: The real-world evidence found in electronic health records contains the scale of data required for more personalized medicine, from heterogeneous treatment effect estimation to disease progression modeling. Unfortunately, many of the variables needed for such research (treatment information, comorbidities, disease stage) are found not in structured data, but trapped within clinical notes. Due to the messiness of free-text notes and the sparsity of labels, clinical information extraction can be challenging in practice; tasks as fundamental as clinical concept normalization remain largely unsolved. In this talk, I will present machine learning solutions that can operate with minimal labeled data by leveraging unlabeled data and humans-in-the-loop. However, ultimately, it would be ideal if clinical notes were easier to parse to begin with. I will describe our efforts, in collaboration with and piloted at Beth Israel Deaconess Medical Center, to reimagine the process of clinical documentation to facilitate and incentivize the creation of high-quality data at the point-of-care.
Bio: Monica Agrawal is a 4th year PhD student at MIT CSAIL in the Clinical Machine Learning Group, advised by David Sontag. Her research revolves around synthesis of longitudinal clinical notes and the creation of smarter electronic health records. She previously received a BS/MS from Stanford University in computer science. She is supported by a Takeda fellowship.
Student Presentations do not get recorded.
Title: What the CONCERN Study Has Taught Us About Racial Bias in Nursing Workflow
Abstract: Early detection of patient deterioration in the hospital is a clinically significant issue. Our team has built a clinical decision system called CONCERN (Communicating Narrative Concerns Entered by RNs). The CONCERN study leverages big data analytic techniques to increase interdisciplinary shared situational awareness for patients at risk of decompensation using clinically relevant information that may otherwise be missed by the care team. CONCERN uses nursing surveillance patterns to risk-stratify patients for deterioration to support clinical decision-making. This multi-site (Columbia University and Brigham and Women’s Hospital) project is currently evaluating the relationship between CONCERN use and patient outcomes, inpatient mortality, and length of stay, using a clustered randomized controlled trial. CONCERN is the first NIH (National Institutes of Health) funded study to evaluate a nurse-driven machine learning-based clinical decision support system with a randomized clinical trial. I will present an overview of our project, the infrastructure of our intervention, lessons learned about racial bias in these data, and proposed future work.
Bio: Kenrick Cato, PhD, RN, CPHIMS, FAAN, is an Assistant Professor at Columbia University School of Nursing and the Columbia University Vagelos College of Physicians and Surgeons Department of Emergency Medicine. Dr. Cato has a varied background. He worked at NewYork-Presbyterian Health system as a surgical and medical oncology staff nurse and as an analyst in the information technology department, working on projects to improve patient safety through the use of clinical decision support. In the analyst position, he focused on projects to improve patient safety through the optimization of the hospital’s electronic systems. Dr. Cato’s program of research focuses on the mining of electronic patient data to support clinical decision making. His previous work includes National Institutes of Health-funded research in health communication via mobile health platforms, shared decision making in primary care settings and data mining of electronic patient records. His current projects include automated data mining of electronic patient records to discover patient characteristics that are often missed and the development of predictive models for inpatient clinical deterioration.
Title: Machine Learning Applications in Cardiology
Abstract: In this talk, we will discuss why and how deep learning approaches have the potential to greatly impact cardiac imaging. We will then explore use cases developed here at Columbia that have led to two of the world’s first prospective clinical trials of deep learning in cardiology. Lastly, we’ll critique the limitations of current ML approaches preventing mainstream adoption in order to answer the question, “What are the big problems the field needs to be tackling now?” (and maybe even answer, “What’s a really good idea for me to do research on as a grad student?”)
Bio: Pierre Elias, MD is a cardiology fellow at Columbia University Irving Medical Center who recently completed a two-year postdoc in the Perotte Lab at DBMI.
Title: Towards a unified systems theory of mental disorders
Abstract: Understanding the biology of psychiatric disorders requires analyses on multiple levels of hierarchical organization: on the level of genes, cellular networks, neuron types, brain circuits, and patient phenotypes. Over the last decade, our lab has pioneered advances on all these organizational levels, for disorders such as autism and schizophrenia. We believe that the emerging data now allow us to make an informed generalization about the etiology of major psychiatric disorders. Using examples primarily from autism spectrum disorder (ASD), I will discuss our recent work on understanding brain circuits that are likely perturbed across disorders. We have recently developed an approach to integrate genetic data with high-resolution spatial gene expression and brain-wide mesoscale connectome. The application of the approach to autism demonstrates that ASD mutations perturb widely distributed sets of brain circuits with interrelated biological functions and structures from the cortex, striatum, amygdala, thalamus and hippocampus. The identified circuits are generally responsible for the integration of sensory and emotional information as well as context-dependent learning and decision-making based on this information. Our preliminary analyses show that similar circuits are also affected in schizophrenia and likely in many other mental disorders. We have also discovered that each ASD gene can be characterized by a parameter, phenotype dosage sensitivity (PDS), which quantifies the relationship between changes in a gene’s dosage and changes in each disorder phenotype. We believe that the relationship characterized by PDS is likely to generalize to other disorders and human phenotypes. Finally, I will discuss how the emerging picture puts us on the path towards explaining the common genetic risk factor underlying multiple psychiatric disorders (p-factor) and how specific phenotypes may arise in each disorder.
2021 Spring Seminars
Speaker: Rafael Irizarry, PhD, Professor and Chair of the Department of Data Sciences at the Dana-Farber Cancer Institute; Professor of Biostatistics at Harvard T.H. Chan School of Public Health
Title: Probabilistic Gene Expression Signatures for Single Cell RNA-seq Data
Abstract: In this talk Prof. Irizarry will describe his general approach to developing statistical solutions to problems in high throughput biology. He will focus on an example related to predicting cell types from single cell RNA-seq data. He will discuss challenges such as batch effects and sparse data and describe statistical solutions for these. Finally, he will show recent results from a collaboration involving spatial transcriptomics.
Biography: Rafael Irizarry received his Bachelor’s in Mathematics in 1993 from the University of Puerto Rico and went on to receive a Ph.D. in Statistics in 1998 from the University of California, Berkeley. His thesis work was on Statistical Models for Music Sound Signals. He joined the faculty of the Johns Hopkins Department of Biostatistics in 1998 and was promoted to Professor in 2007. He is now Professor and Chair of the Department of Data Sciences at the Dana-Farber Cancer Institute and a Professor of Biostatistics at Harvard T.H. Chan School of Public Health.
Professor Irizarry’s work has focused on applications in genomics. In particular, he has worked on the analysis and signal processing of high-throughput data. He has distinguished himself by disseminating his statistical methodology as open source software shared through the Bioconductor Project, a leading open source and open development software project for the analysis of high-throughput genomic data. His widely downloaded software tools have helped him become one of the most highly cited scientists in his field. Although Professor Irizarry’s focus has been in genomics, he is an applied statistician generally interested in real-world problems. During his career he has co-authored papers on a variety of topics including musical sound signals, infectious diseases, circadian patterns in health, fetal health monitoring, and estimating the effects of Hurricane María in Puerto Rico.
Professor Irizarry’s dedication to education is best demonstrated by the success of the numerous trainees he has mentored. He has also developed several HarvardX online courses on data analysis, which have been completed by thousands of students. These courses are divided into three series: Professional Certificate in Data Science, Data Analysis for the Life Sciences and Genomics Data Analysis. He shares the material for these courses through textbooks that are freely available online and reproducible code through GitHub. Professor Irizarry also dedicates his time to providing service to the profession. Examples of this work include serving as the chair of the Genomics, Computational Biology and Technology (GCAT) study section of the National Institutes of Health (NIH), the search committee for the National Library of Medicine director, the National Academy of Sciences Gulf War and Health Committee, and the National Advisory Council for Human Genome Research.
Professor Irizarry has received several awards honoring the work described above. In 2009, the Committee of Presidents of Statistical Societies (COPSS) named him the Presidents’ Award winner. The Presidents’ Award is arguably the most prestigious award in Statistics. That year he was also named a fellow of the American Statistical Association. In 2017 the members of Bioinformatics.org chose Professor Irizarry as the laureate of the Benjamin Franklin Award in the Life Sciences. In 2020 he became an ISCB Fellow. He has also received the 2019 Research Parasite Award for outstanding contributions to the rigorous secondary analysis of data, the 2009 Mortimer Spiegelman Award, which honors an outstanding public health statistician under age 40, the ASA Youden Award in Interlaboratory Testing, the 2004 American Statistical Association (ASA) Outstanding Statistical Application Award, and the 2001 American Statistical Association Noether Young Scholar Award for a researcher younger than 35 years of age who has significant research accomplishments in nonparametric statistics.
Title: Identifying and Leveraging Public Data Sources with Social Determinants of Health Information for Population Health Informatics Research
Speaker: Irene Dankwa-Mullan MD MPH, Chief Health Equity Officer, IBM Watson Health, IBM Corporation
Abstract: Social determinants of health (SDOH) account for many health inequities. Data sources traditionally used in informatics research often lack SDOH, and, when available, SDOH may be difficult to leverage given their lack of specificity and structured information. In this presentation, I will share the initial phases of work that we are doing around leveraging SDOH data for health equity research, addressing some of the informatics challenges of leveraging social determinants of health data to inform population health or health services research. I will discuss a case study using a machine learning clustering algorithm to uncover region-specific sociodemographic features and disease-risk prevalence correlated with COVID-19 mortality during the early accelerated phase of community spread.
Bio: Irene Dankwa-Mullan is a nationally and internationally recognized physician and expert scientist working at the intersection of healthcare, health equity, public health, informatics, data science and applied artificial intelligence, with over 60 peer-reviewed publications. She serves as the Chief Health Equity Officer and Deputy Chief Health Officer for research and evaluation at IBM Watson Health. As Chief Health Equity Officer, she works across business market segments to promote a culture of equity, ethical AI, diversity and inclusion. Her responsibilities as Deputy Chief Health Officer include leadership for evaluation research and implementation science and promoting opportunities to advance the science of AI and advanced analytics. Dr. Dankwa-Mullan attended Barnard College, where she majored in Biochemistry. She received her medical degree from Dartmouth Medical School, and a Master’s degree in Infectious Disease Epidemiology and Biostatistics from the Yale School of Public Health in a joint MD/MPH program. She completed residency training in Internal Medicine at the Johns Hopkins Hospital’s Bayview medical campus.
Title: Digital Phenotyping: Quantifying human health with low, medium and high frequency data streams
Abstract: Digital health data is notoriously enigmatic. However, smartphones, wearables, and EEGs have the potential to provide enormous insight into human health and wellbeing. Making sense of these complex data streams requires new computational approaches that combine the best of signal processing and machine learning to find pragmatic solutions. Dr. Sathyanarayana will discuss challenges and solutions for translating low, medium and high frequency data into actionable insights for health, wellness, and performance.
Bio: Dr. Aarti Sathyanarayana is a postdoctoral research fellow in the department of biostatistics at the Harvard T.H. Chan School of Public Health. She also holds an appointment in the clinical data animation center at Massachusetts General Hospital and Harvard Medical School. Her research interests are in time variant health data analysis, signal processing, and machine learning. She strives to translate enigmatic health data into actionable insights, with an emphasis on digital phenotyping and digital biomarker discovery. Her recent work has focused on developing new methodologies to better understand smartphone, wearables, and EEG data in the context of human health and wellness. Prior to joining Harvard, Aarti received her PhD in computer science from the University of Minnesota, where her dissertation was selected for the university’s doctoral dissertation award. Since then, her work has won multiple junior investigator awards from the National Center of Women and Information Technology, the American Medical Informatics Association, the American Epilepsy Society, and the American Clinical Neurophysiology Society. Her expertise has also led her to hold positions at Apple, Intel, the Mayo Clinic, and Boston Children’s Hospital.
Speaker: Carlos Bustamante, PhD
Title: Why doing the right thing and diversifying clinical trials can unleash innovation in biopharma pipelines
Abstract: Clinical Genetics Lacks Standard Definitions and Protocols for the Collection and Use of Diversity Measures. More: https://www.cell.com/ajhg/pdfExtended/S0002-9297(20)30152-X
Short bio: For the past 18 years, I have led a multidisciplinary team working on problems at the interface of computational and biological sciences. Much of our research has focused on genomics technology and its application in medicine, agriculture, and evolutionary biology. My first academic appointment was at Cornell University’s College of Agriculture and Life Sciences. There, much of our work focused on population genetics and agricultural genomics motivated by a desire to improve the foods we eat and the lives of the animals upon which we depend. I moved to Stanford in 2010 to focus on enabling clinical and medical genomics on a global scale. I have been focused on reducing health disparities in genomics by: (1) calling attention to the problem raised by >95% of participants in large scale studies being of European descent; and (2) broadening representation of understudied groups in large NIH funded consortia, particularly minority groups from the U.S., the Americas, and Africa. My work has empowered decision-makers to utilize genomics and data science in the service of improving human health and wellbeing. In the next phase of my career, I will focus on opportunities for bringing these technologies to consumers and patients, directly, where this work can have the greatest impact. I have a strong interest in building new academic units, non-profits, and companies. I was the Inaugural Chair of the Department of Biomedical Data Science—the first new department that Stanford has started in 14 years—and I was Founding Director (with Marc Feldman) of the Center for Computational, Evolutionary, and Human Genomics. I serve as an advisor to the US federal government, private companies, startups, and non-profits in the areas of computational genomics, population and medical genetics, veterinary and plant genomics, and business strategy.
Speaker: Megan Threats, PhD, MSLIS
Speaker: Trevor Cohen, MBChB, PhD, FACMI
Title: Using Neural Language Representations to Detect Linguistic Anomalies in Neurodegenerative and Psychiatric Disease
Abstract: Language is uniquely positioned in mental health as both a focus of observation for clinical signs and symptoms, and a medium through which some forms of therapy are delivered. Alzheimer’s Disease and other forms of dementia can also affect language production, for example by limiting access to more specific terms that describe the world in detail. In both cases, data from speech and text are increasingly available on account of the use of digital devices to mediate research and healthcare delivery. Neural language representations such as word embeddings, recurrent neural network language models, and contemporary transformer architectures have become a predominant point of focus in computational linguistics research. The models from which these representations are derived are typically trained on large amounts of unlabeled text, with training tasks involving predicting held-out terms that occur in proximity to observed ones. During the course of such training, much information about the typical use of language is learned. This information is of potential value for the detection of the atypical usage that may characterize certain clinical conditions. In this talk I will discuss our recent work in this area, with a focus on two areas of application: (1) a study of the responsiveness of deep neural networks that distinguish between responses to cognitive tasks from participants with and without Alzheimer’s Disease to known deficiencies in language production in this condition; and (2) the application of neural word embeddings to model language coherence in order to detect the disorganized thinking characteristic of episodes of psychosis in schizophrenia and other conditions. I will also more briefly touch on a range of related ongoing work involving efforts to model constructs that are of diagnostic or therapeutic importance in mental health.
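One simple way to make the coherence idea in application (2) concrete is to average word vectors per sentence and measure the cosine similarity between consecutive sentences; low average similarity suggests abrupt topic shifts. The sketch below is only an illustration of that general technique, not the speaker's model: the two-dimensional `EMBEDDINGS` table is hand-made, and the function names are assumptions. A real system would load pretrained embeddings.

```python
import math

# Hypothetical 2-D word vectors, hand-made for illustration only.
EMBEDDINGS = {
    "dog": (1.0, 0.0),
    "barks": (0.8, 0.2),
    "cat": (0.9, 0.1),
    "purrs": (0.85, 0.15),
    "tax": (0.0, 1.0),
    "invoice": (0.1, 0.9),
}


def sentence_vector(sentence):
    """Average the vectors of the known words in a sentence."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError(f"no known words in sentence: {sentence!r}")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def coherence(sentences):
    """Mean cosine similarity between each pair of consecutive sentences."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences")
    vectors = [sentence_vector(s) for s in sentences]
    sims = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims)


on_topic = ["dog barks", "cat purrs"]
off_topic = ["dog barks", "tax invoice"]
print(coherence(on_topic) > coherence(off_topic))
```

With these toy vectors the on-topic pair scores higher than the off-topic pair, which is the signal such a measure is meant to capture.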
Background: Dr. Cohen trained and practiced as a physician in South Africa, before obtaining his PhD in 2007 in Medical Informatics at Columbia University. His doctoral work focused on an approach to enhancing clinical comprehension in the domain of psychiatry, leveraging distributed representations of psychiatric clinical text. Upon graduation, he joined the faculty at Arizona State University’s nascent Department of Biomedical Informatics, where he contributed to the development of curriculum for informatics students, as well as for medical students at the University of Arizona’s Phoenix campus. In 2009 he joined the faculty at the University of Texas School of Biomedical Informatics, where (amongst other things) he developed an NLM-funded research program concerned with leveraging knowledge extracted from the biomedical literature for information retrieval and pharmacovigilance, and contributed toward large-scale national projects such as the Office of the National Coordinator’s SHARP-C initiative, which supported a range of research projects that aimed at improving the usability and comprehensibility of electronic health record interfaces.
Research: Dr. Cohen’s research focuses on the development and application of methods of distributional semantics – methods that learn to represent the meaning of terms and concepts from the ways in which they are distributed in large volumes of electronic text. The resulting distributed representations (concept or word embeddings) can be applied to a broad range of biomedical problems, such as: (1) using literature-derived models to find plausible drug/side-effect relationships; (2) finding new therapeutic applications for known drugs (drug repurposing); (3) modeling the exchanges between users of health-related online social media platforms; and (4) identifying phrases within psychiatric narrative that are pertinent to particular diagnostic constructs (such as psychosis). An area of current interest involves the application of neural language models to detect linguistic manifestations of neurological and psychiatric conditions. More broadly, he is interested in clinical cognition – the thought processes through which physicians interpret clinical findings – and ways to facilitate these processes using automated methods.
Speaker: Tian Kang, MA, MPhil (PhD Student) – Dr. Chunhua Weng’s Lab
Title: Exploring the Synergy of Neural and Symbolic Methods for Understanding Free-text Medical Evidence
Abstract: Recent state-of-the-art results in NLP have been achieved predominantly by deep neural networks. However, their reasoning capabilities are still rather limited compared to symbolic AI when facing reading comprehension tasks. I propose Medical evidence Dependency (MD)-informed Attention, a Neuro-Symbolic model for understanding free-text medical evidence, such as clinical trial publications. One head in the Multi-Head Self-Attention model is trained to attend to Medical evidence Dependencies (MD) and pass linguistic and domain knowledge onto later layers (MD-informed). We integrated MD-informed Attention into BioBERT and evaluated it on two public machine reading comprehension benchmarks for clinical trial publications. The integration of the MD-informed Attention head improves BioBERT substantially on both benchmarks, with F1 score increases as large as +30%, and achieves new state-of-the-art performance. MD-informed Attention empowers neural reading comprehension models with interpretability and generalizability via reusable domain knowledge. Its compositionality can benefit any Transformer-based NLP model for reading comprehension of free-text medical evidence.
Speaker: Victor Rodriguez, MA, MPhil (MD/PhD Student) – Dr. Adler Perotte’s Lab
Title: Training Deep Generative Models with Partially Observed Data
Abstract: Most deep generative models (DGMs) require fully observed data to train. Yet, data routinely contain missing values. This incompatibility motivates the development of inference algorithms which assume only partially observed data at training time. In this talk, I will present on-going work developing such algorithms for DGMs (specifically, Variational Autoencoders) and discuss preliminary results using data for which the missingness mechanism is ignorable. I also propose extensions to a) handle non-ignorable missingness mechanisms, which are common in clinical data sets and b) model labels for supervised disease phenotyping tasks.
Speaker: Elliot G. Mitchell, MA, MPhil (PhD Student) – Dr. Lena Mamykina’s Lab
Title: Automated Conversational Health Coaching: Work in Progress
Speaker: Dr. Manuel Rivas, DPhil – Stanford University
Title: Genomic prediction and inference from population-scale datasets
Abstract: Clinical laboratory tests are a critical component of the continuum of care and provide a means for rapid diagnosis and monitoring of chronic disease. In this study, we systematically evaluated the genetic basis of 35 blood and urine laboratory tests measured in 358,072 participants in the UK Biobank and identified 1,857 independent loci associated with at least one laboratory test, including 488 large-effect protein truncating, missense, and copy-number variants. We then causally linked the biomarkers to medically relevant phenotypes through genetic correlation and Mendelian Randomization. Finally, we developed polygenic risk scores (PRS) for each biomarker and built multi-PRS models using all 35 PRSs simultaneously. We assessed sex-specific genetic effects and found striking patterns for testosterone with marked improvements in prediction when training a sex-specific model. We found substantially improved prediction of incidence in FinnGen (n=135,500) with the multi-PRS relative to single-disease PRSs for renal failure, myocardial infarction, type 2 diabetes, gout, and alcoholic cirrhosis. Together, our results show the genetic basis of these biomarkers, which tissues contribute to the biomarker function, the causal influences of the biomarkers, and how we can use this to predict disease.
Bio: Dr. Rivas is an Assistant Professor in the Department of Biomedical Data Science at Stanford University in Stanford, California. He has a Bachelor of Science in Mathematics from the Massachusetts Institute of Technology and a Doctor of Philosophy in Human Genetics from the Nuffield Department of Clinical Medicine at Oxford University where he was a Clarendon Scholar. He also did additional training at the Broad Institute in Cambridge, Massachusetts where he led the Helmsley Inflammatory Bowel Disease Exome Sequencing Program to understand the genetic factors that contribute to ulcerative colitis and Crohn’s disease risk.
Speaker: Dr. Terika McCall, PhD, MPH, MBA – Yale University
Title: mHealth for Mental Health: User-Centered Design and Usability Testing of a Mental Health Application to Support Management of Anxiety and Depression in African American Women
Abstract: African American women experience rates of mental illness comparable to the general population (20.6% vs. 19.1%); however, they significantly underutilize mental health services compared to their white counterparts (10.2% vs. 27.2%). Past studies exploring the use of smartphone mental health interventions to reduce anxiety or depressive symptoms revealed that participants experienced significant reduction in anxiety or depressive symptoms post-intervention. Since African American women are comfortable with participating in mHealth research and interventions, and 80% of African American women own smartphones, there is great potential to remedy the disparities in mental health service utilization by leveraging use of smartphones for information dissemination, and delivery of mental health services and resources. My talk will focus on user-centered recommendations for content and features that should be included in a smartphone application culturally-tailored to support management of anxiety and depression in African American women. I will also discuss the results of usability testing of an initial prototype of the app.
Bio: Dr. McCall is a National Library of Medicine Biomedical Informatics and Data Science Postdoctoral Fellow at Yale Center for Medical Informatics. Her research focuses on reducing disparities in mental health service utilization through use of technology. Dr. McCall’s research is interdisciplinary and focuses on issues related to the acceptance, design, development, and use of mHealth applications for mental wellness.
2020 Fall Seminars
Speaker: Tony Y. Sun, MA (PhD Student) – Dr. Noémie Elhadad’s Lab
Title: Systematically quantifying and analyzing the impact of time-to-diagnosis disparities on the diagnostic process
Brief Abstract: In recent healthcare literature, a number of studies have illuminated how sex and gender-based healthcare disparities contribute to differences in health outcomes [e.g. ten year mortality for women after the WISE study]. In this talk, I’ll be focusing on how we systematically quantified time-to-diagnosis disparities across phenotypes, and how we analyzed the impact of these disparities on the diagnostic process. Our quantification of time-to-diagnosis disparities showed that, for patients that would go on to enter the same phenotype at CUMC, women are consistently diagnosed later than men for the majority of the same presenting symptoms. To analyze the impact of these disparities on the diagnostic process, we trained gender-agnostic classifiers for each disease using patients’ presenting symptoms. We assessed how the fairness gap changes with incrementally changed amounts of data. Despite our earlier finding that women present with symptoms earlier than men, the majority of these gender-agnostic classifiers paradoxically performed better for men than for women.
Speaker: Linying Zhang, MS, MA (PhD Student) – Dr. George Hripcsak’s Lab
Title: Adjusting for Unobserved Confounding Using Large-Scale Propensity Score
Brief Abstract: Even though observational data can nowadays contain an enormous number of covariates, the existence of unobserved confounders still cannot be excluded and remains a major barrier to drawing causal inference from observational data. Recently, analyses using large-scale propensity score (LSPS) adjustment have demonstrated examples of adjusting for unobserved confounding by including hundreds of thousands of available covariates. In this paper, we present the conditions under which LSPS can reduce bias due to unobserved confounders. In addition, we show that LSPS does not adjust for various unwanted variables (e.g., M-bias colliders, instruments). We demonstrate the performance of LSPS on bias reduction using both simulations and real medical data.
Speaker 1: Amanda J. Moy, MPH, MA (PhD student) – Dr. Sarah Collins Rossetti’s (OPTACIMM) Lab
Title: Measuring clinical documentation burden among physicians and nurses: a review of the literature
Abstract: Rapid adoption of electronic health records (EHRs) following the passage of the HITECH Act has led to advances in both individual- and population-level health. Still largely in their infancy, EHRs have also resulted in unintended consequences on clinical practice and healthcare systems, including significant increases in clinician documentation time. Extended work hours, time constraints, clerical workload, and disruptions to the patient-provider encounter have led to a rise in discontent with existing documentation methods in EHR systems. This documentation burden (hereinafter referred to as “burden”) has been linked to increases in medical errors, threats to patient safety, inferior documentation quality, and ultimately, burnout among nurses and physicians. Few empirically-based readily-available solutions to reduce burden exist, and to the best of our knowledge, there is no consensus on the best approaches to measure burden. Furthermore, the concept of burden has been ill-defined and poorly operationalized. Achieving the three primary goals (cited in the 21st Century Cures Act) to reduce EHR-related clinician burdens that influence care will necessitate standardized, quantitative measurements to evaluate impact. The purpose of this scoping review is to assess the state of science, identify gaps in knowledge, and synthesize characteristics of burden measurement among physicians and nurses using EHRs.
Speaker 2: James Rogers, MS, MA, MPhil (PhD student) – Dr. Chunhua Weng’s Lab
Title: Comparison of trial participants and non-participants using electronic health record data
Abstract: Clinical trials are medical research studies in which participants are assigned to receive one or more interventions so that researchers can evaluate the interventions’ effects. They are essential for the development of medical evidence, but are susceptible to a variety of challenges. One such challenge is generalizability, which refers to the ability to apply the conclusions of a study to a different set of relevant patients outside the context of that study. Assessing generalizability of clinical trials is important because differences in underlying clinical characteristics can impact the estimated effect of the interventions, ultimately impacting their clinical meaningfulness. However, most contemporary assessments provide minimal granularity on clinical comparisons. In this presentation, I will explore an alternative approach that combines electronic health record (EHR) data with enrollment data from prior clinical trials, while also highlighting potential implications that emerge from the results of this study.
Title: Machine learning for mental healthcare: a human-centered approach
Abstract: Machine learning advances are opening new routes to more precise healthcare, from the discovery of disease subtypes for stratified interventions to the development of personalized interactions supporting self-care between clinic visits. This offers an exciting opportunity for machine learning techniques to impact healthcare in a meaningful way. Within the healthcare domain, machine learning for mental healthcare is an under-investigated area and yet a potentially highly impactful area of research. In this talk, I will present recent work on probabilistic graphical modeling to enable a more personalized approach to mental healthcare, whereby information can be aggregated from multiple sources within a unified modeling framework. We present a human-centered approach to mental healthcare which is aimed at increasing the effectiveness of psychological wellbeing practitioners.
Bio: Dr. Danielle Belgrave is a Principal Research Manager at Microsoft Research in Cambridge (UK), in the Health Intelligence group, where she leads Project Talia. She is particularly interested in integrating medical domain knowledge into probabilistic graphical models to develop personalized treatment strategies in health. Originally from Trinidad and Tobago, she received her BSc in Mathematics and Statistics from London School of Economics, an MSc in Statistics from University College London and her PhD in Machine Learning and Statistics for Healthcare from The University of Manchester where she was a Microsoft Research PhD scholar. Prior to joining Microsoft Research, she had a tenured faculty position at Imperial College London.
Australian Institute of Health Innovation
Effects of automation on risk identification and nurses’ decision making
Abstract: Electronic Decision Support Systems (DSS) can facilitate the five steps of the nursing care process (NCP): assessment, problem identification, planning, intervention, and evaluation. At each of these steps, nurses are required to process information and make complex decisions. DSS also present opportunities to support human information processing, which can be broken down into four distinct functions – information acquisition, information analysis, decision selection and action implementation. For instance, to assess problem risks, nurses need to acquire information about a patient’s history and physical health, analyze risk status, decide, and implement suitable management strategies. While current DSS have the capacity to automate information analysis and decision selection, they require nurses to manually perform other tasks. In this project, we reviewed evidence on effects of automation in DSS on patient outcomes, care delivery and nurses’ decision making. Next, we interviewed nurses to explore their perceptions about existing DSS for risk assessments of falls and pressure injuries, which are among the top hospital-acquired complications in Australia. Finally, we designed a simulated DSS that automates these risk assessments.
Due to the 2020 AMIA Conference, there was no seminar on Nov. 16.
Professor, Department of Medicine; Adjunct Professor, Departments of Bioengineering and Computer Science; Co-Director, Bioinformatics and Systems Biology PhD Program
University of California San Diego
Title: Interpreting the cancer genome through physical and functional models of the cancer cell
Abstract: Recently we and other laboratories have launched the Cancer Cell Map Initiative (ccmi.org) and have been building momentum. The goal of the CCMI is to produce a complete map of the gene and protein wiring diagram of a cancer cell. We and others believe this map, currently missing, will be a critical component of any future system to decode a patient’s cancer genome. I will describe efforts along several lines: 1. Coalition building. We have made notable progress in building a coalition of institutions to generate the data, as well as to develop the computational methodology required to build and use the maps. 2. Development of technology for mapping gene-gene interactions rapidly using the CRISPR system. 3. Causal network maps connecting DNA mutations (somatic and germline, coding and noncoding) to the cancer events they induce downstream. 4. Development of software and database technology to visualize and store cancer cell maps. 5. A machine learning system for integrating the above data to create multi-scale models of cancer cells. In a recent paper by Ma et al., we have shown how a hierarchical map of cell structure can be embedded with a deep neural network, so that the model is able to accurately simulate the effect of mutations in genotype on the cellular phenotype.
Dr. Ideker Bio: Dr. Ideker is a Professor in the Departments of Medicine, Bioengineering and Computer Science at UC San Diego. Additionally, he is the Director or Co-Director of the National Resource for Network Biology (NRNB), the Cancer Cell Map Initiative (CCMI), the Psychiatric Cell Map Initiative (PCMI), and the UCSD Bioinformatics PhD Program, and former Chief of Genetics in the Department of Medicine. He is a pioneer in using genome-scale measurements to construct network models of cellular processes and disease. The Ideker Laboratory seeks to create artificially intelligent models of cancer and other diseases for the translation of patient data to precision diagnosis and treatment.
Due to Election Day, there was no seminar on Nov. 2.
Prof. of Pharmaco- and Device Epidemiology, University of Oxford
Title: OHDSI-EHDEN Joint COVID-19 Collaboration: Global Real-World Data to Fight COVID-19
Due to Columbia’s involvement with the 2020 OHDSI Symposium, there will be no seminar Oct. 19.
DBMI Student Town Hall
Title: Real-world Informatics Challenges in Building a Real-World Oncology Registry: The Multiple Myeloma Research Foundation’s CureCloud Experience
Abstract: One of the biggest impediments to personalized medicine is having enough data about a given disease process in order to explore that disease from multiple perspectives – such as genomics, EHR and immunologics. In 2017, the Multiple Myeloma Research Foundation, building on the previous successes of its CoMMpass Clinical Trial, sought to build a registry with five times the number of participants it had in CoMMpass. It took on a number of tenets that proved exceptionally challenging for this work, including the desire to work directly with patients, return clinical genomic data to patients and their clinicians, and aggregate data from a large array of data sources. In July 2020, the CureCloud Direct-to-Patient Registry opened for patient recruitment. After just 2 months, the registry has over 250 registrants. The challenges of getting this registry opened for recruitment demonstrate the numerous difficulties in working across the US with “all comers”, the vast array of EHR vendors, standing up a new CLIA-validated bioinformatics pipeline, and getting the data ultimately returned to patients. This talk will discuss the many real-world challenges and solutions put into place in standing up this program from an informatics, regulatory, legal and clinical perspective.
Title: Medical Expertise: Why and when is explanation needed?
Abstract: Since medical practice is a human endeavor, rapid technologic advances create a need to bridge disciplines to enable clinicians to benefit from them. In turn, this necessitates a broadening of disciplinary boundaries to consider cognitive and social factors related to the design and use of technology in the medical context. My awareness of these issues began when I started investigating the development of models of medical expertise and the symbolic representation of medical knowledge in the late 1980s. The last 30 years of multidisciplinary research on medical cognition in my laboratory have shown the remarkable importance of cognitive factors that determine how health professionals comprehend information, solve problems, and make decisions. These investigations into the process of medical reasoning have made significant contributions to the design of clinical AI systems. These systems offer great potential for progress to improve people’s health and well-being, but their adoption in clinical practice is still limited. A lack of transparency in these systems is identified as one of the main barriers to their acceptance. My talk will elaborate on what we have learned about how medical practitioners acquire, understand, explain, and utilize expertise, focusing on cognitive-psychological methods and frameworks. It will also discuss how such work elucidates key lessons and challenges for the development of usable, useful, and safe decision-support systems to augment human intelligence in the clinical world.
2020 Spring Seminars
Dr. Melanie Wall
Title: Predicting service use and functioning for people with first episode psychosis in coordinated specialty care (due to technology error, this video isn’t available, though Dr. Wall’s presentation slides are available here)
Abstract: A key initiative in research focused on treatment for first episode psychosis (FEP) is improving the implementation of evidence-based coordinated specialty care (CSC). One area of improvement is expected to come from improved data analytics facilitated by linking different clinical sites through common data elements and a unified informatics approach for aggregating and analyzing patient level data. The present study examines to what extent predictive modeling of patient-level outcomes based on background variables collected at intake and throughout care can be used to differentiate individuals in a way that is useful. Using data from 600 FEP patients from 15 different CSC sites, we will develop and compare several machine learning models for predicting multivariate, correlated outcomes across one year of care. Presentation of results will focus on interpretability of differential prediction across sites and usefulness for facilitating service decisions.
Bio: Melanie Wall is Professor of Biostatistics and Director of Mental Health Data Science (MHDS) in the New York State Psychiatric Institute (NYSPI) and Columbia University psychiatry department. MHDS is made up of a team of 15 biostatisticians collaborating on predominately NIH (NIMH/NIH/NIAAA/NIDA) funded research projects related to psychiatry. She has worked extensively with modeling complex multilevel and multimodal data on a wide array of psychosocial public health and psychiatric research questions in both clinical studies and large epidemiologic studies (over 300 total journal publications). She is an expert in longitudinal data analysis and latent variable modeling, including structural equation modeling focused on mediating and moderating (interaction) effects where she has made many methodological contributions. She has a long track record as a biostatistical mentor for Ph.D. students and NIH K awardees and regularly teaches graduate level courses in the Department of Biostatistics in the Mailman School of Public Health attended by clinical Masters students, Ph.D. students, post-docs, and psychiatry fellows. Her current research mission is improving the accessibility and application of state-of-the-art and reproducible statistical methods across different areas of psychiatric research.
Oliver Bear Don’t Walk
TITLE: Comparing the Impact of Transfer Learning Between Clinical Care Institutions on Clinical Note Classification Tasks
ABSTRACT: Performing transfer learning with neural networks such as BERT, ELMo and GPT has led to state-of-the-art results in the clinical domain on many natural language processing applications. Performing transfer learning with these kinds of models often includes task agnostic pre-training and then fine-tuning on a specific downstream task. However, previous work has found that pre-training at one institution and fine-tuning on a downstream task at another can lead to decreased performance on the downstream task. Differences between clinical institutions (e.g. patient population, documentation practices, clinical specialties, provider roles) can affect clinical corpus qualities and lead to intra-domain variation between institutions. Intra-domain variation could be a contributing factor to downstream task performance degradation when performing transfer learning across institutions. To the best of our knowledge, we present the first experiments focused on performing transfer learning with BERT models between two institutions and compare performance differences on downstream tasks at each institution. We confirm the previous finding that BERT performs better on downstream tasks at the institution where it was most recently pre-trained, which holds true for both institutions in our experiments. We also found that consecutive pre-training on clinical corpora further improves downstream task performance if the most recent pre-training corpus and downstream task corpus are from the same institution. This performance increase is at the expense of decreased performance on the previous institution’s downstream task corpus, a phenomenon known as catastrophic forgetting.
TITLE: Deep Survival Analysis: Regularization and Missingness with Non Parametric Survival Distributions
ABSTRACT: Survival analysis methods have long been used to effectively model time-to-event data. In the healthcare setting, the Framingham risk score is a salient use case in which 10-year risk of cardiovascular disease is estimated using a narrow set of clinical features. In order to use a more expanded set of clinical features from the EHR for survival analysis, a number of challenges must be addressed: (1) there is a high degree of missingness in EHR data; (2) there is no natural event to align all the data; and (3) many nonlinear relationships likely exist between clinical features. Deep survival analysis (DSA) is an approach for addressing these issues by leveraging a deep conditional model of failure time. However, questions about how different levels and kinds of missingness affect out-of-sample prediction remain largely unexplored. Furthermore, the best approach for regularizing a model with such high capacity is empirically untested. We leverage extensions to this model which relax the distributional assumptions to fit a non-parametric survival distribution. Using this model, we run experiments on different methods of regularization and explore the effects of censoring as well as different types of missingness on model robustness. Initial results show promise with DSA outperforming baseline methods such as Cox regression. In the future, we hope to explore alternative methods of non-parametric modeling (e.g. normalizing flows), simulate more clinically realistic scenarios of missingness and apply the model to EHR data from Columbia and NYU.
Dr. Jun Kong
Title: Multi-Dimensional Histopathology Image Analysis for Cancer Research
Abstract: In biomedical research, the availability of an increasing array of high-throughput and high-resolution instruments has given rise to large datasets of imaging data. These datasets provide highly detailed views of tissue structures at the cellular level and present a strong potential to revolutionize biomedical translational research. However, traditional human-based tissue review is not feasible to obtain this wealth of imaging information due to the overwhelming data scale and unacceptable inter- and intra-observer variability. In this talk, I will first describe how to efficiently process Two-Dimension (2D) digital microscopy images for highly discriminating phenotypic information with development of microscopy image analysis algorithms and Computer-Aided Diagnosis (CAD) systems for processing and managing massive in-situ micro-anatomical imaging features with high performance computing. Additionally, I will present novel algorithms to support Three-Dimension (3D), molecular, and time-lapse microscopy image analysis with HPC. Specifically, I will demonstrate an on-demand registration method within a dynamic multi-resolution transformation mapping and an iterative transformation propagation framework. This will allow us to efficiently scrutinize volumes of interest on-demand in a single 3D space. For segmentation, I will present a scalable segmentation framework for histopathological structures with two steps: 1) initialization with joint information drawn from spatial connectivity, edge map, and shape analysis, and 2) variational level-set based contour deformation with data-driven sparse shape priors. For 3D reconstruction, I will present a novel cross section association method leveraging Integer Programming, Markov chain based posterior probability modelling and Bayesian Maximum A Posteriori (MAP) estimation for 3D vessel reconstruction.
I will also present new methods for multi-stain image registration, biomarker detection, and 3D spatial density estimation for molecular imaging data integration. For time-lapse microscopy images, I will present a new 3D cell segmentation method with gradient partitioning and local structure enhancement by eigenvalue analysis of the Hessian matrix. A derived tracking method will also be presented that combines Bayesian filters with a sequential Monte Carlo method with joint use of location, velocity, 3D morphology features, and intensity profile signatures. Our proposed methods for 2D, 3D, molecular, and time-lapse microscopy image analysis will help researchers and clinicians extract accurate histopathology features, integrate spatially mapped pathophysiological biomarkers, and model disease progression dynamics at high cellular resolution. Therefore, they are essential for improving clinical decisions, enhancing prognostic predictions, inspiring new research hypotheses, and realizing personalized medicine.
Bio: Dr. Kong is an Associate Professor in the Department of Mathematics and Statistics and the Department of Computer Science at Georgia State University, and adjunct faculty in the Department of Biomedical Informatics, the Department of Computer Science, and the Winship Cancer Institute at Emory University. Dr. Kong’s research interests focus on big imaging data analytics for modeling cancer diseases, multi-modal biomedical image analysis, computer-aided diagnosis, machine learning, computational biology, and large-scale translational bioinformatics with heterogeneous data integration and mining. His long-term research goal is to establish an interdisciplinary research program engaged with mathematicians, biostatisticians, computer scientists, biologists, pathologists, and oncologists, among other domain experts, for computational disease characterization, accurate modeling analysis, and granular-resolution understanding of diseases with large-scale, multi-modal, and multi-scale biomedical data.
Dr. Olga Troyanskaya
Professor of Computer Science and the Lewis-Sigler Institute for Integrative Genomics, Princeton University
Title: The quest for deep knowledge – decoding the human genome with deep learning models
Abstract: A key challenge in medicine and biology is to develop a complete understanding of the genomic architecture of disease. Yet the increasingly wide availability of ‘omics’ and clinical data, including whole genome sequencing, has far outpaced our ability to analyze these datasets. Challenges include interpreting the 98% of the genome that is noncoding to identify variants that are functional and may lead to disease, detangling genomic signals regulating tissue-specific gene expression, mapping the resulting genetic circuits and networks in disease-relevant tissues and cell types, and, finally, integrating the vast body of biological knowledge from model organisms with observations in humans. I will discuss methods that address these challenges, and highlight their applications to neurodevelopment and neurodegenerative diseases.
Title: Interventions to Increase Patient Portal Use in Vulnerable Populations: A Systematic Review
Abstract: Background: More than 100 studies document disparities in patient portal use among vulnerable populations. Developing and testing strategies to reduce disparities in use is essential to ensure portals benefit all populations.
Objective: To systematically review the impact of interventions designed to (1) increase portal use or predictors of use in vulnerable patient populations, or (2) reduce disparities in use.
Methods: A librarian searched Ovid MEDLINE, EMBASE, CINAHL, and Cochrane Reviews for studies published before September 1st, 2018. Two reviewers independently selected English-language research articles that evaluated any interventions designed to impact an eligible outcome. One reviewer extracted data and categorized interventions, and another assessed accuracy. Two reviewers independently assessed risk of bias.
Results: Out of 18 included studies, 15 (83%) assessed an intervention’s impact on portal use, 7 (39%) on predictors of use, and 1 (6%) on disparities in use. Most interventions studied focused on the individual (13 out of 26, 50%), as opposed to facilitating conditions, such as the tool, task, environment, or organization (SEIPS model). Twelve studies (67%) reported a statistically significant increase in portal use or predictors of use, or reduced disparities. Five studies (28%) had high or unclear risk of bias.
Conclusion: Individually-focused interventions have the most evidence for increasing portal use in vulnerable populations. Interventions affecting other system elements (tool, task, environment, organization) have not been sufficiently studied to draw conclusions. Given the well-established evidence for disparities in use and the limited research on effective interventions, research should move beyond identifying disparities to systematically addressing them at multiple levels.
Title: The Data Consult Service: an opportunity to bring new evidence to the bedside.
Abstract: Evidence-based medicine facilitates clinical care standardization, reduces medical care misuse and overuse, and eventually leads to health care cost reduction and improvement in effectiveness and quality of care. However, current evidence has been reported to be inadequate or missing for specific clinical cases. Randomized clinical trials, which are the gold standard of clinical evidence, are often not generalizable to real-world patients and fail to include patients with multiple co-morbidities, patients who are pregnant, the elderly, and other vulnerable populations. At the same time, a growing body of observational data, along with the continuing accumulation of practice-based evidence, has made new approaches to evidence generation available. We will present our first steps in developing a Data Consult Service – a clinical decision support tool that uses observational data to answer clinicians’ questions in real time. We will discuss our work on discovering potential areas of use and target groups for this tool, as well as the first questions answered and future work.
Fall 2019 Seminars
TITLE: Using Genetics to Address the Challenges of 21st Century Drug Development
BIO: Michael N. Cantor, MD, MA is Executive Director, Clinical Informatics, at the Regeneron Genetics Center. Currently his work focuses on developing and optimizing phenotypes from EHR and cohort data and linking them with genetic data to help discover new drug targets. Prior to Regeneron, he was Director of Clinical Research Informatics at New York University School of Medicine. As Director of Clinical Research Informatics, he was also the clinical director for NYULH’s DataCore, where his work focused on data management for clinical trials, using data from clinical systems for research, and advanced analytics. His research interests include integrating and standardizing social determinants of health-related data into the EHR, optimizing informatics tools for frontline clinicians, and providing self-service data access tools for researchers. During his previous tenure at NYU, Dr. Cantor was the Chief Medical Information Officer for the South Manhattan Healthcare Network of the New York City Health and Hospitals Corporation, based at Bellevue, and saw patients and precepted at the medical clinic there. Dr. Cantor completed his residency in internal medicine and informatics training at Columbia, has an M.D. from Emory University and an A.B. from Princeton, and is an Associate Professor in the Department of Medicine at NYU School of Medicine. He currently sees patients weekly at Bellevue’s medicine clinic.
Speaker: Jonathan Elias, MD, Clinical Informatics Fellow
Title: A Day in the Life of a Clinical Informatics Fellow: CI Fellowship, Epic Together’s Mobile Messaging and Provider Team Project and the Epic Together Pre- & Post-Implementation Study
Abstract: Per AMIA, Clinical Informatics (CI) is the application of informatics and information technology to deliver healthcare services. The CI Fellowship is a two-year ACGME accredited fellowship now being offered to one candidate a year through NYP CUMC, after completion of a medical residency. During this seminar, the fellowship structure and goals with example projects and research will be discussed.
A large area of focus of the fellowship is operational CI projects and academic research. Currently, Columbia University Medical Center (CUMC), NewYork-Presbyterian (NYP) and Weill Cornell Medical Center (WCM) are preparing to implement an enterprise-wide clinical information system, the EpicCare© Electronic Health Record (EHR). With the implementation of the EpicCare© EHR, there is an opportunity to improve, streamline and standardize role delineation, clinical communications and patient assignment across the EHR and secure mobile messaging platforms. The goals and processes associated with this project will be discussed.
Finally, a brief overview and update of the Epic Pre- & Post-Implementation Study will be presented. The overall purpose of this study is to evaluate clinical workflows, process efficiencies, EHR utilization, data quality and overall perceived system usability after implementation of Epic at NYP/CUMC/WCM, compared to the systems in place before Epic implementation. This project comprises three specific aims, outlined below, with associated high-level approach and metrics. Aim 1: Conduct a pre-post time-motion study in the inpatient and outpatient settings (including the emergency department) to identify documentation workflow and time changes after Epic EHR implementation. Aim 2: Conduct log-file analyses to measure process efficiencies, EHR utilization (e.g., documentation time), and EHR data quality metrics. Aim 3: Administer a survey to measure and compare health professionals’ perceived usability and satisfaction pre- and post-Epic implementation in the context of functionality to enhance the delivery of continuity of care and adaptation to new health information technology (HIT).
Speaker: Jiayao Wang, PhD Student, Dr. Dennis Vitkup’s Lab
Title: Contribution of recessive genotypes and common variants to autism spectrum disorder
Abstract: Autism spectrum disorder (ASD) is a genetically heterogeneous condition, caused by a combination of rare de novo and inherited variants as well as common variants in at least several hundred genes. However, significantly larger sample sizes are needed to identify the complete set of genetic risk factors, and the contribution from inherited variants needs further investigation. Here we present an analysis of SPARK (SPARKForAutism.org), a cohort of ~9K families with ASD, all consented online. Whole exome sequencing (WES) and genotyping data were generated for each family using DNA from saliva. With exome sequencing data and a simple statistical framework, we show a weak contribution from recessive genotypes, as well as several significant recessive genes that lead to autism, such as EIF3F and RELN. With genotype array data, we performed a GWAS with the transmission disequilibrium test and calculated polygenic risk scores for SPARK families. We show that autism probands have a significantly higher polygenic risk than their siblings, and that the risk is spread across the genome rather than coming only from significant loci. The contributions of recessive genotypes and common variants, together with rare inherited variants and de novo mutations from the SPARK project, will complete our understanding of the genetics of autism.
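For readers unfamiliar with the two statistics named in the abstract, the following is a minimal sketch of their textbook forms, not the speaker's pipeline. The function names, the example counts, and the effect sizes are hypothetical. The transmission disequilibrium test compares how often heterozygous parents transmit versus do not transmit an allele to an affected child, and a polygenic risk score is a weighted sum of allele dosages.

```python
def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """McNemar-style TDT chi-square statistic (1 degree of freedom).

    transmitted / untransmitted: counts of heterozygous parents who did or
    did not pass the tested allele to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("need at least one informative transmission")
    return (transmitted - untransmitted) ** 2 / total


def polygenic_risk_score(effect_sizes, dosages) -> float:
    """Weighted sum of allele dosages (0, 1, or 2) for one individual.

    effect_sizes: per-variant weights, e.g. GWAS log odds ratios (assumed).
    dosages: number of copies of the effect allele carried at each variant.
    """
    if len(effect_sizes) != len(dosages):
        raise ValueError("effect_sizes and dosages must have the same length")
    return sum(beta * dose for beta, dose in zip(effect_sizes, dosages))
```

Under the null hypothesis of no association, the TDT statistic is compared against a chi-square distribution with one degree of freedom. Comparing probands with their siblings then amounts to computing the score for each family member with the same weights.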
There was no seminar on Nov. 25.
No seminar due to the AMIA Symposium.
Title: Oops! I’m on the wrong patient: Evaluating System-Level Interventions for Preventing Wrong-Patient Electronic Orders
Bio: Dr. Adelman’s Patient Safety Research Program began with the development of the Wrong-Patient Retract-and-Reorder (RAR) Measure—a valid and reliable method of quantifying the frequency of wrong-patient orders placed in electronic ordering systems. The Wrong-Patient RAR measure was the first automated measure of medical errors and the first Health IT Safety Measure endorsed by the National Quality Forum. The RAR method identifies thousands of near-miss, wrong-patient errors per year in large health systems, enabling researchers to test interventions to prevent this type of error.
The Wrong-Patient RAR measure has been used to evaluate the effectiveness of patient safety interventions in several studies conducted in different electronic health record systems and clinical settings, including in the neonatal intensive care unit (NICU). The measure is the primary outcome measure for studies supported by the Agency for Healthcare Research and Quality (R21HS023704, R01HS024945) and the National Institute for Child Health and Human Development (R01HD094793). Additional research is underway to extend the RAR methodology to other types of errors, such as wrong-drug errors, and develop new health IT safety measures (R01HS024538).
Results of Dr. Adelman’s research led to national patient safety guidance, including a recommendation issued by the Office of the National Coordinator for Health Information Technology that healthcare organizations use the Wrong-Patient RAR measure to monitor the frequency of wrong-patient orders. Effective 2019, The Joint Commission will require that hospitals adopt a distinct newborn naming convention that incorporates the mother’s first name, based on studies by Adelman and colleagues.
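The retract-and-reorder idea described in the bio can be sketched as a simple scan over an order log. This is an illustration only, not the endorsed measure specification. The `Order` record, the `find_rar_events` function, and both 10-minute windows are assumptions introduced here, since the listing does not state the time thresholds.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Tuple


class Order(NamedTuple):
    """Hypothetical order-log record."""
    clinician: str
    patient: str
    item: str
    placed: datetime
    retracted: Optional[datetime] = None


def find_rar_events(
    orders: List[Order],
    retract_window: timedelta = timedelta(minutes=10),   # assumed threshold
    reorder_window: timedelta = timedelta(minutes=10),   # assumed threshold
) -> List[Tuple[Order, Order]]:
    """Return (retracted order, reorder) pairs that look like wrong-patient orders."""
    events = []
    for first in orders:
        # Keep only orders that were retracted soon after being placed.
        if first.retracted is None:
            continue
        if not timedelta(0) <= first.retracted - first.placed <= retract_window:
            continue
        for second in orders:
            same_request = (second.clinician == first.clinician
                            and second.item == first.item)
            different_patient = second.patient != first.patient
            soon_after = (first.retracted <= second.placed
                          <= first.retracted + reorder_window)
            if same_request and different_patient and soon_after:
                events.append((first, second))
                break  # count each retracted order at most once
    return events
```

Counting the returned pairs per number of orders placed gives a rate that can be compared before and after an intervention.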
Due to the Election Day holiday on Tuesday, there is no Seminar today.
This is a DBMI Student Town Hall.
Speaker: Alex Kitaygorodsky, PhD Student, Dr. Yufeng Shen’s Lab
Title: Identification of disease-causing genetic mutations based on machine learning and large genomic data sets
Abstract: More than 3% of young children are born with developmental disorders such as congenital heart disease (CHD), congenital diaphragmatic hernia (CDH), and autism spectrum disorder (ASD). Understanding the genetic causes of these conditions is critical to improve health care for these children and to push forward human developmental biology and neuroscience. Recently, high-throughput sequencing technologies have enabled generation of large-scale genomic data in genetic studies of these conditions. However, translating human data to knowledge is challenging due to an incomplete understanding of biology and a lack of sufficiently powerful analytical methods. My work aims to develop new computational methods based on powerful machine learning techniques to interpret genome sequencing data and identify disease-causing genetic variations. In this talk, I will focus specifically on the role of regulatory non-protein coding mutations in CHD, where we have found a substantial role of variants disrupting RNA binding protein (RBP) binding sites. RBPs oversee normal regulation of gene expression, at both the transcriptional and especially post-transcriptional stages, and so their disruption via mutation represents an important but under-studied noncoding action mechanism. To better understand the observed enrichment in these sites, we first modeled RNA binding protein processes with a robust convolutional neural network. Then, we designed a gradient boosting super-model to integrate predicted RBP binding scores with multimodal genomic data, allowing us to predict pathogenic RBP and gene regulation disruption caused by individual mutations. Finally, we applied our model back to Whole Genome Sequencing data of autism and CHD to find new disease risk genes and improve genetic diagnosis. 
In summary, we leveraged large genomic datasets with a sophisticated machine learning approach to better analyze sequencing data, advance genomic medicine, and aid our understanding of developmental disorder genetics.
Speaker: Sylvia Cho, PhD Candidate, Dr. Karthik Natarajan’s Lab
Title: Identifying data quality dimensions for wearable device data
Abstract: Patient-generated health data (PGHD) is an emerging type of biomedical data that is captured and recorded by patients outside clinical encounters. One of the major factors facilitating the documentation of PGHD is the proliferating use of health tracking technologies. Among these technologies, wearable devices are unique in that individuals can continuously and objectively self-track their health in free-living conditions. As a byproduct of using wearable devices for self-tracking, the large volume of accumulated data and the diversity of data types have led to interest in reusing these data for research purposes. However, there are concerns about the quality of device-generated data, owing to technical and human limitations among other reasons. Therefore, assessing the quality of wearable data is essential before reusing the data for research. Data quality dimensions are an important part of data quality assessment, as they provide guidance on which aspects of data quality should be assessed for the research task. While there are abundant studies on data quality dimensions for traditional clinical data such as electronic health record data, there is a lack of understanding of the important data quality dimensions for wearable device data. In this study, we aim to identify the data quality dimensions that researchers consider important when analyzing wearable data, and to verify whether an existing data quality framework can be applied to this type of data or needs to be modified. In this talk, I will discuss the methods we used to identify the dimensions and present preliminary results of the study.
Video: Watch the presentation here
Title: Applications of Data Science and Machine Learning in Radiology and Cardiology
Abstract: The overall goal of our group is to leverage data-driven approaches to help improve patient outcomes. This talk will demonstrate examples of how we are working toward this goal by leveraging large clinical datasets, data science and machine learning. Specific examples include: 1) using 46,583 clinically-acquired 3D computed tomography images of the brain to develop and implement a deep learning model to efficiently reprioritize radiology worklists for quicker diagnosis of intracranial hemorrhage; 2) using deep learning to analyze 723,754 echocardiographic videos of the heart to accurately predict patient mortality; 3) analyzing 2 million 12-lead electrocardiographic tracings from the heart to predict clinically relevant future events; and 4) optimizing evidence-based care delivery for a population of >10,000 patients with heart failure using machine learning.
Bio: Dr. Fornwalt attended the University of South Carolina as an undergraduate in mathematics and marine science. He then worked in a free medical clinic for a year before starting an MD/PhD program at Emory and Georgia Tech. After finishing his degrees in 2010, he completed an internship in pediatrics at Boston Children’s Hospital before becoming an Assistant Professor at the University of Kentucky.
After four years on faculty in Kentucky, Dr. Fornwalt moved to Geisinger where he completed his diagnostic radiology residency and founded Geisinger’s Department of Imaging Science and Innovation, which focuses on data-driven approaches to improving patient outcomes. Dr. Fornwalt is also a practicing thoraco-abdominal radiologist and an active member of Geisinger’s Heart Institute.
Video: Watch the presentation here
Title: Integrative Analysis of Multi-view Data for Dimension Reduction and Prediction
Abstract: Multi-view data are data collected on the same set of samples but from different views/sources. They are becoming increasingly common in modern biomedical studies. In this talk, I’ll introduce some recent developments in the integrative analysis of multi-view data, and present a new multivariate predictive model with application to a longitudinal study of aging.
Bio: Dr. Gen Li is devoted to developing new statistical learning methods for analyzing high dimensional biomedical data. He focuses on analyzing complex data with heterogeneous types that are collected from multiple sources. His methodological research interests include dimension reduction, predictive modeling, association analysis, and functional data analysis. He is also interested in genetics and bioinformatics. He is a consortium member of the NIH Common Fund program Genotype-Tissue Expression (GTEx) project, and contributes to the development of statistical methods for expression quantitative trait loci analysis in multiple tissues. He also has research interests in scientific domains including melanoma, microbiome, and urology research.
Video: Watch the presentation here
Title: Machine Learning in Healthcare
Abstract: In March of 2016, the AlphaGo computer program beat world champion (and human) Lee Sedol at the board game Go. The program’s success reflected the significant progress that machine learning research has made in recent years. However, AlphaGo was just one example of what can be achieved with machine learning. This talk will provide an overview of some of the techniques that are being used in machine learning today, as well as some recent and ongoing work by Google’s research teams to advance the applications of machine learning, particularly its role in biomedical research. The talk will also discuss some of the unique challenges around applications in healthcare.
Bio: Ming Jack Po, MD, PhD is a product manager in Google Health, leading a number of its machine learning research projects as well as health care product teams. Prior to joining Google, Jack spent a decade working in different capacities in areas related to medical devices and healthcare delivery. Jack is currently a trustee of the Austen Riggs Center, a board member of El Camino Health Systems, a member of the National Library of Medicine Lister Hill’s Board of Scientific Counselors and a member of the ONC’s Interoperability Standards Priorities Task Force. Jack received his MD and PhD from Columbia University, and his bachelor’s degree in Biomedical Engineering and master’s degree in Mathematics from Johns Hopkins University.
Speaker: Alexander Hsieh, PhD student
Title: Detection of mosaic single nucleotide variants in exome sequencing data and implications for congenital heart disease
Abstract: The contribution of somatic mosaicism, or genetic mutations arising after oocyte fertilization, to congenital heart disease (CHD) is not well understood. Further, the relationship between mosaicism in blood and cardiovascular tissue has not been determined. We developed a computational method, Expectation-Maximization-based detection of Mosaicism (EM-mosaic), to analyze mosaicism in exome sequences of 2530 CHD proband-parent trios. EM-mosaic detected 326 mosaic mutations in blood and/or cardiac tissue DNA. Of the 309 detected in blood DNA, 85/94 (90%) tested were independently confirmed. Twenty-five mosaic variants altered CHD-risk genes, affecting 1% of our cohort. Of these 25, 22/22 candidates tested were confirmed. Variants predicted as damaging had higher variant allele fraction than benign variants, suggesting a role in CHD. The frequency of mosaic variants above 10% mosaicism was 0.13/person in blood and 0.14/person in cardiac tissue. Analysis of 66 individuals with matched cardiac tissue available revealed both tissue-specific and shared mosaicism, with shared mosaics generally having higher allele fraction. We estimate that ~1% of CHD probands have a mosaic variant detectable in blood that could contribute to cardiac malformations, particularly those damaging variants expressed at higher allele fraction compared to benign variants. Although blood is a readily-available DNA source, cardiac tissues analyzed contributed ~5% of somatic mosaic variants identified, indicating the value of tissue mosaicism analyses.
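To give a sense of how expectation-maximization can separate mosaic from germline variants by allele fraction, here is a generic two-component binomial mixture. It is not the EM-mosaic implementation, whose model details are not given in this listing. The function names, starting values, and the fixed germline fraction of 0.5 are assumptions made for illustration.

```python
import math


def binom_logpmf(k: int, n: int, p: float) -> float:
    """Log probability of k alternate reads out of n under Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def fit_two_component_em(alt_reads, depths, n_iter=50):
    """Fit a mosaic/germline binomial mixture by EM.

    Germline heterozygous variants are assumed to have allele fraction 0.5;
    mosaic variants share one unknown fraction theta below 0.5.
    Returns (pi, theta, resp): the mosaic proportion, the mosaic allele
    fraction, and per-variant mosaic probabilities from the last E-step.
    """
    pi, theta = 0.1, 0.2  # hypothetical starting values
    eps = 1e-6
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each variant is mosaic.
        resp = []
        for k, n in zip(alt_reads, depths):
            log_m = math.log(pi) + binom_logpmf(k, n, theta)
            log_g = math.log(1 - pi) + binom_logpmf(k, n, 0.5)
            top = max(log_m, log_g)  # subtract the max for numerical stability
            w_m = math.exp(log_m - top)
            w_g = math.exp(log_g - top)
            resp.append(w_m / (w_m + w_g))
        # M-step: re-estimate the mixing proportion and mosaic fraction.
        pi = min(max(sum(resp) / len(resp), eps), 1 - eps)
        weighted_depth = sum(r * n for r, n in zip(resp, depths))
        if weighted_depth > 0:
            theta = sum(r * k for r, k in zip(resp, alt_reads)) / weighted_depth
            theta = min(max(theta, eps), 0.5 - eps)
    return pi, theta, resp
```

A real caller would also need to handle sequencing error and overdispersion in read counts, which a plain binomial ignores.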
Speaker: Michelle Chau, PhD student
Title: Developing a user-centered, machine learning approach to identify preferences for inspirational social media health-related images for young populations
Abstract: Nutrition interventions for adolescents and young adults (AYAs) increasingly rely on mobile platforms and social media. Most assume nutritional decisions are rational, targeting intentions such as goal setting and self-monitoring. However, in the absence of motivation and time, nutrition choices are often automatic and based on heuristics. The use of images is a simple way to deliver heuristic messaging. My preliminary research, which shows AYAs’ frequent use of social media for inspiration, further suggests that health-related images may be suitable for nutrition interventions with these groups. Previous studies have explored inspirational social media content using qualitative and manual methods. However, there is an active area of research in computational visual analysis that explores preferences and prediction for image retrieval and recommendation tasks. The application of these techniques within health, and specifically how to translate human preferences into the technical requirements needed to identify inspirational images for nutrition and young populations, is underexplored. In this talk, I will discuss a study to identify image features that are relevant for inspiring healthy eating in health-related social media content. Further, I will discuss future directions for exploring how these features may be incorporated into machine learning models.