The DBMI seminar series is a 1-credit course for DBMI students who can benefit from hearing new methods of research from speakers from both academia and industry. It is currently being offered virtually, though it is traditionally held in PH-200.
Selected presentations are recorded and posted to the DBMI YouTube page, as well as within their individual sections. Seminars for the fall 2020 semester, with log-in information for external speakers, are posted. Seminars from previous semesters are listed below.
2020 Upcoming Fall Seminars
Previous 2020 Fall Seminars
Title: Machine learning for mental healthcare: a human-centered approach
Abstract: Machine learning advances are opening new routes to more precise healthcare, from the discovery of disease subtypes for stratified interventions to the development of personalized interactions supporting self-care between clinic visits. This offers an exciting opportunity for machine learning techniques to impact healthcare in a meaningful way. Within the healthcare domain, machine learning for mental healthcare is an under-investigated area and yet a potentially highly impactful area of research. In this talk, I will present recent work on probabilistic graphical modeling to enable a more personalized approach to mental healthcare, whereby information can be aggregated from multiple sources within a unified modeling framework. We present a human-centered approach to mental healthcare which is aimed at increasing the effectiveness of psychological wellbeing practitioners.
Bio: Dr. Danielle Belgrave is a Principal Research Manager at Microsoft Research in Cambridge (UK), in the Health Intelligence group, where she leads Project Talia. She is particularly interested in integrating medical domain knowledge into probabilistic graphical models to develop personalized treatment strategies in health. Originally from Trinidad and Tobago, she received her BSc in Mathematics and Statistics from the London School of Economics, an MSc in Statistics from University College London, and her PhD in Machine Learning and Statistics for Healthcare from The University of Manchester, where she was a Microsoft Research PhD scholar. Prior to joining Microsoft Research, she had a tenured faculty position at Imperial College London.
Australian Institute of Health Innovation
Effects of automation on risk identification and nurses’ decision making
Abstract: Electronic Decision Support Systems (DSS) can facilitate the five steps of the nursing care process (NCP): assessment, problem identification, planning, intervention, and evaluation. At each of these steps, nurses are required to process information and make complex decisions. DSS also present opportunities to support human information processing, which can be broken down into four distinct functions: information acquisition, information analysis, decision selection, and action implementation. For instance, to assess problem risks, nurses need to acquire information about a patient’s history and physical health, analyze risk status, and decide on and implement suitable management strategies. While current DSS have the capacity to automate information analysis and decision selection, they require nurses to perform other tasks manually. In this project, we reviewed evidence on the effects of automation in DSS on patient outcomes, care delivery, and nurses’ decision making. Next, we interviewed nurses to explore their perceptions of existing DSS for risk assessments of falls and pressure injuries, which are among the top hospital-acquired complications in Australia. Finally, we designed a simulated DSS that automates these risk assessments.
Due to the 2020 AMIA Conference, there was no seminar on Nov. 16.
Professor, Department of Medicine; Adjunct Professor, Departments of Bioengineering and Computer Science; Co-Director, Bioinformatics and Systems Biology PhD Program
University of California San Diego
Title: Interpreting the cancer genome through physical and functional models of the cancer cell
Abstract: Recently we and other laboratories have launched the Cancer Cell Map Initiative (ccmi.org) and have been building momentum. The goal of the CCMI is to produce a complete map of the gene and protein wiring diagram of a cancer cell. We and others believe this map, currently missing, will be a critical component of any future system to decode a patient’s cancer genome. I will describe efforts along several lines: 1. Coalition building. We have made notable progress in building a coalition of institutions to generate the data, as well as to develop the computational methodology required to build and use the maps. 2. Development of technology for mapping gene-gene interactions rapidly using the CRISPR system. 3. Causal network maps connecting DNA mutations (somatic and germline, coding and noncoding) to the cancer events they induce downstream. 4. Development of software and database technology to visualize and store cancer cell maps. 5. A machine learning system for integrating the above data to create multi-scale models of cancer cells. In a recent paper by Ma et al., we have shown how a hierarchical map of cell structure can be embedded with a deep neural network, so that the model is able to accurately simulate the effect of mutations in genotype on the cellular phenotype.
Dr. Ideker Bio: Dr. Ideker is a Professor in the Departments of Medicine, Bioengineering and Computer Science at UC San Diego. Additionally, he is the Director or Co-Director of the National Resource for Network Biology (NRNB), the Cancer Cell Map Initiative (CCMI), the Psychiatric Cell Map Initiative (PCMI), and the UCSD Bioinformatics PhD Program, and former Chief of Genetics in the Department of Medicine. He is a pioneer in using genome-scale measurements to construct network models of cellular processes and disease. The Ideker Laboratory seeks to create artificially intelligent models of cancer and other diseases for the translation of patient data to precision diagnosis and treatment.
Due to Election Day, there was no seminar on Nov. 2.
Prof. of Pharmaco- and Device Epidemiology, University of Oxford
Title: OHDSI-EHDEN Joint COVID-19 Collaboration: Global Real-World Data to Fight COVID-19
Due to Columbia’s involvement with the 2020 OHDSI Symposium, there will be no seminar Oct. 19.
DBMI Student Town Hall
Title: Real-world Informatics Challenges in Building a Real-World Oncology Registry: The Multiple Myeloma Research Foundation’s CureCloud Experience
Abstract: One of the biggest impediments to personalized medicine is having enough data about a given disease process in order to explore that disease from multiple perspectives, such as genomics, EHR and immunologics. In 2017, the Multiple Myeloma Research Foundation, building on the previous successes of its CoMMpass Clinical Trial, sought to build a registry with five times the number of participants it had in CoMMpass. It adopted a number of tenets that proved exceptionally challenging for this work, including the desire to work directly with patients, return clinical genomic data to patients and their clinicians, and aggregate data from a large array of data sources. In July 2020, the CureCloud Direct-to-Patient Registry opened for patient recruitment. After just 2 months, the registry has over 250 registrants. The effort to open this registry for recruitment demonstrates the numerous challenges of working across the US with “all comers” and a vast array of EHR vendors, standing up a new CLIA-validated bioinformatics pipeline, and ultimately returning the data to patients. This talk will discuss the many real-world challenges and solutions put into place in standing up this program from an informatics, regulatory, legal and clinical perspective.
Title: Medical Expertise: Why and when is explanation needed?
Abstract: Since medical practice is a human endeavor, rapid technologic advances create a need to bridge disciplines to enable clinicians to benefit from them. In turn, this necessitates a broadening of disciplinary boundaries to consider cognitive and social factors related to the design and use of technology in the medical context. My awareness of these issues began when I started investigating the development of models of medical expertise and the symbolic representation of medical knowledge in the late 1980s. The last 30 years of multidisciplinary research on medical cognition in my laboratory have shown the remarkable importance of cognitive factors that determine how health professionals comprehend information, solve problems, and make decisions. These investigations into the process of medical reasoning have made significant contributions to the design of clinical AI systems. These systems offer great potential for progress to improve people’s health and well-being, but their adoption in clinical practice is still limited. A lack of transparency in these systems is identified as one of the main barriers to their acceptance. My talk will elaborate on what we have learned about how medical practitioners acquire, understand, explain, and utilize expertise, focusing on cognitive-psychological methods and frameworks. It will also discuss how such work elucidates key lessons and challenges for the development of usable, useful, and safe decision-support systems to augment human intelligence in the clinical world.
2020 Spring Seminars
Dr. Melanie Wall
Title: Predicting service use and functioning for people with first episode psychosis in coordinated specialty care (due to technology error, this video isn’t available, though Dr. Wall’s presentation slides are available here)
Abstract: A key initiative in research focused on treatment for first episode psychosis (FEP) is improving the implementation of evidence-based coordinated specialty care (CSC). One area of improvement is expected to come from improved data analytics facilitated by linking different clinical sites through common data elements and a unified informatics approach for aggregating and analyzing patient level data. The present study examines to what extent predictive modeling of patient-level outcomes based on background variables collected at intake and throughout care can be used to differentiate individuals in a way that is useful. Using data from 600 FEP patients from 15 different CSC sites, we will develop and compare several machine learning models for predicting multivariate, correlated outcomes across one year of care. Presentation of results will focus on interpretability of differential prediction across sites and usefulness for facilitating service decisions.
Bio: Melanie Wall is Professor of Biostatistics and Director of Mental Health Data Science (MHDS) in the New York State Psychiatric Institute (NYSPI) and Columbia University psychiatry department. MHDS is made up of a team of 15 biostatisticians collaborating on predominately NIH (NIMH/NIH/NIAAA/NIDA) funded research projects related to psychiatry. She has worked extensively with modeling complex multilevel and multimodal data on a wide array of psychosocial public health and psychiatric research questions in both clinical studies and large epidemiologic studies (over 300 total journal publications). She is an expert in longitudinal data analysis and latent variable modeling, including structural equation modeling focused on mediating and moderating (interaction) effects, where she has made many methodological contributions. She has a long track record as a biostatistical mentor for Ph.D. students and NIH K awardees and regularly teaches graduate level courses in the Department of Biostatistics in the Mailman School of Public Health attended by clinical Masters students, Ph.D. students, post-docs, and psychiatry fellows. Her current research mission is improving the accessibility and application of state-of-the-art and reproducible statistical methods across different areas of psychiatric research.
Oliver Bear Don’t Walk
TITLE: Comparing the Impact of Transfer Learning Between Clinical Care Institutions on Clinical Note Classification Tasks
ABSTRACT: Performing transfer learning with neural networks such as BERT, ELMo and GPT has led to state-of-the-art results in the clinical domain on many natural language processing applications. Performing transfer learning with these kinds of models often includes task-agnostic pre-training and then fine-tuning on a specific downstream task. However, previous work has found that pre-training at one institution and fine-tuning on a downstream task at another can lead to decreased performance on the downstream task. Differences between clinical institutions (e.g. patient population, documentation practices, clinical specialties, provider roles) can affect clinical corpus qualities and lead to intra-domain variation between institutions. Intra-domain variation could be a contributing factor to downstream task performance degradation when performing transfer learning across institutions. To the best of our knowledge, we present the first experiments focused on performing transfer learning with BERT models between two institutions and compare performance differences on downstream tasks at each institution. We confirm the previous finding that BERT performs better on downstream tasks at the institution where it was most recently pre-trained, which holds true for both institutions in our experiments. We also found that consecutive pre-training on clinical corpora further improves downstream task performance if the most recent pre-training corpus and downstream task corpus are from the same institution. This performance increase is at the expense of decreased performance on the previous institution’s downstream task corpus, a phenomenon known as catastrophic forgetting.
TITLE: Deep Survival Analysis: Regularization and Missingness with Non Parametric Survival Distributions
ABSTRACT: Survival analysis methods have long been used to effectively model time-to-event data. In the healthcare setting, the Framingham risk score is a salient use case in which 10-year risk of cardiovascular disease is estimated using a narrow set of clinical features. In order to use a more expanded set of clinical features from the EHR for survival analysis, a number of challenges must be addressed: (1) there is a high degree of missingness in EHR data, (2) there is no natural event to align all the data, and (3) many nonlinear relationships likely exist between clinical features. Deep survival analysis (DSA) is an approach for addressing these issues by leveraging a deep conditional model of failure time. However, questions about how different levels and kinds of missingness affect out-of-sample prediction remain largely unexplored. Furthermore, the best approach for regularizing a model with such high capacity is empirically untested. We leverage extensions to this model which relax the distributional assumptions to fit a non-parametric survival distribution. Using this model, we run experiments on different methods of regularization and explore the effects of censorship as well as different types of missingness on model robustness. Initial results show promise, with DSA outperforming baseline methods such as Cox regression. In the future, we hope to explore alternative methods of non-parametric modeling (e.g. normalizing flows), simulate more clinically realistic scenarios of missingness, and apply the model to EHR data from Columbia and NYU.
Dr. Jun Kong
Title: Multi-Dimensional Histopathology Image Analysis for Cancer Research
Abstract: In biomedical research, the availability of an increasing array of high-throughput and high-resolution instruments has given rise to large datasets of imaging data. These datasets provide highly detailed views of tissue structures at the cellular level and present a strong potential to revolutionize biomedical translational research. However, traditional human-based tissue review cannot feasibly capture this wealth of imaging information due to the overwhelming data scale and unacceptable inter- and intra-observer variability. In this talk, I will first describe how to efficiently process two-dimensional (2D) digital microscopy images for highly discriminating phenotypic information with development of microscopy image analysis algorithms and Computer-Aided Diagnosis (CAD) systems for processing and managing massive in-situ micro-anatomical imaging features with high performance computing (HPC). Additionally, I will present novel algorithms to support three-dimensional (3D), molecular, and time-lapse microscopy image analysis with HPC. Specifically, I will demonstrate an on-demand registration method within a dynamic multi-resolution transformation mapping and an iterative transformation propagation framework. This will allow us to efficiently scrutinize volumes of interest on-demand in a single 3D space. For segmentation, I will present a scalable segmentation framework for histopathological structures with two steps: 1) initialization with joint information drawn from spatial connectivity, edge map, and shape analysis, and 2) variational level-set based contour deformation with data-driven sparse shape priors. For 3D reconstruction, I will present a novel cross section association method leveraging Integer Programming, Markov chain based posterior probability modelling and Bayesian Maximum A Posteriori (MAP) estimation for 3D vessel reconstruction.
I will also present new methods for multi-stain image registration, biomarker detection, and 3D spatial density estimation for molecular imaging data integration. For time-lapse microscopy images, I will present a new 3D cell segmentation method with gradient partitioning and local structure enhancement by eigenvalue analysis with the Hessian matrix. A derived tracking method will also be presented that combines Bayesian filters with a sequential Monte Carlo method with joint use of location, velocity, 3D morphology features, and intensity profile signatures. Our proposed methods, featuring 2D, 3D, molecular, and time-lapse microscopy image analysis, will help researchers and clinicians extract accurate histopathology features, integrate spatially mapped pathophysiological biomarkers, and model disease progression dynamics at high cellular resolution. Therefore, they are essential for improving clinical decisions, enhancing prognostic predictions, inspiring new research hypotheses, and realizing personalized medicine.
Bio: Dr. Kong is Associate Professor in the Department of Mathematics and Statistics and the Department of Computer Science at Georgia State University, and adjunct faculty in the Department of Biomedical Informatics, the Department of Computer Science, and the Winship Cancer Institute at Emory University. Dr. Kong’s research interests focus on big imaging data analytics for modeling cancer diseases, multi-modal biomedical image analysis, computer-aided diagnosis, machine learning, computational biology, and large-scale translational bioinformatics with heterogeneous data integration and mining. His long-term research goal is to establish an interdisciplinary research program engaged with mathematicians, biostatisticians, computer scientists, biologists, pathologists, and oncologists, among other domain experts, for computational disease characterization, accurate modeling analysis, and granular-resolution understanding of diseases with large-scale, multi-modal, and multi-scale biomedical data.
Dr. Olga Troyanskaya
Professor of Computer Science and the Lewis-Sigler Institute for Integrative Genomics, Princeton University
Title: The quest for deep knowledge – decoding the human genome with deep learning models
Abstract: A key challenge in medicine and biology is to develop a complete understanding of the genomic architecture of disease. Yet the increasingly wide availability of ‘omics’ and clinical data, including whole genome sequencing, has far outpaced our ability to analyze these datasets. Challenges include interpreting the 98% of the genome that is noncoding to identify variants that are functional and may lead to disease, detangling genomic signals regulating tissue-specific gene expression, mapping the resulting genetic circuits and networks in disease-relevant tissues and cell types, and, finally, integrating the vast body of biological knowledge from model organisms with observations in humans. I will discuss methods that address these challenges, and highlight their applications to neurodevelopment and neurodegenerative diseases.
Title: Interventions to Increase Patient Portal Use in Vulnerable Populations: A Systematic Review
Abstract: Background: More than 100 studies document disparities in patient portal use among vulnerable populations. Developing and testing strategies to reduce disparities in use is essential to ensure portals benefit all populations.
Objective: To systematically review the impact of interventions designed to (1) increase portal use or predictors of use in vulnerable patient populations, or (2) reduce disparities in use.
Methods: A librarian searched Ovid MEDLINE, EMBASE, CINAHL, and Cochrane Reviews for studies published before September 1st, 2018. Two reviewers independently selected English-language research articles that evaluated any interventions designed to impact an eligible outcome. One reviewer extracted data and categorized interventions, and another assessed accuracy. Two reviewers independently assessed risk of bias.
Results: Out of 18 included studies, 15 (83%) assessed an intervention’s impact on portal use, 7 (39%) on predictors of use, and 1 (6%) on disparities in use. Most interventions studied focused on the individual (13 out of 26, 50%), as opposed to facilitating conditions, such as the tool, task, environment, or organization (SEIPS model). Twelve studies (67%) reported a statistically significant increase in portal use or predictors of use, or reduced disparities. Five studies (28%) had high or unclear risk of bias.
Conclusion: Individually-focused interventions have the most evidence for increasing portal use in vulnerable populations. Interventions affecting other system elements (tool, task, environment, organization) have not been sufficiently studied to draw conclusions. Given the well-established evidence for disparities in use and the limited research on effective interventions, research should move beyond identifying disparities to systematically addressing them at multiple levels.
Title: The Data Consult Service: an opportunity to bring new evidence to the bedside.
Abstract: Evidence-based medicine facilitates clinical care standardization, reduces medical care misuse and overuse, and eventually leads to health care cost reduction and improvement in effectiveness and quality of care. However, current evidence has been reported to be inadequate or missing for specific clinical cases. Randomized clinical trials, which are the gold standard of clinical evidence, are often not generalizable to real-world patients and fail to include patients with multiple co-morbidities, patients who are pregnant, the elderly, and other vulnerable populations. At the same time, a growing body of observational data, along with the continuing accumulation of practice-based evidence, has made new approaches to evidence generation available. We will present our first steps in developing a Data Consult Service, a clinical decision support tool that uses observational data to answer clinicians’ questions in real time. We will discuss our work on discovering potential areas of use and target groups for this tool as well as first answered questions and future work.
Fall 2019 Seminars
TITLE: Using Genetics to Address the Challenges of 21st Century Drug Development
BIO: Michael N. Cantor, MD, MA is Executive Director, Clinical Informatics, at the Regeneron Genetics Center. Currently his work focuses on developing and optimizing phenotypes from EHR and cohort data and linking them with genetic data to help discover new drug targets. Prior to Regeneron, he was Director of Clinical Research Informatics at New York University School of Medicine. As Director of Clinical Research Informatics, he was also the clinical director for NYULH’s DataCore, where his work focused on data management for clinical trials, using data from clinical systems for research, and advanced analytics. His research interests include integrating and standardizing social determinants of health-related data into the EHR, optimizing informatics tools for frontline clinicians, and providing self-service data access tools for researchers. During his previous tenure at NYU, Dr. Cantor was the Chief Medical Information Officer for the South Manhattan Healthcare Network of the New York City Health and Hospitals Corporation, based at Bellevue, and saw patients and precepted at the medical clinic there. Dr. Cantor completed his residency in internal medicine and informatics training at Columbia, has an M.D. from Emory University, and an A.B. from Princeton, and is an Associate Professor in the Department of Medicine at NYU School of Medicine. He currently sees patients weekly at Bellevue’s medicine clinic.
Speaker: Jonathan Elias, MD, Clinical Informatics Fellow
Title: A Day in the Life of a Clinical Informatics Fellow: CI Fellowship, Epic Together’s Mobile Messaging and Provider Team Project and the Epic Together Pre- & Post-Implementation Study
Abstract: Per AMIA, Clinical Informatics (CI) is the application of informatics and information technology to deliver healthcare services. The CI Fellowship is a two-year ACGME-accredited fellowship now being offered to one candidate a year through NYP CUMC, after completion of a medical residency. During this seminar, the fellowship structure and goals will be discussed, along with example projects and research.
A large area of focus of the fellowship is operational CI projects and academic research. Currently, Columbia University Medical Center (CUMC), NewYork-Presbyterian (NYP) and Weill Cornell Medical Center (WCM) are preparing to implement an enterprise-wide clinical information system, the EpicCare© Electronic Health Record (EHR). With the implementation of the EpicCare© EHR, there is an opportunity to improve, streamline and standardize role delineation, clinical communications and patient assignment across the EHR and secure mobile messaging platforms. The goals and processes associated with this project will be discussed.
Finally, a brief overview & update of the Epic Pre- & Post-Implementation Study will be presented. The overall purpose of this study is to evaluate clinical workflows, process efficiencies, EHR utilization, data quality and overall perceived system usability post implementation of Epic at NYP/CUMC/WCM compared to systems in place prior to Epic implementation. This project is comprised of three specific aims, outlined below, with associated high-level approach and metrics. Aim 1: Conduct a pre-post time-motion study focused on inpatient and outpatient settings (including the emergency department) to identify documentation workflow and time changes after Epic EHR implementation. Aim 2: Conduct log-file analyses to measure process efficiencies, EHR utilization (e.g., documentation time), and EHR data quality metrics. Aim 3: Administer a survey to measure and compare health professionals’ perceived usability and satisfaction pre- and post-Epic implementation in the context of functionality to enhance the delivery of continuity of care and adaptation to new health information technology (HIT).
Speaker: Jiayao Wang, PhD Student, Dr. Dennis Vitkup’s Lab
Title: Contribution of recessive genotypes and common variants to autism spectrum disorder
Abstract: Autism spectrum disorder (ASD) is a genetically heterogeneous condition, caused by a combination of rare de novo and inherited variants as well as common variants in at least several hundred genes. However, significantly larger sample sizes are needed to identify the complete set of genetic risk factors. Also, the contribution from inherited variants needs to be further investigated. Here we present results for SPARK (SPARKForAutism.org), a cohort of ~9K families with ASD, all consented online. Whole exome sequencing (WES) and genotyping data were generated for each family using DNA from saliva. With exome sequencing data and a simple statistical framework, we show a weak contribution from recessive genotypes, as well as several significant recessive genes implicated in autism, such as EIF3F and RELN. With genotype array data, we performed GWAS with the transmission disequilibrium test and calculated polygenic risk scores for SPARK families. We show that autism probands have a significantly higher polygenic risk compared to their siblings and that the risk is spread all over the genome rather than coming only from significant loci. Contributions from recessive genotypes and common variants, together with rare inherited variants and de novo mutations from the SPARK project, will complete our understanding of the genetics of autism.
There was no seminar on Nov. 25.
No seminar due to the AMIA Symposium.
Title: Oops! I’m on the wrong patient: Evaluating System-Level Interventions for Preventing Wrong-Patient Electronic Orders
Bio: Dr. Adelman’s Patient Safety Research Program began with the development of the Wrong-Patient Retract-and-Reorder (RAR) Measure—a valid and reliable method of quantifying the frequency of wrong-patient orders placed in electronic ordering systems. The Wrong-Patient RAR measure was the first automated measure of medical errors and the first Health IT Safety Measure endorsed by the National Quality Forum. The RAR method identifies thousands of near-miss, wrong-patient errors per year in large health systems, enabling researchers to test interventions to prevent this type of error.
The Wrong-Patient RAR measure has been used to evaluate the effectiveness of patient safety interventions in several studies conducted in different electronic health record systems and clinical settings, including in the neonatal intensive care unit (NICU). The measure is the primary outcome measure for studies supported by the Agency for Healthcare Research and Quality (R21HS023704, R01HS024945) and the National Institute of Child Health and Human Development (R01HD094793). Additional research is underway to extend the RAR methodology to other types of errors, such as wrong-drug errors, and develop new health IT safety measures (R01HS024538).
Results of Dr. Adelman’s research led to national patient safety guidance, including a recommendation issued by the Office of the National Coordinator for Health Information Technology that healthcare organizations use the Wrong-Patient RAR measure to monitor the frequency of wrong-patient orders. Effective 2019, The Joint Commission will require that hospitals adopt a distinct newborn naming convention that incorporates the mother’s first name, based on studies by Adelman and colleagues.
Due to the Election Day holiday on Tuesday, there is no Seminar today.
This is a DBMI Student Town Hall.
Speaker: Alex Kitaygorodsky, PhD Student, Dr. Yufeng Shen’s Lab
Title: Identification of disease-causing genetic mutations based on machine learning and large genomic data sets
Abstract: More than 3% of young children are born with developmental disorders such as congenital heart disease (CHD), congenital diaphragmatic hernia (CDH), and autism spectrum disorder (ASD). Understanding the genetic causes of these conditions is critical to improve health care for these children and to push forward human developmental biology and neuroscience. Recently, high-throughput sequencing technologies have enabled generation of large-scale genomic data in genetic studies of these conditions. However, translating human data to knowledge is challenging due to an incomplete understanding of biology and a lack of sufficiently powerful analytical methods. My work aims to develop new computational methods based on powerful machine learning techniques to interpret genome sequencing data and identify disease-causing genetic variations. In this talk, I will focus specifically on the role of regulatory non-protein coding mutations in CHD, where we have found a substantial role of variants disrupting RNA binding protein (RBP) binding sites. RBPs oversee normal regulation of gene expression, at both the transcriptional and especially post-transcriptional stages, and so their disruption via mutation represents an important but under-studied noncoding action mechanism. To better understand the observed enrichment in these sites, we first modeled RNA binding protein processes with a robust convolutional neural network. Then, we designed a gradient boosting super-model to integrate predicted RBP binding scores with multimodal genomic data, allowing us to predict pathogenic RBP and gene regulation disruption caused by individual mutations. Finally, we applied our model back to Whole Genome Sequencing data of autism and CHD to find new disease risk genes and improve genetic diagnosis. 
In summary, we leveraged large genomic datasets with a sophisticated machine learning approach to better analyze sequencing data, advance genomic medicine, and aid our understanding of developmental disorder genetics.
Speaker: Sylvia Cho, PhD Candidate, Dr. Karthik Natarajan’s Lab
Title: Identifying data quality dimensions for wearable device data
Abstract: Patient-generated health data (PGHD) is an emerging type of biomedical data that is captured and recorded by patients outside clinical encounters. One of the major factors facilitating the documentation of PGHD is the proliferation of health tracking technologies. Among the different health tracking technologies, wearable devices are unique in that individuals can continuously and objectively self-track their health in free-living conditions. As a byproduct of using wearable devices for self-tracking, the large volume of accumulated data and diverse data types have led to interest in reusing these data for research purposes. However, there are concerns about the quality of device-generated data for various reasons, such as technical and human limitations. Therefore, assessing the quality of wearable data is essential before reusing the data for research. Data quality dimensions are an important feature of data quality assessment, as they provide guidance on which aspects of data quality should be assessed for the research task. While there are abundant studies on data quality dimensions for traditional clinical data such as electronic health record data, there is a lack of understanding of the important data quality dimensions for wearable device data. In this study, we aim to identify the data quality dimensions considered to be important by researchers when analyzing wearable data, and to verify whether an existing data quality framework can be applied to this type of data or needs to be modified. In this talk, I will discuss the methods we used to identify the dimensions and present preliminary results of the study.
Video: Watch the presentation here
Title: Applications of Data Science and Machine Learning in Radiology and Cardiology
Abstract: The overall goal of our group is to leverage data-driven approaches to help improve patient outcomes. This talk will demonstrate examples of how we are working toward this goal by leveraging large clinical datasets, data science, and machine learning. Specific examples include: 1) using 46,583 clinically-acquired 3D computed tomography images of the brain to develop and implement a deep learning model to efficiently reprioritize radiology worklists for quicker diagnosis of intracranial hemorrhage; 2) using deep learning to analyze 723,754 echocardiographic videos of the heart to accurately predict patient mortality; 3) analyzing 2 million 12-lead electrocardiographic tracings from the heart to predict clinically relevant future events; and 4) optimizing evidence-based care delivery for a population of >10,000 patients with heart failure using machine learning.
Bio: Dr. Fornwalt attended the University of South Carolina as an undergraduate in mathematics and marine science. He then worked in a free medical clinic for a year before starting an MD/PhD program at Emory and Georgia Tech. After finishing his degrees in 2010, he completed an internship in pediatrics at Boston Children’s Hospital before becoming an Assistant Professor at the University of Kentucky.
After four years on faculty in Kentucky, Dr. Fornwalt moved to Geisinger where he completed his diagnostic radiology residency and founded Geisinger’s Department of Imaging Science and Innovation, which focuses on data-driven approaches to improving patient outcomes. Dr. Fornwalt is also a practicing thoraco-abdominal radiologist and an active member of Geisinger’s Heart Institute.
Video: Watch the presentation here
Title: Integrative Analysis of Multi-view Data for Dimension Reduction and Prediction
Abstract: Multi-view data are data collected on the same set of samples but from different views or sources. They are becoming increasingly common in modern biomedical studies. In this talk, I’ll introduce some recent developments in the integrative analysis of multi-view data, and present a new multivariate predictive model with application to a longitudinal study of aging.
Bio: Dr. Gen Li is devoted to developing new statistical learning methods for analyzing high dimensional biomedical data. He focuses on analyzing complex data with heterogeneous types that are collected from multiple sources. His methodological research interests include dimension reduction, predictive modeling, association analysis, and functional data analysis. He is also interested in genetics and bioinformatics. He is a consortium member of the NIH Common Fund program Genotype-Tissue Expression (GTEx) project, and contributes to the development of statistical methods for expression quantitative trait loci analysis in multiple tissues. He also has research interests in scientific domains including melanoma, microbiome, and urology research.
Video: Watch the presentation here
Title: Machine Learning in Healthcare
Abstract: In March of 2016, the AlphaGo computer program beat world champion (and human) Lee Sedol at the board game Go. The program’s success reflected the significant progress that machine learning research has made in recent years. However, AlphaGo was just one example of what can be achieved with machine learning. This talk will provide an overview of some of the techniques that are being used in machine learning today, as well as some recent and ongoing work by Google’s research teams to advance the applications of machine learning, particularly its role in biomedical research. The talk will also discuss some of the unique challenges around applications in healthcare.
Bio: Ming Jack Po, MD, PhD, is a product manager in Google Health, leading a number of its machine learning research projects as well as health care product teams. Prior to joining Google, Jack spent a decade working in different capacities in areas related to medical devices and healthcare delivery. Jack is currently a trustee of the Austen Riggs Center, a board member of El Camino Health Systems, a member of the National Library of Medicine Lister Hill’s Board of Scientific Counselors and a member of the ONC’s Interoperability Standards Priorities Task Force. Jack received his MD and PhD from Columbia University, and his bachelor’s degree in Biomedical Engineering and master’s degree in Mathematics from Johns Hopkins University.
Speaker: Alexander Hsieh, PhD student
Title: Detection of mosaic single nucleotide variants in exome sequencing data and implications for congenital heart disease
Abstract: The contribution of somatic mosaicism, or genetic mutations arising after oocyte fertilization, to congenital heart disease (CHD) is not well understood. Further, the relationship between mosaicism in blood and cardiovascular tissue has not been determined. We developed a computational method, Expectation-Maximization-based detection of Mosaicism (EM-mosaic), to analyze mosaicism in exome sequences of 2530 CHD proband-parent trios. EM-mosaic detected 326 mosaic mutations in blood and/or cardiac tissue DNA. Of the 309 detected in blood DNA, 85/94 (90%) tested were independently confirmed. Twenty-five mosaic variants altered CHD-risk genes, affecting 1% of our cohort. Of these 25, 22/22 candidates tested were confirmed. Variants predicted as damaging had higher variant allele fraction than benign variants, suggesting a role in CHD. The frequency of mosaic variants above 10% mosaicism was 0.13/person in blood and 0.14/person in cardiac tissue. Analysis of 66 individuals with matched cardiac tissue available revealed both tissue-specific and shared mosaicism, with shared mosaics generally having higher allele fraction. We estimate that ~1% of CHD probands have a mosaic variant detectable in blood that could contribute to cardiac malformations, particularly those damaging variants expressed at higher allele fraction compared to benign variants. Although blood is a readily-available DNA source, cardiac tissues analyzed contributed ~5% of somatic mosaic variants identified, indicating the value of tissue mosaicism analyses.
Speaker: Michelle Chau, PhD student
Title: Developing a user-centered, machine learning approach to identify preferences for inspirational social media health-related images for young populations
Abstract: Nutrition interventions for adolescents and young adults (AYAs) increasingly rely on mobile platforms and social media. Most assume nutritional decisions are rational, targeting intentions such as goal setting and self-monitoring. However, in the absence of motivation and time, nutrition choices are often automatic and based on heuristics. The use of images is a simple way to deliver heuristic messaging. My preliminary research, which shows AYAs’ frequent use of social media for inspiration, further suggests that health-related images may be suitable for nutrition interventions with these groups. Previous studies have explored inspirational social media content using qualitative and manual methods. However, there is an active area of research in computational visual analysis that explores preferences and prediction for image retrieval and recommendation tasks. The application of these techniques within health, and specifically how to translate human preferences into the technical requirements needed to identify inspirational images for nutrition and young populations, is underexplored. In this talk, I will discuss a study to identify image features that are relevant for inspiring healthy eating in health-related social media content. Further, I will discuss future directions for exploring how these features may be incorporated into machine learning models.