June 15 Talk: Making new discoveries through open science requires cleaning up our metadata mess

DBMI is excited to welcome Mark Musen, Professor of Biomedical Informatics Research at Stanford University, who will present “Making new discoveries through open science requires cleaning up our metadata mess” in PH20-200 on Thursday, June 15, from 12-1 pm. If you are interested in joining, please RSVP here.

ABSTRACT: Scientists dream of using AI to sift through existing data sets to make new discoveries. They imagine teenagers with laptops poring through online data and stumbling on the next new drug to treat COVID. Unfortunately, despite increasing governmental requirements for data sharing and open science, real-world scientific data are almost never in a form that enables third parties to make sense of what the original investigators have done. The problem is that the metadata that scientists create to describe their data sets are, in most cases, terrible. Although data scientists take it for granted that their approaches will advance scientific knowledge simply through the availability of the myriad data sets in the public domain, that vision will not pan out until scientific metadata become more useful. That means using reporting guidelines to structure and standardize metadata attributes, and using ontologies to standardize the terms used in metadata descriptions. There are great opportunities for workers in informatics to help investigators to clean up legacy metadata to make existing datasets more understandable, and also to help investigators to create good metadata in the first place. The FAIRware Workbench converts existing metadata to a form that is more searchable and interpretable. The CEDAR Workbench helps scientists to use formal reporting guidelines and terms from standard ontologies to create high-quality metadata de novo. These tools demonstrate how computational approaches can be helpful to ensure that scientific data are interpretable by both humans and machines.

BIO: Dr. Musen is the Stanford Medicine Professor of Biomedical Informatics Research at Stanford University. He conducts research related to intelligent systems, reusable ontologies, open science, and biomedical decision support. His long-standing work on the Protégé system has led to widely used, open-source technology to build ontologies and intelligent computer systems. More recently, he has focused on the application of semantic technology to problems in open science. He is principal investigator of the BioPortal repository of scientific ontologies, as well as the Center for Expanded Data Annotation and Retrieval (CEDAR), which applies semantic technology to enhance the metadata used to annotate scientific datasets. Dr. Musen is an elected fellow of the American College of Medical Informatics and a member of the National Academy of Medicine. He is a recipient of the Donald A. B. Lindberg Award for Innovation in Informatics from the American Medical Informatics Association. He was founding co-editor-in-chief of the journal Applied Ontology.