BINF G4003: Methods I: Symbolic Methods
Class Description: Survey of foundational symbolic methods for modeling health information systems and for making those models explicit and sharable. The topics cover clinical terminologies (e.g., ICD-9, SNOMED-CT, MeSH, UMLS), biomedical ontologies (e.g., GO, Disease Ontology, PharmGKB), knowledge representation, computerized practice guidelines, semantic interoperability, and text processing. Prerequisites: Acculturation to Programming and Statistics (BINF G4000) or permission of instructor.
Instructor
 
Chunhua Weng, PhD
Class Schedule
This class meets in the fall. There are two lectures and a lab each week.
Overview: This course is designed for first-year PhD and MA students in the Biomedical Informatics program and is also open to other interested students at Columbia. It provides an overview of symbolic methods. Topics include:
- Semantic Interoperability
- Knowledge & Concept Representation
- Terminologies & Ontologies
- Natural Language Processing
- Data / Information Models
- Concept Mapping & Clinical Phenotyping
Lectures: Lectures are interactive, and class participation in discussions is emphasized. Please note that the lecture schedule is subject to change; students will be informed of any changes.
Lab sessions: Sessions follow a group-based format; see the “lab group assignments” document for your group assignment for the entire semester. Please bring a laptop. Labs are due the following Sunday at 11:59 PM and are submitted by uploading them to CourseWorks. Students should work on labs with their group members, but each student must submit individually. See the grading rubric below for details on grading.
Readings: Readings are posted on CourseWorks. Each week includes, on average, 30 pages (about 2 hours) of required reading. Please complete each week's required readings before the Tuesday morning lecture. Supplemental readings are included for ease of reference and are not required.
Office hours: Instructor: by appointment only; TA: Fridays 11:30–12:30 (after lab), online or by appointment
Useful Resources
UMLS Terminology Services • https://uts.nlm.nih.gov/home.html
ICD-9/ICD-10 lookup • https://www.icd10data.com/ICD10CM/Codes
SNOMED browser • http://browser.ihtsdotools.org
RxNav (for RxNorm) • https://rxnav.nlm.nih.gov/
UMLS Resources • https://www.nlm.nih.gov/research/umls/
Publicly available OHDSI data • http://www.ohdsi.org/web/atlas/#/home
HPO • https://hpo.jax.org/app/
PheKB • https://phekb.org/phenotypes
Ensembl • https://academic.oup.com/bioinformatics/article/31/1/143/2366240 / https://www.ncbi.nlm.nih.gov/pubmed/24316576
BLAST • https://blast.ncbi.nlm.nih.gov/Blast.cgi
BioMart • http://www.biomart.org/
BioProject • https://www.ncbi.nlm.nih.gov/bioproject
Galaxy • https://usegalaxy.org/
FHIR Documentation • https://www.hl7.org/fhir/documentation.html
Phenolyzer • http://phenolyzer.wglab.org/
Lecture/Lab Topics
| Topic | 
|---|
| Course overview | 
| KR and Terminology: significance, knowledge gaps, and challenges | 
| Intro, set up accounts, critique/improve example terminology | 
| Terminology in use: "Work" in EHR Phenotyping | 
| Why clinical terminology is hard | 
| Phenotype/concept normalization: Defining phenotypes in ICD-10 | 
| Terminology Example: ICD-9 and ICD-10 | 
| Terminology Example: SNOMED-CT and Uses | 
| Phenotype/concept normalization: Using SNOMED-CT with ICD-10 | 
| RxNorm | 
| FHIR | 
| Using RxNorm and LOINC for drugs and measurements | 
| Terminology Example: Human Phenotype Ontology (HPO) | 
| Terminology Example: HPO & EHR-Phenolyzer | 
| Semester project I | 
| Terminology Methods: Vision and Desiderata | 
| Terminology Methods: Desiderata, MED | 
| Semester project II | 
| Terminology Metathesaurus: UMLS | 
| Terminology Metathesaurus: UMLS | 
| Project midpoint presentations (10 + 5 minutes) | 
| Terminology Methods: Concept Mapping via Usagi + MetaMap + BioAnnotator | 
| Semantic Knowledge Representation for Medical Reasoning | 
| Semester project IV | 
| Terminology Metathesaurus: OMOP CDM | 
| Semester project V | 
| Terminology Metathesaurus: OMOP CDM | 
| Terminology Methods: Reference vs. Interface Terminology | 
| Semester project VI | 
| Terminology Methods: Terminology Auditing: Methods and Issues | 
| Semester project VII | 
| Semantic Representation for Medical Reasoning | 
| Midterm review | 
| Terminology Methods: Terminology and Language, Knowledge Representation | 
| Terminology Methods: Biases in informatics, Challenges and Opportunities | 
| Terminology Methods: NLP for information encoding, Named Entity Recognition, Concept Normalization, Coordination Ellipsis | 
| Final presentations part 1 | 
| Final presentations part 2 | 
Week 1: Introduction
Required:
- AMIA Definition of Biomedical Informatics
- HIMSS Definition of Interoperability
- Arden Syntax & Medical Logic Modules (Excerpt)
- Dixon BE, Vreeman DJ, Grannis SJ. The long road to semantic interoperability in support of public health: Experiences from two states. J Biomed Inform. 2014;49:3-8.
- Weinberger D. The Problem with the Data-Information-Knowledge-Wisdom Hierarchy. Harvard Business Review. 2010.
Supplemental:
- Agusti A. Phenotypes and disease characterization in chronic obstructive pulmonary disease. Toward the extinction of phenotypes? Ann Am Thorac Soc. 2013;10 Suppl:S125-30.
- Dolin RH, Alschuler L. Approaching semantic interoperability in Health Level Seven. J Am Med Inform Assoc. 2011;18:99-103.
- Kulikowski CA, Shortliffe EH, Currie LM, et al. AMIA Board white paper: definition of biomedical informatics and specification of core competencies for graduate education in the discipline. J Am Med Inform Assoc. 2012;19(6):931-8.
- Mitchell JA, Gerdin U, Lindberg DA, Lovis C, Martin-Sanchez FJ, Miller RA, Shortliffe EH, Leong TY. 50 years of informatics research on decision support: what’s next. Methods Inf Med. 2011;50:525-35.
- Winnenburg R, Bodenreider O. Coverage of phenotypes in standard terminologies. Joint Bio-Ontologies and BioLINK ISMB’2014 SIG session “Phenotype Day”. 2014:41-44.
Week 2: EHR phenotyping and ICD-9
Required:
- Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network. J Biomed Inform 2019; 99: 103293. doi:10.1016/j.jbi.2019.103293.
- World Health Organization. International Classification of Diseases (ICD) Information Sheet. Available at: http://www.who.int/classifications/icd/factsheet/en/
- Israel RA. The International Classification of Disease. Two hundred years of development. Public Health Rep 1978; 93: 150–152.
- Steindel SJ. International classification of diseases, 10th edition, clinical modification and procedure coding system: descriptive overview of the next generation HIPAA code sets. J Am Med Inform Assoc 2010; 17: 274–282. doi:10.1136/jamia.2009.001230.
Supplemental:
- Sarrazin MSV, Rosenthal GE. Finding Pure and Simple Truths With Administrative Data. JAMA 2012; 307: 1433–1435. doi:10.1001/jama.2012.404.
- Vaidya SR, Shapiro JS, Lovett PB, Kuperman GJ. Acute coronary syndrome cohort definition: troponin versus ICD-9-CM codes. Future Cardiol 2010; 6: 725–731. doi:10.2217/fca.10.81.
- Waikar SS, Wald R, Chertow GM, et al. Validity of International Classification of Diseases, Ninth Revision, Clinical Modification Codes for Acute Renal Failure. J Am Soc Nephrol 2006; 17: 1688–1694. doi:10.1681/ASN.2006010073.
- Benesch C, Witter DM, Wilder AL, Duncan PW, Samsa GP, Matchar DB. Inaccuracy of the International Classification of Diseases (ICD-9-CM) in identifying the diagnosis of ischemic cerebrovascular disease. Neurology 1997; 49: 660–664.
Week 3: ICD-10 and SNOMED-CT
Required:
- Course: Welcome to SNOMED CT E-Learning, Topic: Starter Tutorials. Available at: https://elearning.ihtsdotools.org/course/view.php?id=5&section=1 (please complete all modules of the course)
- Dhombres F, Winnenburg R, Case JT, Bodenreider O. Extending the coverage of phenotypes in SNOMED CT through post-coordination. Stud Health Technol Inform 2015; 216: 795–799.
Supplemental:
- Lee D, Cornet R, Lau F, de Keizer N. A survey of SNOMED CT implementations. J Biomed Inform 2013; 46: 87–96. doi:10.1016/j.jbi.2012.09.006.
- Melton GB, Parsons S, Morrison FP, Rothschild AS, Markatou M, Hripcsak G. Inter-patient distance metrics using SNOMED CT defining relationships. J Biomed Inform 2006; 39: 697–705. doi:10.1016/j.jbi.2006.01.004.
Week 4: SNOMED-CT and RxNorm
Required:
- RxNorm Overview. Available at: https://www.nlm.nih.gov/research/umls/rxnorm/overview.html
Week 5: HPO and the EHR Phenolyzer
Required:
- Köhler S, Vasilevsky NA, Engelstad M, et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res 2017; 45: D865–D876. doi:10.1093/nar/gkw1039.
Week 6: The Desiderata
Required:
- Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med 1998; 37: 394–403.
- Rector AL. Clinical terminology: why is it so hard? Methods Inf Med 1999; 38: 239–252.
- Chute CG. Medical Concept Representation. In: Medical Informatics. Integrated Series in Information Systems. Springer, Boston, MA, 2005; 163–182. doi:10.1007/0-387-25739-X_6.
- Balkanyi L, Schulz S, Cornet R, Bodenreider O. Medical Concept Representation: The Years Beyond 2000. Stud Health Technol Inform 2013; 192: 1011.
Supplemental:
- Cimino JJ. In defense of the Desiderata. J Biomed Inform 2006; 39: 299–306. doi:10.1016/j.jbi.2005.11.008.
Week 7: UMLS
Required:
- UMLS Basics Tutorial (Excerpts)
Supplemental:
- Achour SL, Dojat M, Rieux C, Bierling P, Lepage E. A UMLS-based knowledge acquisition tool for rule-based clinical decision support system development. J Am Med Inform Assoc. 2001;8(4):351-60.
- Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32:D267-70.
- Humphreys BL, et al. The Unified Medical Language System: an informatics research collaboration. J Am Med Inform Assoc. 1998;5:1-11.
- McCray AT, Razi AM, et al. The UMLS Knowledge Source Server: a versatile Internet-based research tool. Proc AMIA Annu Fall Symp. 1996:164-8.
Week 8: Concept mapping with Usagi, BioAnnotator, MetaMap
Required:
- Aronson AR, Lang FM. An overview of MetaMap: Historical perspective and recent advances. J Am Med Inform Assoc. 2010;17(3):229-36.
Recommended readings:
- Niu et al. Multi-task Character-Level Attentional Networks for Medical Concept Normalization.
- Kang et al. Using rule-based natural language processing to improve disease normalization in biomedical text.
- Tsuruoka et al. Normalizing biomedical terms by minimizing ambiguity and variability.
- Leaman et al. DNorm: disease name normalization with pairwise learning to rank.
- Pradhan et al. Evaluating the state of the art in disorder recognition and normalization of the clinical narrative.
Week 9: OMOP CDM
Required:
- CommonDataModel: Definition and DDLs for the OMOP Common Data Model (CDM). Observational Health Data Sciences and Informatics, 2018. Available at: https://github.com/OHDSI/CommonDataModel
- Stang PE, Ryan PB, Racoosin JA, et al. Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Ann Intern Med 2010; 153: 600–606. doi:10.7326/0003-4819-153-9-201011020-00010.
- Ceusters W, Blaisure J. A Realism-Based View on Counts in OMOP’s Common Data Model. Stud Health Technol Inform 2017; 237: 55–62.
Supplemental:
- Hripcsak G, Duke JD, Shah NH, et al. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Stud Health Technol Inform 2015; 216: 574–578.
- Graham DJ, Reichman ME, Wernecke M, et al. Cardiovascular, bleeding, and mortality risks in elderly Medicare patients treated with dabigatran or warfarin for nonvalvular atrial fibrillation. Circulation 2015; 131: 157–164. doi:10.1161/CIRCULATIONAHA.114.012061.
Week 10: Terminology types and mapping
Required:
- Excerpt from Davis R, Shrobe H, Szolovits P. What is a knowledge representation? AI Magazine. 1993 Mar 15;14(1):17.
- Ontology Development 101: Building your first ontology https://protege.stanford.edu/publications/ontology_development/ontology101.pdf
Supplemental:
- Excerpt from Hendrix GG. Natural-language interface. Computational Linguistics. 1982;8(2).
- Davis R, Shrobe H, Szolovits P. What is a knowledge representation? AI Magazine. 1993 Mar 15;14(1):17.
- Hendrix GG. Natural-language interface. Computational Linguistics. 1982;8(2).
- Levesque HJ, Brachman RJ. Expressiveness and tractability in knowledge representation and reasoning. Computational Intelligence. 1987 Feb;3(1):78.
- Newell A. The knowledge level. Artificial Intelligence. 1982 Jan 18;18(1):87-127.
- Roberts K, Demner-Fushman D. Toward a natural language interface for EHR questions. AMIA Summits on Translational Science Proceedings. 2015;2015:157.
Week 11: Terminology auditing
Week 12: Semantic representation
Week 13: NLP
Required:
- Friedman C. A broad-coverage natural language processing system. Proc AMIA Symp 2000: 270–274.
Supplemental:
- Friedman C. Towards a comprehensive medical language processing system: methods and issues. Proc Conf Am Med Inform Assoc AMIA Fall Symp 1997: 595–599.
Week 14: NLP and terminology evaluation
