BINF G4003: Methods I: Symbolic Methods

Class Description: Survey of foundational symbolic methods for modeling health information systems and for making those models explicit and sharable.  The topics cover clinical terminologies (e.g., ICD-9, SNOMED-CT, MeSH, UMLS), biomedical ontologies (e.g., GO, Disease Ontology, PharmGKB), knowledge representation, computerized practice guidelines, semantic interoperability, and text processing. Prerequisites: Acculturation to Programming and Statistics (BINF G4000) or permission of instructor.


Chunhua Weng, PhD

Class Schedule

This class meets in the fall. There are two lectures and a lab each week.

Overview: This course is customized for first-year PhD and MA students in the biomedical informatics program and is also open to other interested students at Columbia. It provides an overview of symbolic methods. Topics include:

  • Semantic Interoperability
  • Knowledge & Concept Representation
  • Terminologies & Ontologies
  • Natural Language Processing
  • Data / Information Models
  • Concept Mapping & Clinical Phenotyping

Lectures: Lectures are interactive, with an emphasis on participation in class discussions. The lecture schedule is subject to change; students will be informed of any changes when they occur.

Lab sessions: Sessions are conducted in a group-based format; see the “lab group assignments” document for your group assignment for the entire semester. Please bring a laptop. Labs are due the following Sunday at 11:59 PM and are submitted by uploading them to CourseWorks. Students should work on each lab with their group members, but each student must submit individually. See the grading rubric below for details on grading.

Readings: Readings are posted on CourseWorks. Each week includes, on average, 30 pages (about 2 hours) of required reading. Please complete the required readings for each week before the Tuesday morning lecture. Supplemental readings are included for ease of reference and are not required.

Office hours: Instructor: by appointment only; TA: Fridays 11:30 – 12:30 (after lab), online or by appointment.

Useful Resources

  • UMLS Terminology Services
  • ICD9/ICD10 lookup
  • SNOMED browser
  • RxNav (for RxNorm)
  • UMLS Resources
  • Publicly available OHDSI data
  • PheKB
  • Ensembl
  • BioMart
  • BioProject
  • Galaxy
  • FHIR Documentation
  • Phenolyzer

Lecture/Lab Topics

Course overview
KR and Terminology: significance, knowledge gaps, and challenges
Intro, setup accounts, critique/improve example terminology
Terminology in use: "Work" in EHR Phenotyping
Why clinical terminology is hard
Phenotype/concept normalization: Defining phenotypes in ICD-10
Terminology Example: ICD-9 and ICD-10
Terminology Example: SNOMED-CT and Uses
Phenotype/concept normalization: Using SNOMED-CT with ICD-10
Using RxNorm and LOINC for drugs and measurements
Terminology Example: Human Phenotype Ontology (HPO)
Terminology Example: HPO & EHR-Phenolyzer
Semester project I
Terminology Methods: Vision and Desiderata
Terminology Methods: Desiderata, MED
Semester project II
Terminology Metathesaurus: UMLS
Terminology Metathesaurus: UMLS
Project midpoint presentations (10 + 5 minutes)
Terminology Methods: Concept Mapping via Usagi + MetaMap + BioAnnotator
Semantics Knowledge Representation for Medical Reasoning
Semester project IV
Terminology Metathesaurus: OMOP CDM
Semester project V
Terminology Metathesaurus: OMOP CDM
Terminology Methods: Reference vs. Interface Terminology
Semester project VI
Terminology Methods: Terminology Auditing: Methods and Issues
Semester project VII
Semantic Representation for Medical Reasoning
Midterm review
Terminology Methods: Terminology and Language, Knowledge Representation
Terminology Methods: Biases in informatics, Challenges and Opportunities
Terminology Methods: NLP for information encoding, Named Entity Recognition, Concept Normalization, Coordination Ellipsis
Final presentations part 1
Final presentations part 2

Week 1: Introduction


  • AMIA Definition of Biomedical Informatics
  • HIMSS Definition of Interoperability
  • Arden Syntax & Medical Logic Modules (Excerpt)
  • Dixon BE, Vreeman DJ, Grannis SJ. The long road to semantic interoperability in support of public health: Experiences from two states. J Biomed Inform. 2014;49:3-8.
  • Weinberger D. The Problem with the Data-Information-Knowledge-Wisdom Hierarchy. Harvard Business Review. 2010


  • Agusti, A. 2013. ‘Phenotypes and disease characterization in chronic obstructive pulmonary disease. Toward the extinction of phenotypes?’, Ann Am Thorac Soc, 10 Suppl: S125-30.
  • Dolin, R. H., and L. Alschuler. 2011. ‘Approaching semantic interoperability in Health Level Seven’, J Am Med Inform Assoc, 18: 99-103.
  • Kulikowski CA, Shortliffe EH, Currie LM, et al. AMIA Board white paper: Definition of biomedical informatics and specification of core competencies for graduate education in the discipline. J Am Med Inform Assoc. 2012;19(6):931-8.
  • Mitchell, J. A., U. Gerdin, D. A. Lindberg, C. Lovis, F. J. Martin-Sanchez, R. A. Miller, E. H. Shortliffe, and T. Y. Leong. 2011. ‘50 years of informatics research on decision support: what’s next’, Methods Inf Med, 50: 525-35.
  • Winnenburg, R., and O. Bodenreider. 2014. ‘Coverage of phenotypes in standard terminologies’, Joint Bio-Ontologies and BioLINK ISMB’2014 SIG session “Phenotype Day” 2014:41-44.

Week 2: EHR phenotyping and ICD-9


  • Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network. J Biomed Inform 2019; 99: 103293. doi:10.1016/j.jbi.2019.103293.
  • WHO | International Classification of Diseases (ICD) Information Sheet. WHO. Available at:
  • Israel RA. The International Classification of Disease. Two hundred years of development. Public Health Rep 1978; 93: 150–152.
  • Steindel SJ. International classification of diseases, 10th edition, clinical modification and procedure coding system: descriptive overview of the next generation HIPAA code sets. J Am Med Inform Assoc JAMIA 2010; 17: 274–282. doi:10.1136/jamia.2009.001230.


  • Sarrazin MSV, Rosenthal GE. Finding Pure and Simple Truths With Administrative Data. JAMA 2012; 307: 1433–1435. doi:10.1001/jama.2012.404.
  • Vaidya SR, Shapiro JS, Lovett PB, Kuperman GJ. Acute coronary syndrome cohort definition: troponin versus ICD-9-CM codes. Future Cardiol 2010; 6: 725–731. doi:10.2217/fca.10.81.
  • Waikar SS, Wald R, Chertow GM, et al. Validity of International Classification of Diseases, Ninth Revision, Clinical Modification Codes for Acute Renal Failure. J Am Soc Nephrol 2006; 17: 1688–1694. doi:10.1681/ASN.2006010073.
  • Benesch C, Witter DM, Wilder AL, Duncan PW, Samsa GP, Matchar DB. Inaccuracy of the International Classification of Diseases (ICD-9-CM) in identifying the diagnosis of ischemic cerebrovascular disease. Neurology 1997; 49: 660–664.

Week 3: ICD-10 and SNOMED-CT


  • Course: Welcome to SNOMED CT E-Learning, Topic: Starter Tutorials. Available at: (please go through all modules of the course)
  • Dhombres F, Winnenburg R, Case JT, Bodenreider O. Extending the coverage of phenotypes in SNOMED CT through post-coordination. Stud Health Technol Inform 2015; 216: 795–799.


  • Lee D, Cornet R, Lau F, de Keizer N. A survey of SNOMED CT implementations. J Biomed Inform 2013; 46: 87–96. doi:10.1016/j.jbi.2012.09.006.
  • Melton GB, Parsons S, Morrison FP, Rothschild AS, Markatou M, Hripcsak G. Inter-patient distance metrics using SNOMED CT defining relationships. J Biomed Inform 2006; 39: 697–705. doi:10.1016/j.jbi.2006.01.004.

Week 4: SNOMED-CT and RxNorm


RxNorm Overview

Week 5: HPO and the EHR Phenolyzer


  • Köhler S, Vasilevsky NA, Engelstad M, et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res 2017; 45: D865–D876. doi:10.1093/nar/gkw1039.

Week 6: The Desiderata


  • Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med 1998; 37: 394–403.
  • Rector AL. Clinical terminology: why is it so hard? Methods Inf Med 1999; 38: 239–252.
  • Chute CG. Medical Concept Representation. In: Medical Informatics. Integrated Series in Information Systems. Springer, Boston, MA, 2005; 163–182. doi:10.1007/0-387-25739-X_6.
  • Balkanyi L, Schulz S, Cornet R, Bodenreider O. Medical Concept Representation: The Years Beyond 2000. Stud Health Technol Inform 2013; 192: 1011.


  • Cimino JJ. In defense of the Desiderata. J Biomed Inform 2006; 39: 299–306. doi:10.1016/j.jbi.2005.11.008.

Week 7: UMLS


  • UMLS Basics Tutorial (Excerpts) 


  • Achour SL, Dojat M, Rieux C, Bierling P, Lepage E. A UMLS-based knowledge acquisition tool for rule-based clinical decision support system development. J Am Med Inform Assoc. 2001;8(4):351-60.
  • Bodenreider, O. 2004. ‘The Unified Medical Language System (UMLS): integrating biomedical terminology’, Nucleic Acids Res, 32: D267-70.
  • Humphreys, B. L., et al. 1998. ‘The Unified Medical Language System: an informatics research collaboration’, J Am Med Inform Assoc, 5: 1-11.
  • McCray, A. T., A. M. Razi, et al. 1996. ‘The UMLS Knowledge Source Server: a versatile Internet-based research tool’, Proc AMIA Annu Fall Symp: 164-8.

Week 8: Concept mapping with Usagi, BioAnnotator, MetaMap


  • Aronson AR, Lang FM. An overview of MetaMap: Historical perspective and recent advances. J Am Med Inform Assoc. 2010;17(3):229-36.

Recommended readings:

  1. Niu et al. – Multi-task Character-Level Attentional Networks for Medical Concept Normalization
  2. Kang et al. – Using rule-based natural language processing to improve disease normalization in biomedical text
  3. Tsuruoka et al. – Normalizing biomedical terms by minimizing ambiguity and variability
  4. Leaman et al. – DNorm: disease name normalization with pairwise learning to rank
  5. Pradhan et al. – Evaluating the state of the art in disorder recognition and normalization of the clinical narrative

Week 9: OMOP CDM


  • CommonDataModel: Definition and DDLs for the OMOP Common Data Model (CDM). Observational Health Data Sciences and Informatics, 2018. Available at:
  • Stang PE, Ryan PB, Racoosin JA, et al. Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Ann Intern Med 2010; 153: 600–606. doi:10.7326/0003-4819-153-9-201011020-00010.
  • Ceusters W, Blaisure J. A Realism-Based View on Counts in OMOP’s Common Data Model. Stud Health Technol Inform 2017; 237: 55–62.


  • Hripcsak G, Duke JD, Shah NH, et al. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Stud Health Technol Inform 2015; 216: 574–578.
  • Graham DJ, Reichman ME, Wernecke M, et al. Cardiovascular, bleeding, and mortality risks in elderly Medicare patients treated with dabigatran or warfarin for nonvalvular atrial fibrillation. Circulation 2015; 131: 157–164. doi:10.1161/CIRCULATIONAHA.114.012061.

Week 10: Terminology types and mapping



  • Excerpt from Hendrix GG. Natural-language interface. Computational Linguistics. 1982;8(2).
  • Davis R, Shrobe H, Szolovits P. What is a knowledge representation? AI magazine. 1993 Mar 15;14(1):17.
  • Hendrix GG. Natural-language interface. Computational Linguistics. 1982;8(2).
  • Levesque HJ, Brachman RJ. Expressiveness and tractability in knowledge representation and reasoning. Computational intelligence. 1987 Feb;3(1):78.
  • Newell A. The knowledge level. Artificial intelligence. 1982 Jan 18;18(1):87-127.
  • Roberts K, Demner-Fushman D. Toward a natural language interface for EHR questions. AMIA Summits on Translational Science Proceedings. 2015;2015:157.

Week 11: Terminology auditing

Week 12: Semantic representation

Week 13: NLP


  • Friedman C. A broad-coverage natural language processing system. Proc AMIA Symp 2000: 270–274.


  • Friedman C. Towards a comprehensive medical language processing system: methods and issues. Proc Conf Am Med Inform Assoc AMIA Fall Symp 1997: 595–599.

Week 14: NLP and terminology evaluation