Carol Friedman, Ph.D.

My primary research interests are natural language text processing, clinical knowledge representation, and database management systems.

My work in natural language involves the processing of narrative patient reports in order to make clinical data accessible to other automated procedures, such as decision support, quality assurance, patient management, automated ICD-9, SNOMED, and UMLS encoding, vocabulary development tools, clinical research, data mining, and error detection. Although different types of clinical data are presently available online in textual form, data cannot be reliably retrieved from free text. In order for data to be accessed appropriately, it must be in a structured form consisting of well-defined controlled vocabulary terms. The function of natural language processing is to extract, structure, and encode the underlying clinical information in the reports. The system we have developed is called MedLEE, and it has been integrated into the New York Presbyterian Hospital (NYPH) Clinical Information System. It was initially applied to radiological examinations of the chest and to mammograms, and has been operative since 1995. MedLEE has since been extended to all of radiology, and also to pathology, echocardiograms, electrocardiograms, and discharge summaries. MedLEE has been independently evaluated numerous times and has been shown to perform effectively for clinical applications.
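To make the idea of "structured form" concrete, the following is a minimal illustrative sketch in Python. It is not MedLEE and does not reflect MedLEE's actual design or output format; the lexicon, the placeholder concept identifiers, the negation cues, and the names Finding and extract_findings are all assumptions invented for this example.

```python
from dataclasses import dataclass

# Hypothetical lexicon: phrase -> placeholder concept identifier.
# These are NOT real ICD-9, SNOMED, or UMLS codes.
LEXICON = {
    "pleural effusion": "CONCEPT_0001",
    "cardiomegaly": "CONCEPT_0002",
}

# Hypothetical cues that negate the term immediately following them.
NEGATION_CUES = ("no ", "without ")


@dataclass
class Finding:
    term: str       # controlled vocabulary term
    code: str       # placeholder concept identifier
    certainty: str  # "yes" if asserted, "no" if negated


def extract_findings(sentence):
    """Return a structured Finding for each lexicon phrase found in the sentence."""
    text = sentence.lower()
    findings = []
    for phrase, code in LEXICON.items():
        pos = text.find(phrase)
        if pos == -1:
            continue
        # Treat the term as negated only if a cue directly precedes it.
        negated = any(text[:pos].endswith(cue) for cue in NEGATION_CUES)
        findings.append(Finding(phrase, code, "no" if negated else "yes"))
    return findings


findings = extract_findings("No pleural effusion, but cardiomegaly is present.")
for f in findings:
    print(f)
```

Once a report is in this kind of form, a downstream procedure can query for a coded term and its certainty rather than searching free text. A real system must also handle synonyms, modifiers, and far more complex grammar than this string lookup does.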

Another area of my NLP research involves the biomolecular domain. An NLP system, GENIES, has been developed for the genomics domain and is based on an adaptation of MedLEE. GENIES automatically acquires knowledge by extracting and structuring biomolecular relations from the literature. The goal is to use the information as part of a tool to assist in genomics research. GENIES has undergone a formative evaluation, and the results were promising. Continued development will address numerous interesting research issues. The most exciting research challenge will involve furthering drug discovery and the understanding of genetic causes of diseases by linking information in the clinical patient record to genomic information obtained from the literature.

Another area of research involves developing methods to assist in discovering clinical vocabulary terms in a domain by using a large corpus of clinical reports from that domain. Natural language tools can be used to elicit common multi-word phrases in the domain, to discover the composition and frequency of complex clinical terms, and to discover modifiers and qualifiers of clinical terms.
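As a rough illustration of eliciting common multi-word phrases from a corpus, the sketch below counts word n-grams across a handful of made-up report sentences. It is a simple frequency count under stated assumptions, not the method used in this research; the sample reports and the names tokenize and frequent_phrases are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def frequent_phrases(reports, n=2, min_count=2):
    """Count n-word phrases within sentences; keep those seen at least min_count times."""
    counts = Counter()
    for report in reports:
        # Split on sentence-ending punctuation so phrases do not span sentences.
        for sentence in re.split(r"[.!?;]", report):
            tokens = tokenize(sentence)
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


# Made-up sample reports, for illustration only.
reports = [
    "Mild pleural effusion on the left.",
    "No pleural effusion. Heart size normal.",
    "Small pleural effusion, heart size normal.",
]

phrases = frequent_phrases(reports)
for phrase, count in phrases:
    print(count, phrase)
```

On a realistic corpus, raw counts like these would need filtering (for example, removing phrases made up of function words) before the candidates are useful to a vocabulary developer.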

A further area of research involves knowledge representation, because the clinical information in the reports must be represented using a well-defined structure and well-defined symbolic terms. Knowledge representation issues involve balancing two opposing requirements: completeness of expression, and ease of access to the data once it has been structured and encoded.

Current Grant Support

Publications

Honors and Awards


If you have questions you would like Carol Friedman to respond to, please send e-mail to:


Department of Computer Science, Queens College CUNY, Flushing, NY
Department of Medical Informatics, Columbia University, NY, NY