MEDical Language Extraction and Encoding System
Introduction
MedLEE is a Natural Language Processing application developed and in production at The New York Presbyterian Hospital. It was designed in the early 1990s to automatically encode text reports from the Department of Radiology. Much information in the medical domain is recorded as free text. Though useful for human readers, free text is difficult, if not impossible, for databases to use effectively.
Overview of NLP
Natural Language Processing (NLP) lies at the intersection of Computer Science and Linguistics. The way humans encode information in natural languages, which developed over time, is difficult to describe formally, and it is therefore difficult to write computer programs that can decipher that information. Written human language is generally considered to contain four types of information: morphological, syntactic, semantic, and discourse. A language follows a grammar that allows some combinations and structures of information and disallows others. NLP systems seek to formalize the grammar rules of a language algorithmically in order to extract information from natural texts. Because the full set of grammar rules for a given language is so complex, it is currently impractical to create useful language processors for general texts. However, it is possible to take advantage of domain knowledge to define a sublanguage, which has a more limited grammar and vocabulary than the full language.
NLP in the Medical Setting
NLP has many practical application domains, but medicine, where free text is a dominant method of collecting and recording information, is particularly important because the information can benefit individual healthcare. The New York Presbyterian Hospital's Department of Medical Informatics is currently involved in research on the Clinical Information System (CIS). One issue that needs to be resolved is how free text should be represented in the Hospital's databases. It would be difficult and expensive to replace free text with a more database-friendly encoding scheme, and medical workers have been very reluctant to do away with hand-written or typed notes. However, much of the information in these notes could be used by the CIS for decision support, data review, and other medical activities. MedLEE was created to bridge free text reports and the CIS.
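To make the idea of a sublanguage concrete, the following is a minimal sketch of how a restricted vocabulary can turn a free-text sentence into a record a database could store. It is not MedLEE's implementation: the lexicon, the function name `encode_sentence`, and the output fields (`finding`, `location`, `certainty`) are all assumptions chosen for illustration, and the negation handling is deliberately naive.

```python
import re

# Hypothetical mini-lexicon for a toy radiology sublanguage.
# These word lists are illustrative assumptions, not MedLEE's vocabulary.
FINDINGS = {"infiltrate", "effusion", "cardiomegaly", "pneumothorax"}
BODY_LOCATIONS = {"left lower lobe", "right upper lobe", "left lung", "right lung"}
NEGATIONS = {"no", "without"}


def encode_sentence(sentence):
    """Return a list of structured findings for one free-text sentence."""
    text = sentence.lower()
    tokens = re.findall(r"[a-z]+", text)

    # Naive rule: any negation word marks every finding in the sentence as absent.
    negated = any(tok in NEGATIONS for tok in tokens)

    # Take the first known location phrase that appears in the sentence, if any.
    location = next((loc for loc in sorted(BODY_LOCATIONS) if loc in text), None)

    return [
        {
            "finding": tok,
            "location": location,
            "certainty": "absent" if negated else "present",
        }
        for tok in tokens
        if tok in FINDINGS
    ]


if __name__ == "__main__":
    for line in ("Infiltrate in the left lower lobe.", "No pleural effusion."):
        print(encode_sentence(line))
```

Because the vocabulary is limited to one domain, a few word lists and one rule are enough to produce a structured record from these sentences. A sentence that falls outside the sublanguage simply yields an empty list, which illustrates why the same approach does not scale to general text.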