Natural Language Processing for Biosurveillance
CHAPTER OVERVIEW
Information described in electronic clinical reports can be useful for both detection and characterization of outbreaks. However, the information is in unstructured, free-text format and is not available to computerized applications. Natural language processing methods structure free-text information by classifying, extracting, and encoding details from the text. We provide a brief description of the types of natural language processing techniques that have been applied to the domain of outbreak detection and characterization. We group textual data generated by a healthcare visit into four classes: chief complaints, emergency care notes, hospitalization notes, and discharge reports. For each class of data, we illustrate uses of the data for outbreak detection and characterization with examples from real applications that extract information from text. We conclude that a modest but solid foundation has been laid for natural language processing of clinical text for the purpose of biosurveillance, with the main focus being on chief complaints. To provide more accurate detection and to assist in investigating and characterizing outbreaks that have already been detected, future research should focus on tools for extracting detailed clinical and epidemiological variables from clinical reports. Areas of challenge include identifying contextual modifiers of clinical conditions, identifying useful temporal information, and integrating information across reports for a more complete view of a patient’s clinical state.
Keywords: Natural language processing; Biosurveillance; Syndromic surveillance; Text processing; Information extraction; Infectious disease
1* Department of Biomedical Informatics, University of Pittsburgh, 200 Meyran Avenue, Pittsburgh, PA 15260, USA, [email protected]
2 Division of Epidemiology, School of Medicine, University of Utah, 30 North 1900 East, AC230A, Salt Lake City, UT 84132, USA
D. Zeng et al. (eds.), Infectious Disease Informatics and Biosurveillance, Integrated Series in Information Systems 27, DOI 10.1007/978-1-4419-6892-0_13, © Springer Science+Business Media, LLC 2011
Chapter 13
1. INTRODUCTION
It would be so nice if something made sense for a change.
Alice, in Alice in Wonderland by Lewis Carroll
This quote from another century sums up the challenges faced by those working in text processing of medical data. Although the “sense” (the context and details of a medical note describing a patient’s visit and illness) is readily apparent to a trained human reader, training computers to understand the same information is a daunting task.
Consider a situation in which a novel strain of avian influenza (H5N1) with pandemic potential is causing outbreaks of disease among domestic and commercial poultry in many parts of the world, with sporadic transmission to humans. Public health agencies and medical personnel everywhere, including in the U.S., are concerned that the first imported case of avian influenza would most likely go undetected. Identifying subsequen