Conference Coverage

Using Natural Language Processing in Radiology Reports to Identify the Presence of Metastatic Disease in Veterans With Prostate Cancer

Abstract 9: 2017 AVAHO Meeting

Background: Radiographic imaging is important for the diagnosis and management of cancer. Radiology reports contain a wealth of information but are typically written as unstructured text, which makes large-scale information extraction challenging. We validated a natural language processing (NLP) algorithm to identify the presence of metastatic disease in radiographic imaging reports.

Methods: Using the VA Clinical Cancer Registry and Corporate Data Warehouse, we identified approximately 3 million radiology reports for 120,374 patients receiving care for prostate cancer in the VA from 2006 to 2015. We focused on the impression section of CT, PET/CT, X-ray, bone scan, and MRI reports. We expanded on the “ConText” algorithm of Chapman et al. to identify the presence of metastatic disease: (1) using the Unified Medical Language System (UMLS), we identified terms compatible with “metastasis”; (2) report impressions were preprocessed and tokenized at the sentence level and within each sentence; (3) positive and negative trigger phrases were implemented as a series of regular expressions, which were refined over several iterations using training data from 2 batches of 600 reports, allowing us to extend trigger identification to a larger set of phrases. The final algorithm was validated using an independent sample of 2,000 reports annotated by a domain expert.
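
The trigger-phrase approach described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the term list, the negation triggers, the sentence splitter, and the function names classify_sentence and classify_impression are all hypothetical, chosen only to show how sentence-level tokenization and regular-expression triggers might combine into the three report categories mentioned in the abstract. It also assumes that a negation trigger only applies when it precedes the term within the same sentence.

```python
import re

# Hypothetical vocabulary: the abstract derived its terms from UMLS
# and does not list them, so these patterns are illustrative only.
METASTASIS_TERMS = re.compile(r"\b(metasta\w*|mets)\b", re.IGNORECASE)

# Hypothetical negation triggers that may precede a metastasis term.
NEGATION_TRIGGERS = re.compile(
    r"\b(no evidence of|negative for|free of|without|no)\b",
    re.IGNORECASE,
)

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def classify_sentence(sentence):
    """Label one sentence as 'no_mention', 'negated', or 'affirmed'."""
    match = METASTASIS_TERMS.search(sentence)
    if match is None:
        return "no_mention"
    # Assumed scope rule: only text before the term, in the same
    # sentence, can negate it.
    preceding = sentence[:match.start()]
    if NEGATION_TRIGGERS.search(preceding):
        return "negated"
    return "affirmed"


def classify_impression(impression):
    """Combine sentence labels; an affirmed mention outranks a negated one."""
    sentences = [s for s in SENTENCE_SPLIT.split(impression.strip()) if s]
    labels = [classify_sentence(s) for s in sentences]
    if "affirmed" in labels:
        return "affirmed"
    if "negated" in labels:
        return "negated"
    return "no_mention"


if __name__ == "__main__":
    print(classify_impression("No evidence of osseous metastatic disease."))
    print(classify_impression("No acute fracture. Lesions consistent with metastases."))
```

A production system would need a far richer vocabulary and scope rules (for example, triggers that follow the term, or terms such as "but" that end a negation's scope), which is the kind of refinement the iterative training batches were used for.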

Results: On the first training set of 600 radiology reports, the algorithm achieved an accuracy of 94% for reports with no mention of metastasis, 85% for negated mentions of metastasis, and 74% for mentions of metastasis without negation. Errors were reviewed, resulting in vocabulary expansion and improved implementation of regular expressions to capture the expanded trigger phrases. The modified algorithm was then tested on a new set of 600 reports, where accuracy increased to 96% for no mention of metastasis, 90% for negated mentions, and 89% for mentions without negation. After additional modifications, the revised algorithm was validated using an independent sample of 2,000 reports. The accuracy was 96% (Cohen’s kappa ~1), with a precision of 98% and a sensitivity of 98%.
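
For readers less familiar with the validation metrics named in the Results, the following Python sketch shows how accuracy, precision, sensitivity, and Cohen's kappa are conventionally computed from a binary confusion matrix (algorithm output versus expert annotation). The function name binary_metrics and the counts in the example are hypothetical and are not taken from the study, which does not report its confusion matrix. The sketch assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute agreement metrics from binary confusion-matrix counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    and true negatives, with the expert annotation as the reference.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # observed agreement
    precision = tp / (tp + fp)            # positive predictive value
    sensitivity = tp / (tp + fn)          # recall

    # Chance agreement: probability both raters say "yes" plus the
    # probability both say "no", assuming independent ratings.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_expected = p_yes + p_no

    # Cohen's kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the abstract.
    for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.2f}")
```

Because kappa discounts agreement expected by chance, it is usually lower than raw accuracy, and the gap widens when one class dominates the sample.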

Conclusions: Detecting the presence of metastatic disease in radiology reports is feasible with NLP.

References: (1) Sarkar S, Das S. A review of imaging methods for prostate cancer detection. Biomed Eng Comput Biol. 2016;7(Suppl 1):1-15. (2) Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001;34(5):301-310. (3) Harkema H, Dowling JN, Thornblade T. ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports. J Biomed Inform. 2009;42(5):839-851.
