Conference Coverage

The Clinical Lab Information Retrieval (CLIR) Framework—An R Framework for CDW Clinical Lab Data Extraction and Retrieval

Abstract 42: 2016 AVAHO Meeting


Purpose: Extract, retrieve, and validate clinical lab information from the VA Corporate Data Warehouse (CDW).

Background: CDW clinical lab information provides a unique opportunity to assess real-world cancer treatment effectiveness and safety with higher granularity and validity than administrative data. Unfortunately, how this information is encoded varies considerably across time and geography. Various efforts have been made to clean these data and provide a consistent and reliable mapping; however, the availability and validity of these efforts also vary across lab concepts. This presents a significant barrier to the use of CDW clinical lab information in comparative effectiveness research.

Methods: We defined a conceptual framework for retrieval of lab information based on 5 features: Logical Observation Identifiers Names and Codes (LOINC) codes, test names, topography, units, and unit reference ranges. This was then implemented as a framework in R composed of 7 discrete modules. Each module corresponds to a defined task in the conceptual framework: Concept -> LOINC/test name -> cleaned LOINC/test name -> LOINC/test name internal identifier -> fact information retrieval -> topography selection -> unit and reference range cleaning and harmonization. Each module has a defined input and output, allowing implementation transparency, reproducibility, and flexibility.
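
The idea of chained modules, each with a defined input and output, can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R implementation. The record layout (`LabFact`), the function names, the unit conversion table, and the sample rows are all hypothetical assumptions, and only three of the stages (concept selection, topography selection, unit harmonization) are shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


# Hypothetical record layout; the actual CDW schema is not described in the abstract.
@dataclass
class LabFact:
    test_name: str
    loinc: str
    topography: str
    value: float
    unit: str


# Assumed conversion factors to one target unit (10^3 cells/uL); illustrative only.
UNIT_FACTORS: Dict[str, float] = {"k/ul": 1.0, "10^3/ul": 1.0, "10^9/l": 1.0, "/ul": 0.001}


def clean_name(text: str) -> str:
    """Lowercase and collapse whitespace so name variants compare equal."""
    return " ".join(text.strip().lower().split())


def select_by_concept(facts: List[LabFact], loincs: Set[str], names: Set[str]) -> List[LabFact]:
    """Keep facts whose LOINC code or cleaned test name matches the concept."""
    return [f for f in facts if f.loinc in loincs or clean_name(f.test_name) in names]


def select_topography(facts: List[LabFact], allowed: Set[str]) -> List[LabFact]:
    """Keep facts drawn from an allowed specimen type."""
    return [f for f in facts if clean_name(f.topography) in allowed]


def harmonize_units(facts: List[LabFact]) -> List[LabFact]:
    """Convert values to the target unit; drop facts whose unit is not recognized."""
    out = []
    for f in facts:
        factor = UNIT_FACTORS.get(clean_name(f.unit))
        if factor is not None:
            out.append(LabFact(f.test_name, f.loinc, f.topography, f.value * factor, "10^3/uL"))
    return out


Step = Tuple[str, Callable[[List[LabFact]], List[LabFact]]]


def run_pipeline(facts: List[LabFact], steps: List[Step]):
    """Apply each step in order, recording how many facts survive each one."""
    counts = [("input", len(facts))]
    for label, step in steps:
        facts = step(facts)
        counts.append((label, len(facts)))
    return facts, counts


# Made-up sample rows; the LOINC codes are used purely as example values.
sample = [
    LabFact("WBC", "6690-2", "BLOOD", 6.2, "K/uL"),
    LabFact(" wbc  count", "", "blood", 7100.0, "/uL"),
    LabFact("WBC", "6690-2", "CSF", 3.0, "/uL"),
    LabFact("HGB", "718-7", "BLOOD", 13.5, "g/dL"),
    LabFact("WBC", "6690-2", "BLOOD", 5.0, "thou"),
]

steps: List[Step] = [
    ("concept", lambda fs: select_by_concept(fs, {"6690-2"}, {"wbc", "wbc count"})),
    ("topography", lambda fs: select_topography(fs, {"blood"})),
    ("units", harmonize_units),
]

result, counts = run_pipeline(sample, steps)
for label, n in counts:
    print(label, n)
```

Because every step takes and returns the same kind of list, a step can be replaced or audited on its own, and the per-step counts make attrition visible at each stage, in the same spirit as the counts reported in the Results.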

Results: Using the CLIR framework, we retrieved peripheral blood total white blood cell (WBC) counts of patients with hematologic malignancies. In a cohort of about 300,000 patients diagnosed with and/or treated for a hematologic malignancy in the VHA between 2001 and 2016, we identified ~11x10^6 potential total WBC counts based on LOINC codes and lab test names. Of those, ~9x10^6 were mappable to the correct topography, and the overwhelming majority of these (99%) were mappable to a harmonized unit and reference range.

Conclusion: The CLIR framework provides a conceptual framework and an implementation in R for clinical lab information retrieval from the VA CDW. Future efforts will entail refining the methodology across multiple data domains and comparing CLIR output with other ongoing efforts aimed at cleaning and harmonization of clinical lab data in the CDW.
