From the Journals

AI mammogram screening is equivalent to human readers

FROM RADIOLOGY

With the advent of artificial intelligence (AI), the era of double reading of mammograms is likely coming to a close, according to Liane Philpotts, MD, a radiology and biomedical imaging professor at Yale University in New Haven, Conn.

The reason: AI is proving to be as good as humans at interpreting mammograms, at least in the research setting.

In one of the latest reports, published online in Radiology, British investigators found that the performance of a commercially available AI system (INSIGHT MMG, version 1.1.7.1; Lunit) was essentially equivalent to that of over 500 specialized readers. The results are in line with other recent AI studies.

Double reading – having mammograms read by two clinicians to increase cancer detection rates – is common in the United Kingdom and elsewhere in Europe.

The British team compared the performance of 552 readers with Lunit’s AI program on the Personal Performance in Mammographic Screening exam, a quality assurance test that mammogram readers in the United Kingdom are required to take twice a year. Readers assign a malignancy score to 60 challenging cases, a mix of normal breasts and breasts with benign and cancerous lesions. The study included two test sessions, for a total of 120 breast screenings.

Fifty-seven percent of the readers in the study were board-certified radiologists, 37% were radiographers, and 6% were breast clinicians. Each read at least 5,000 mammograms a year.

There was no difference in overall performance between the AI program and the human readers (area under the receiver operating characteristic curve [AUC], 0.93 vs. 0.88; P = .15).
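The AUC summarizes how well a set of malignancy scores ranks cancers above non-cancers across all possible recall thresholds, with 1.0 indicating perfect separation and 0.5 chance. As a rough illustration of how the metric is computed from reader-style scores, here is a minimal sketch; the labels and scores are invented for demonstration, not data from the study.

# Minimal sketch: computing an AUC from malignancy scores.
# The labels and scores below are hypothetical, not study data.
from sklearn.metrics import roc_auc_score

labels = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]           # 1 = cancer, 0 = normal/benign
scores = [88, 72, 95, 10, 35, 5, 60, 81, 22, 15]  # higher = more suspicious

auc = roc_auc_score(labels, scores)
print(f"AUC = {auc:.2f}")  # 1.0 = perfect separation, 0.5 = chance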

Commenting in an editorial published with the investigation, Dr. Philpotts said the results “suggest that AI could confidently act as a second reader to decrease workloads.”

As for the United States, where double reading is generally not done, she pointed out that “many U.S. radiologists interpreting mammograms are nonspecialized and do not read high volumes of mammograms.” Thus, the AI system evaluated in the study “could be used as a supplemental tool to aid the performance of readers in the United States or in other countries where screening programs use a single reading.”

There was also no difference in sensitivity between AI and human readers (84% vs. 90%, P = .34), but the AI algorithm had a higher specificity (89% vs. 76%, P = .003).

When the AI recall threshold was set to match average human reader performance (90% sensitivity, 76% specificity), AI did not differ from the readers in sensitivity (91%, P = .73) or specificity (77%, P = .85), although the investigators noted that the power of this analysis was limited.
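In other words, the AI outputs a continuous recall score, and the choice of cutoff trades sensitivity against specificity; the threshold chosen was the one whose operating point came closest to the average reader’s. A hedged sketch of that matching step follows; the sens_spec helper and all numbers are invented for illustration, not taken from the study.

# Hypothetical sketch of matching an AI recall threshold to a target
# operating point. All numbers are invented, not data from the study.

def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when cases scoring >= threshold are recalled."""
    tp = sum(y == 1 and s >= threshold for y, s in zip(labels, scores))
    fn = sum(y == 1 and s < threshold for y, s in zip(labels, scores))
    tn = sum(y == 0 and s < threshold for y, s in zip(labels, scores))
    fp = sum(y == 0 and s >= threshold for y, s in zip(labels, scores))
    return tp / (tp + fn), tn / (tn + fp)

labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]          # 1 = cancer, 0 = not
scores = [90, 75, 64, 40, 55, 30, 20, 12, 8, 3]  # AI recall scores

# Pick the threshold whose operating point is closest to the average
# reader's in the study (90% sensitivity, 76% specificity).
target = (0.90, 0.76)
best = min(set(scores), key=lambda t: sum(
    (a - b) ** 2 for a, b in zip(sens_spec(labels, scores, t), target)))
sens, spec = sens_spec(labels, scores, best)
print(f"threshold={best}: sensitivity={sens:.0%}, specificity={spec:.0%}")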

Overall, “diagnostic performance of AI was comparable with that of the average human reader.” It seems “increasingly likely that AI will eventually play a part in the interpretation of screening mammograms,” said investigators led by Yan Chen, PhD, of the Nottingham Breast Institute in England.

“That the AI system was able to match the performance of the average reader in this specialized group of mammogram readers indicates the robustness of this AI algorithm,” Dr. Philpotts said.

However, there are some caveats.

For one, the system was designed for 2D mammography, the current standard of care in the United Kingdom, while digital breast tomosynthesis (DBT) is replacing 2D mammography in the United States.

In the United States, “AI algorithms specific to DBT are necessary and will need to be reliable and reproducible to be embraced by radiologists,” Dr. Philpotts said.

Also, in the United Kingdom, screening is performed at 3-year intervals in women aged 50-70 years, which means that the study population was enriched for older women with less-dense breasts. Screening generally starts earlier in the United States and includes premenopausal women with denser breasts.

A recent study from Korea, where many women have dense breasts, found that 2D mammography and supplementary ultrasound outperformed AI for cancer detection.

“This underscores the challenges of finding cancers in dense breasts, which plague both radiologists and AI alike, and provides evidence that breast density is an important factor to consider when evaluating AI performance,” Dr. Philpotts said.

The work was funded by Lunit, the maker of the AI program used in the study. The investigators and Dr. Philpotts had no disclosures.
