AI-Supported Mammography Screen Readings Demonstrate Accuracy in MASAI Trial
Safety analysis shows AI-assisted screening identified cancers at a similar rate while nearly halving radiologists' screen-reading workload
08/01/2023
John McKenna, Associate Editor, BreakingMED™
Vandana G. Abramson, MD, Associate Professor of Medicine, Vanderbilt University Medical Center
Using an AI tool to triage mammography screening exams to single or double reading led to similar rates of cancer detection, recall, and false positives compared to standard double screen reading, according to a pre-specified safety analysis of data from the ongoing MASAI trial.
In this analysis, AI-assisted mammography screen reading reduced the screen-reading workload by almost half compared to standard double screen reading.
Using an AI system to support mammography screening led to similar rates of screen-detected breast cancers compared to traditional double screen reading and dramatically cut screen-reading workload, without raising rates of recalls, false positives, or consensus meetings, according to findings from a pre-specified safety analysis of the Mammography Screening with Artificial Intelligence (MASAI) study.
These pitfalls of standard double reading, including its heavy screen-reading workload and the interval cancers that slip through, warrant establishing "a more efficient and effective mammography screening program," Lång and colleagues suggested.
Recently, image analysis tools based on artificial intelligence (AI) have demonstrated possible benefits for mammography screening, such as "facilitating triage of screening examinations according to risk of malignancy or supporting detection with computer-aided detection (CAD) marks highlighting suspicious findings," they wrote. Research suggests that such tools can achieve accuracy similar to or better than that of breast radiologists, and can classify screening examinations as high risk before a diagnosis of interval cancer, potentially reducing false negative screening results when used as detection support alongside radiologist-based screen reading.
"Taken together, the evidence suggests that use of AI could potentially benefit mammography screening by reducing the screen-reading workload and the number of interval cancers, but randomized trials are needed to assess the efficacy of AI-supported screening," they wrote.
Thus, the randomized, controlled MASAI trial was designed to investigate AI-supported screen reading, consisting of triage of screening exams to single or double reading along with detection support. The current safety analysis set out to determine the effect of AI-supported screening, compared with standard double reading, on cancer detection, recalls, false positives, positive predictive value (PPV) of recalls, and screen-reading workload.
The MASAI trial was performed within the Swedish national screening program, and participants were recruited from four screening sites: Malmö, Lund, Landskrona, and Trelleborg. Women 40-80 years of age who were eligible for mammography screening (general screening at 1.5–2 year intervals; annual screening for those with a moderate hereditary risk of breast cancer or a history of breast cancer) were invited to participate and randomized 1:1 to either AI-supported screening (intervention group) or standard double reading with no AI (control group).
Standard screening exams included two views per breast. Exams randomized to the intervention group were analyzed using an AI system that utilizes deep learning "to identify and interpret mammographic regions suspicious for cancer," they explained. The AI then provided an exam-based malignancy risk score on a scale from 1 to 10: risk scores of 1-7 were considered low risk, 8 and 9 intermediate risk, and 10 high risk. It also provided CAD marks "at suspicious regional findings of calcifications and soft-tissue lesions" for patients deemed at intermediate or high risk. All screenings were conducted at the Unilabs Mammography Unit at Skåne University Hospital in Malmö.
In the intervention group, exams with risk scores of 1-9 underwent single reading, while exams deemed high risk underwent double reading by two different breast radiologists; the second reader had access to the first reader’s assessment. In the control group, screening exams were read with standard unblinded double reading without AI.
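To make the triage rules concrete, the short Python sketch below maps an exam's 1-10 malignancy risk score to the reading workflow described above. The function name, return structure, and field names are illustrative assumptions for this article, not part of the MASAI protocol or the AI vendor's software.

```python
def triage_exam(risk_score: int) -> dict:
    """Map an AI malignancy risk score (1-10) to the intervention-group workflow.

    Illustrative sketch only; names and structure are assumptions, not trial code.
    """
    if not 1 <= risk_score <= 10:
        raise ValueError("risk score must be an integer from 1 to 10")

    # Scores 1-7 were considered low risk, 8-9 intermediate, and 10 high risk.
    if risk_score <= 7:
        risk_level = "low"
    elif risk_score <= 9:
        risk_level = "intermediate"
    else:
        risk_level = "high"

    return {
        "risk_level": risk_level,
        # Exams scored 1-9 went to single reading; a score of 10 triggered double reading.
        "readers": 2 if risk_score == 10 else 1,
        # CAD marks were provided for intermediate- and high-risk exams.
        "show_cad_marks": risk_level in ("intermediate", "high"),
    }
```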
While the MASAI trial’s primary outcome is interval cancer rate, the safety analysis focused on secondary outcomes: early screening performance of cancer detection rate (number of cancers detected per 1,000 participants screened), recall rate (proportion of screened participants who were recalled), false positive rate, PPV of recall, type of detected cancer (invasive or in situ), and screen-reading workload.
A total of 80,033 women were randomly assigned to either the intervention group (n=40,003) or control (n=40,030) from April 12, 2021, through July 28, 2022. Median patient age was 54 years.
Among the findings:
- "AI-supported screening among 39,996 participants resulted in 244 screen-detected cancers, 861 recalls, and a total of 46,345 screen readings. Standard screening among 40,024 participants resulted in 203 screen-detected cancers, 817 recalls, and a total of 83,231 screen readings.
- "Cancer detection rates were 6.1 (95% CI, 5.4–6.9) per 1,000 screened participants in the intervention group, above the lowest acceptable limit for safety, and 5.1 (4.4–5.8) per 1,000 in the control group—a ratio of 1.2 (95% CI, 1.0–1.5; P=0.052).
- "Recall rates were 2.2% (95% CI, 2.0–2.3) in the intervention group and 2.0% (1.9–2.2) in the control group.
- "The false positive rate was 1.5% (95% CI, 1.4–1.7) in both groups.
- "The PPV of recall was 28.3% (95% CI, 25.3–31.5) in the intervention group and 24.8% (21.9–28.0) in the control group.
- "In the intervention group, 184 (75%) of 244 cancers detected were invasive and 60 (25%) were in situ; in the control group, 165 (81%) of 203 cancers were invasive and 38 (19%) were in situ.
- "The screen-reading workload was reduced by 44.3% using AI."
With this safety data in hand, Lång and colleagues will proceed with the primary MASAI analysis, which will assess the primary endpoint of interval cancer rate among 100,000 enrolled participants after 2 years of follow-up.
In an editorial accompanying the study, Nereo Segnan and Antonio Ponti, both of CPO Piemonte in Torino, Italy, noted that while the primary outcome of the MASAI trial has yet to be presented, the results of the current analysis "are of great interest…If confirmed, the value of these results is not only in saving resources through reducing the number of second readings, but also in identifying a low-risk group (score 1–7), which in this study accounted for 76.2% of the target population (30,464 of 39,996) and 2.5% of cancers (six of 244)."
The risk score used by Lång et al "could be used in regularly screened populations to study the transition rates and the time intervals from one assigned risk level to another and to define time intervals between screening episodes according to the rate (speed) of transition," they added. "In other words, in personalized screening regimens, fast-growing and slow-growing lesions could be identified and separated from one another. When collected for an appropriate period of follow-up, cases of interval cancer, stratified by AI risk, could confirm and further increase the accuracy of the score."
That said, Segnan and Ponti also noted that the possibility of overdiagnosis or overdetection of indolent lesions, "such as a relevant portion of ductal carcinomas in situ," should prompt caution in interpreting these findings.
"In the trial, compared with standard of care, AI identified a disproportionately, although not significantly, higher percentage of in situ carcinoma among screen-detected cancers: 25% of all cancers in the intervention group versus 19% in the control group," they explained. "There has been decades of debate on whether detection of in situ carcinomas, especially those classified as low grade, is beneficial or harmful in breast cancer screening. Reasons for this debate include that in situ carcinoma has shown a notable increase in incidence over time, a greater geographical variation in incidence compared with invasive cancer, and a substantial variation in treatment approaches, which often inappropriately includes mastectomy or the removal of axillary lymph nodes."
They concluded that a significant research question yet remains: "[I]s AI, when appropriately trained, able to capture relevant biological features—or, in other words, the natural history of the disease—such as the capacity of tumors to grow and disseminate?"
Study limitations cited by Lång et al included that screen readings were performed at a single site using a single mammography device and one AI system; that the participating radiologists were all moderately to highly experienced in breast imaging, which could limit generalizability; that only readers with more than 2 years of experience were allowed to conduct single reading in the intervention group, which could have biased reader performance in the control group; and that the true false positive rate cannot be determined without further follow-up, since some participants with false positive results may later be diagnosed with interval cancer.
Disclosures
The MASAI trial is funded by the Swedish Cancer Society, the Confederation of Regional Cancer Centers, and Swedish governmental funding.
Lång reported serving on an advisory board for Siemens Healthineers and receiving a lecture honorarium from AstraZeneca; coauthor Hofvind is head of BreastScreen Norway at the Cancer Registry of Norway, which has a research agreement with Screenpoint Medical.
Segnan and Ponti had no relevant relationships to disclose.
Sources
Lång K, et al "Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): A clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study" Lancet Oncol 2023; 24: 936–944.
Segnan N, Ponti A "Artificial intelligence for breast cancer screening: Breathtaking results and a word of caution" Lancet Oncol 2023; 24: 830–832.