Lifetime Risks are Lame – Let Me Count the Ways

Along with the 2007 announcement from the American Cancer Society proclaiming that women at high risk for breast cancer should be screened with MRI came an unfortunate by-product – the golden calf of LIFETIME RISKS.

My choice of “lame” is not without some thought, the implication being that one can still walk if lame, but it can be a struggle.

The often-ignored definition of risk assessment is the calculation of “absolute risk over a defined period of time.” The problem is: “lifetime risk” is poorly defined. There is a world of difference between cumulative lifetime risk (what many conceptualize) and remaining lifetime risk (what the models actually calculate).

In addition, the certainty about the power and persistence of risk grows less and less over time. Many of our lifetime risks are simply linear extensions of shorter studies. For instance, the Tyrer-Cuzick model will calculate almost 70% lifetime risk when a 35 year-old is diagnosed with LCIS, but most follow-up studies are limited to 20 or 30 years. No cohort of LCIS patients has been published with a mean follow-up of 50 years, which is what the T-C model is calculating in a 35 year-old.

Then there is the question about calculating lifetime risks to identify patients to apply screening technology that is unlikely to be in use in 50 years. And the bigger question is whether or not screening for breast cancer will be required at all in 50 years. Eventually, with an effective “cure” for all stages of disease, there will be no need to screen.

Lifetime risk calculations are not without serious hazard. When high numbers are generated (especially to get insurance coverage for MRI screening), some women will “jump ship” and move ahead toward preventive mastectomies based on inflated figures. Lifetime risks have the curious tendency of “piling up” and weighing the patient down as if the entire load is going to come spilling down on her “any day now.” The addition of SNPs to risk modeling has the potential to make that scenario more common, with inflated values for risk that have never been prospectively validated after consolidation (grouping individual SNP risks into a whole).

I was challenged to learn more about risk assessment in 1991 by Dr. David Page who cautioned back then: “Limit risk assessment to a 20-year maximum calculation” (for all the reasons above). It took many years to fully understand this sage advice, now applicable to my longstanding criticism of our MRI screening guidelines.

Admittedly, it was a stretch for the American Cancer Society (ACS) to endorse MRI without mortality reduction data (and with relatively small studies), but they correctly understood that proof of a mortality reduction by adding MRI to mammography would be difficult to come by within a reasonable time frame. Here we are, 11 years later, still with no mortality reduction data for MRI screening, a benefit that will only be confirmed through a prospective, randomized trial.

At the same time, there was incontrovertible evidence that MRI was detecting many more cancers than mammography (double to triple the number). Without confirmation of a mortality reduction for MRI, the next best thing is the surrogate of Sensitivity. (Specificity only addresses the practical aspects of screening, not mortality reduction.)

Invoking deductive reasoning, the syllogism works something like this:

Major Premise: Early detection with mammography reduces breast cancer mortality by 20-30% with only 40% Sensitivity, indicating that breast cancer biology is quite vulnerable to early detection.

Minor Premise: Screening MRI has a 90% Sensitivity, and Sensitivity and Biology are the only variables involved in mortality reduction.

Conclusion: Screening MRI will result in a mortality reduction well in excess of what is achieved with mammography alone.

 

Had there not been the foundation of a proven mortality reduction with screening mammography, then a proposal to screen with MRI would have foundered. And if you winced while reading the “40% Sensitivity” for mammography, I did not pull that out of thin air. In fact, that’s the sensitivity level for mammography when compared to MRI in the combined analysis of 5 international MRI screening trials (Sardanelli F, Podo F. Eur Radiol 2007;17:873-887). If you don’t like this 40%, the ACS weighed in as well, basing their recommendations on 6 international trials (the 5 above plus one more) wherein mammographic sensitivity ranged from 16% to 40% (Saslow D, et al. CA Cancer J Clin 2007; 57:75-89).

So, going out on a long limb, the ACS opted to create guidelines for patient selection that approximated how 6 international MRI screening trials had been designed. And that’s when the problems began. Relying on the strategies used for patient inclusion in those trials laid a faulty foundation. The MRI screening trials were not focused on ideal patient selection, but on proving the benefit of MRI. Lifetime risk was the norm for all studies. This skews the experience to favor younger women where lifetime risks are higher, based on the key word – remaining.

All our mathematical models calculate remaining lifetime risk, not total cumulative lifetime risk. As we age, we “pass through” our various risks, until we finally meet up with that 100% risk of death, wherein the remaining lifetime risk for any new disease is finally 0. Thus, lifetime risks DECLINE over time, while short-term breast cancer incidence INCREASES over time.
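To make the opposing directions concrete, here is a minimal Python sketch. The age-band incidence rates are invented for illustration only (they are not SEER figures and not the output of Gail, Claus, or Tyrer-Cuzick), the function names are hypothetical, and competing mortality is ignored, so the absolute numbers mean nothing. Only the direction of the two trends matters.

```python
# Hypothetical annual incidence per 100,000 women by age band.
# Illustrative values only; NOT real registry data.
ANNUAL_RATE_PER_100K = {
    (20, 30): 10,
    (30, 40): 60,
    (40, 50): 160,
    (50, 60): 260,
    (60, 70): 340,
    (70, 80): 320,
    (80, 90): 280,
}


def annual_rate(age):
    """Return the assumed one-year probability of diagnosis at a given age."""
    for (low, high), rate in ANNUAL_RATE_PER_100K.items():
        if low <= age < high:
            return rate / 100_000
    return 0.0


def risk_between(start_age, end_age):
    """Probability of diagnosis between two ages (competing mortality ignored)."""
    disease_free = 1.0
    for age in range(start_age, end_age):
        disease_free *= 1.0 - annual_rate(age)
    return 1.0 - disease_free


def remaining_lifetime_risk(age, horizon=90):
    """Risk from the current age to the horizon: what the models report."""
    return risk_between(age, horizon)


def ten_year_risk(age):
    """Short-term risk over the next 10 years."""
    return risk_between(age, age + 10)


if __name__ == "__main__":
    for age in (30, 40, 50, 60, 70):
        print(
            f"age {age}: remaining lifetime {remaining_lifetime_risk(age):.1%}, "
            f"10-year {ten_year_risk(age):.1%}"
        )
```

With any incidence curve that rises into middle age, the remaining lifetime figure shrinks at every step while the 10-year figure climbs, which is the pattern described above.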

By focusing exclusively on empirical data to the exclusion of rational thought, we got our risk strategies perfectly backwards. Consider tamoxifen prevention. Entry requirements for the NSABP P-01 trial were based on short-term risks even though the effect of tamoxifen is durable over the long-term. Here, we should be using 20-year risk calculations, but by sticking to guidelines that duplicate P-01, we use 5-year risks primarily. In contrast, with MRI screening – where we want to know the probability of a mammographically occult cancer in the short-term, specifically on a given day – we use long-term risks that lamely reflect the chance of a cancer being found on an MRI in the short-term.

Granted, when we use lifetime risks, we are increasing cancer detection rates (CDRs) over the long term due to the higher rate of disease incidence. And if we were using MRI to screen only BRCA-positive patients, a substantial difference in CDRs would exist. But when we move down to the 35 year-old at 21% lifetime risk for breast cancer vs. the 35 year-old at 12% general population risk, the difference between CDRs over the next 20 to 30 years is negligible. This is what keeps “precision screening” from being precise. When you convert risk levels to the actual differences in yield, there’s really not that much difference between high risk and normal risk, with the exception of patients at “very high” risk.

Even the initial prevalence screen in high risk vs. baseline risk does not generate as much difference in CDR as one might expect. Check out the prevalence screen data from Dr. Christiane Kuhl’s MRI screening study in the general population (Radiology 2017; 283:361-370) – it’s a comparable CDR on that first screen (22.6 per 1,000) to what one finds in the high-risk international screening trials (i.e., 22, 22, 23, 29, 30 and 36 per 1,000).

Returning to the main reason lifetime risks bomb when put into practical use – the impact of age on remaining lifetime risk – let’s walk through the two different ways in which “lifetime risk” can be conceptualized:

The oft-quoted “12% lifetime” is total risk over the course of an entire lifetime, from birth to age 90 or beyond. This is a cumulative lifetime risk. (And if we take away those women with known risk factors for breast cancer, it’s not 12% — but more like 7% to 8%.) However, this is NOT what the mathematical models calculate. All models calculate remaining lifetime risk, which is directly related to the patient’s age, that is, the remaining number of years anticipated.

 

Powerpoint #1: cumulative lifetime risk (top graph) versus remaining lifetime risk (bottom graph)

 

In the diagram above, the top graph demonstrates how we tend to conceptualize lifetime risks where the solid line indicates lifetime cumulative risk for the general population (12%), starting at age 20, with the patient icon at the end of that long accumulation. The dotted line below the solid one represents the “lifetime risk” that a 60 year-old is facing for her remaining years (7%). These cumulative lifetime risk graphs are deceiving in that one senses a persistent rise in risk over time. In reality, however, the lifetime risk as viewed with the patient icon looking forward in time (bottom graph), reveals that remaining lifetime risk is actually declining.

But that’s only the first step in clearing the confusion. Once you realize we’re talking about a declining number over time, how do you reconcile increasing short-term incidence for breast cancer to a peak age at 55-60? I fashioned the diagram below for my book – Mammography and Early Breast Cancer Detection: How Screening Saves Lives (McFarland, 2016) – trying to illustrate why we have the paradoxical situation of high short-term risk (in terms of rate/100,000) in the face of declining lifetime risk. That is, lifetime risks are going down, while short-term incidence is rising.

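Powerpoint #2: remaining lifetime risk (left y-axis, dotted lines) versus short-term incidence (right y-axis, solid line), by age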

 

To try and explain this paradox, I used two different y-axes. The dotted lines related to the y-axis on the left represent remaining lifetime risk, the top line for a high-risk patient, the bottom line for a baseline risk patient. The solid line relates to short-term incidence with the y-axis on the right. Since there are different units of measurement for the 2 y-axes, the design was created for illustrative purposes to make the point that these two oft-quoted numbers are paradoxically at odds with each other. As one ages, their remaining lifetime risk is in constant decline, whereas the short-term incidence peaks around 55-60, then slightly declines.

As a result of this paradox, the use of lifetime risks is highly discriminatory against the older age groups (and don’t forget the net benefit of mammographic screening, in general, is found from ages 60 to 69). A young woman with risk factors can easily qualify for MRI even though her short-term probabilities of breast cancer might be low, while the older woman with the same risks and high short-term probability fails to meet the Golden Calf standard of “20% or greater lifetime risk,” according to American Cancer Society guidelines (with NCCN and others joining in later using very similar guidelines, endorsing the 20% threshold).

I’ve used the following example of age consequences and discrimination since 2007 in multiple publications and presentations, and it still holds true today:

Powerpoint #3

 

When it comes to screening MRI, the question we’re asking with selective screening is “How can we maximize cancer detection rates (CDRs) to make this cost-effective?” These CDRs are directly related to disease prevalence and incidence in the screened population, and it has been the specious conclusion since 2007 that the best way to do this is through “remaining lifetime risk.” But look what happens in the example above, where the 30 y/o easily qualifies for MRI, but her risk over the next 10 years is only 3.5% (Claus model). Because the 60 y/o with the same risk factors is discriminated against through the use of lifetime risks, she fails to qualify for breast MRI even though her chance of having a mammographically occult breast cancer over the next 10 years is TRIPLE that of the patient who does qualify for MRI.

And in another twist of the same principle, if a 30 year-old has no risk factors other than a biopsy showing ordinary hyperplasia, the T-C model will calculate a 23% lifetime risk, which is based on 55 years of remaining risk. So this 30 y/o qualifies for MRI, while our 60 y/o in the example above with two first-degree, premenopausal relatives with breast cancer does NOT qualify? And we’ve lived with this since 2007?

In the last scenario on the Powerpoint slide above, if we look at a 60 y/o patient with NO risk factors, her 10-year risk is nearly identical to our “very high risk” 30 year-old. Yet, try to order a screening MRI on a 60 y/o with no risk factors and then watch the brouhaha that follows. Remarkably, the ACS guidelines include the specific admonition that MRI is “not recommended” for women with lifetime risk under 15% — which, through age discrimination, excludes many women with occult cancers at a short-term rate higher than younger patients who qualify for MRI. An active statement against MRI screening based on lifetime risks is emblematic of a serious misunderstanding of the long-term-short-term paradox described above.

If you ever wondered why the NSABP P-01 trial for tamoxifen prevention used “risk of a 60 y/o woman without other risks” as their threshold for inclusion – in effect, turning an average patient into a “high risk” patient – it’s because of the paradox noted above, wherein the NSABP needed quick answers so they focused on short-term incidence rather than remaining lifetime risks. In contrast, the MRI studies and subsequent guidelines did just the opposite, focusing on long-term risk to the exclusion of short-term incidence.

Here’s yet another variation on how age discrimination is an inherent feature of remaining lifetime risk: At the individual level, a woman who barely qualifies for MRI at a young age will be unqualified (or disqualified) later, perhaps within a mere 5 years. Remember, remaining lifetime risks decline over time. And if you’re not updating previously calculated risk every 5 years or so, then you’re quoting a number higher than reality allows. 100% of our patients have declining lifetime risks, and it takes some effort to recalculate those risks periodically. I try to do this every 5 years, though I still encounter patients whose risk calculation is fossilized.

Example: Take the typical patient with one first-degree relative with breast cancer, her mother diagnosed with breast cancer at age 60. When the patient is age 40, the Tyrer-Cuzick model will calculate a 22% lifetime risk for breast cancer, qualifying her for screening MRI. But as time goes on, and the patient reaches age 55, just at the point when short-term incidence is peaking (and closing in on her mother’s age when diagnosed), she has passed through enough risk that the T-C model now calculates 18%. Too bad. No MRI.

In the ACS publication that announced the 2007 guidelines, Table 4 shows the fairly wide variation from one model to the next, using 5 different risk scenarios applied to the “preferred models” as advised by the American Cancer Society (BRCAPRO, Claus, Tyrer-Cuzick). The variation is concerning, yes, but here’s the kicker – all 5 clinical scenarios begin with the patient (or proband) being 35 years old. Why wasn’t there a table that showed the much wider variation imparted by different age groups? Did the authors consider the difference between “cumulative” and “remaining”? Or were they so fixated on the starting age for MRI screening that they forgot that many women enter high-risk programs at 50 or older, when the incidence peaks (and where it is harder to qualify for MRI)?

How difficult is it to fix the problem? Even though countless women have been denied breast MRI for the past 11 years due to age discrimination imparted by the faulty guidelines, the “fix” is so simple as to defy logic as to why it has not been done already (new guidelines are due any day now, I’m told). You simply add the option of a short-term risk calculation in addition to the lifetime option. Introduce a 5-year risk number and the problem is fixed (unless the threshold is too high).
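As a rough sketch of how small that change is, the Python below compares a lifetime-only rule with a rule that also accepts a short-term number. The 20% lifetime threshold is the ACS standard discussed above. The 5-year threshold is deliberately left as a required argument because no specific value is proposed here, and the function names and the 5-year figures in the usage lines are hypothetical.

```python
LIFETIME_THRESHOLD = 0.20  # the ACS "20% or greater lifetime risk" standard


def qualifies_lifetime_only(lifetime_risk):
    """Current approach: remaining lifetime risk is the only route to MRI."""
    return lifetime_risk >= LIFETIME_THRESHOLD


def qualifies_with_short_term_option(lifetime_risk, five_year_risk, five_year_threshold):
    """Proposed fix: qualify on EITHER lifetime risk OR 5-year risk.

    five_year_threshold is an assumption supplied by the caller; set it too
    high and the older patient is still excluded.
    """
    return (
        lifetime_risk >= LIFETIME_THRESHOLD
        or five_year_risk >= five_year_threshold
    )


if __name__ == "__main__":
    # The 55 y/o from the earlier example: T-C lifetime risk has fallen to 18%.
    # Her 5-year risk (3%) and the threshold (2.5%) are hypothetical values.
    print(qualifies_lifetime_only(0.18))                          # False
    print(qualifies_with_short_term_option(0.18, 0.03, 0.025))    # True
```

The design choice is simply a logical OR: nobody who qualifies today loses eligibility, and the older patient with high short-term risk gains a route in.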

As it stands now, we have patients qualifying for MRI but not SERM risk reduction, while others qualify for SERM risk reduction but not MRI. That makes no sense. “Here, take this pill every day for 5 years after reviewing the long list of side effects, including death from DVT … but sorry, you’re not at high enough risk to qualify for MRI screening.”

The trial design of ACRIN 6666 indicates that there are “some out there” who understand that CDRs are boosted through the use of both short-term and long-term risks, the former designed for older women and the latter for younger women. It’s the only way to handle the paradox of short-term vs. long-term risk. In the ACRIN 6666 trial of screening ultrasound (with a subgroup also getting screened with a single MRI), patients had to have breast density (a complex definition) as well as a single risk factor in addition to the density. The single risk could be a Gail calculation of lifetime risk … or a 5-year Gail calculation.

Trial design took place prior to widespread adoption of the T-C model, but the rationale used displayed a deep understanding of how to improve yields based on equitable criteria. For instance, the requisite degree of risk was lessened as density increased. Think about it. Two parameters – risk and density – intimately bound when the endgame is the probability of a mammographically occult cancer. Like the high risk patient, the high density patient is more apt to have a mammographically occult cancer than a low density patient. So when the density level is higher, the requisite risk level was relaxed in ACRIN 6666. It’s rational and insightful.

We are doing something similar at Mercy Breast Center in OKC in an NCI-funded study of 4,000 normal mammograms (NCI R01CA197150). The study uses a computer analysis system with machine learning, developed by Drs. Bin Zheng and Hong Liu at the University of Oklahoma Advanced Cancer Imaging Lab in Norman, that converts density patterns to a Risk Score, comparing left to right and year-over-year. We use a sliding scale, wherein a Risk Score of 0.80 prompts a screening MRI, but if density is Level C, then a score of 0.75 qualifies, and for Level D, a score of 0.70 qualifies.
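The sliding scale can be written out in a few lines of Python. This is only a sketch of the thresholds as stated above, not the study software: the function name is hypothetical, and it assumes that the cutoffs are inclusive and that density Levels A and B both use the 0.80 cutoff.

```python
BASE_THRESHOLD = 0.80                            # cutoff when density is not C or D (assumed)
DENSITY_THRESHOLDS = {"C": 0.75, "D": 0.70}      # relaxed cutoffs for denser breasts


def prompts_screening_mri(risk_score, density_level):
    """Return True if the Risk Score meets the cutoff for this density level.

    Assumes an inclusive comparison (score >= cutoff).
    """
    threshold = DENSITY_THRESHOLDS.get(density_level.upper(), BASE_THRESHOLD)
    return risk_score >= threshold


if __name__ == "__main__":
    for score, level in [(0.78, "B"), (0.78, "C"), (0.72, "C"), (0.72, "D")]:
        print(score, level, prompts_screening_mri(score, level))
```

The same Risk Score of 0.78 fails at Level B but qualifies at Level C, which is the point of tying the requisite risk to density.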

When the 2007 guidelines for MRI screening were released, there were so many inconsistencies and oddities, I assumed that corrections and modifications would be prompt. That has not been the case. As noted above, the guidelines were largely dependent on the international trials where the focus was on MRI performance, not risk assessment strategies. In fact, if you read the inclusion criteria of the international MRI screening trials, it’s not always clear in some of the studies how patients were selected.

In our 2014 publication that challenged current guidelines for MRI screening (The Breast Journal 2014; 20:192-197), we performed risk calculations using Gail, Claus, and Tyrer-Cuzick on all patients who had their cancer discovered through routine asymptomatic screening with MRI. Because we had incorporated breast density levels into patient selection, most of our MRI discoveries were in patients who would not have qualified under the ACS guidelines, so those cancers would never have been found had we relied on the guidelines alone. This point system was first proposed prior to the 2007 ACS guidelines, although not in print until 2008 (Hollingsworth AB, Stough RG. Breast MRI screening for high risk patients. Semin Breast Dis 2008; 11:67-75).

At the time when we introduced our point system (2008 in print, 2004 first use), we had only diagnosed 7 patients with MRI screening. None were identified with the Gail model, none with Claus, and only 3 of 7 with Tyrer-Cuzick. The impact of using breast density as an equal parameter to calculated risk was apparent to us early on. Our strategy also took into account the wide variation in the different models (by not relying on their illusion of certainty) and avoided the paradox of declining lifetime risk in the face of rising incidence by simply placing patients in one of 3 risk level categories: Baseline Risk, High Risk and Very High Risk.

This might seem reactionary, but I’ve never fully embraced the mathematical models. Why? Their merging of risk factors is based on accepted statistical modeling applicable to industry in general, but with no accounting for the biologic interaction between risks. As such, the Gail model told us that Atypical Hyperplasia and Family History were synergistic, consistent with the original work of Page and Dupont. But then (with the blessing of Dupont) the Mayo Clinic data now indicate that family history contributes nothing in this situation – the risk of Atypical Hyperplasia trumps family history and imparts the same level of absolute risk regardless of other factors. On a bigger scale, this is biology trumping mathematics.

And to support my skepticism of the mathematical models, look at the c-stats for the various models that reflect “discrimination” at the individual level. They’re not pretty. Better than flipping a coin, but nothing to brag about and well below predictive models in other diseases. The c-stats are comparable to “accuracy” in the statistical sense, and while the original Gail had an embarrassing 0.58, we’re still not above 0.70 with the latest and greatest models. Yes, you might see the word “excellent” associated with various models, but they will be talking about “calibration,” not discrimination. Calibration is the predicted-to-observed ratio, that is, how many cancers will develop in a cohort. And therein lies the unequivocal benefit of mathematical modeling – that is, in the design of clinical trials where investigators need to predict the number of breast cancers that will occur. But at the individual level (discrimination), not so good.
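The gap between calibration and discrimination is easy to see with a toy cohort. The Python sketch below uses made-up data and hypothetical function names. The c-statistic is computed as the fraction of case/non-case pairs in which the case was assigned the higher predicted risk (ties count as half), and calibration as the predicted-to-observed ratio described above.

```python
from itertools import product


def c_statistic(predicted, outcomes):
    """Fraction of (case, non-case) pairs where the case got the higher risk.

    Ties count as half. 0.5 is a coin flip; 1.0 is perfect discrimination.
    """
    cases = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_cases = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case, p_non in product(cases, non_cases):
        if p_case > p_non:
            concordant += 1.0
        elif p_case == p_non:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def predicted_to_observed(predicted, outcomes):
    """Calibration: expected number of cancers divided by the number observed."""
    return sum(predicted) / sum(outcomes)


if __name__ == "__main__":
    # Toy cohort: every woman is told "10% risk" and 1 of 10 develops cancer.
    predicted = [0.10] * 10
    outcomes = [1] + [0] * 9
    print(predicted_to_observed(predicted, outcomes))  # about 1.0: "excellent" calibration
    print(c_statistic(predicted, outcomes))            # 0.5: no better than a coin flip
```

A model that assigns everyone the cohort average predicts the number of cancers perfectly and still cannot say which woman will be the one diagnosed.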

It is probably no surprise that I still like my original scoring system more than any other option. With 4 levels of density and 3 levels of risk, we generated a total that converted to a recommendation of 1) annual MRI, 2) biennial MRI, 3) triennial MRI, or 4) No MRI. In this model, patient age has no impact at all on selection for MRI. As a result, the age distribution in our MRI-discovered cancers closely reflects age-at-diagnosis in the mammographically screened population – that is, 80% of our MRI-discovered cancers are in patients over the age of 50.

Powerpoint #4

By the time of our 2014 publication, we had diagnosed 33 patients with MRI screening. Had we used the Gail model, only 9 of 33 cancers would have been discovered. Had we used the Tyrer-Cuzick model (where calculations are consistently higher), still only 12 of 33 cancers would have been discovered. And poor Claus – originally, the “preferred” model as it was the only model used in the international trials (of the 3 that used modeling) – here, only 1 of 33 patients would have qualified for MRI screening. Using all 3 models and opting for the highest calculated risk, and adding BRCA positivity, we still would have identified only 16 of 33 cancers (48.5%) in this loose interpretation of ACS guidelines.

Clearly, there are problems with the current guidelines. And it’s really quite simple – if you are going to use a second line of defense (MRI, in this case), then its use ought to be predicated on the probability that the first line of defense is going to fail. That first line is mammography, and the probability of failure is based entirely on breast density. Using risk factors alone (without density) to select patients for MRI screening does not address that first line of defense in any fashion whatsoever, an incomprehensible deficiency.

In fact, even though risk and density were equally weighted in our 2008 “point system,” one can make the case to jettison risk calculations entirely, and base MRI screening on density levels alone. Witness what is going on in The Netherlands with the DENSE Trial of MRI screening. This is a prospective, randomized trial of mammography every 2 years versus mammography plus MRI every two years in women aged 50 to 75 (over the course of 3 screens), using a single entry criterion – Level D density.

Risk levels have been tossed out (other than the inherent risk of Level D mammograms) and the entire study is predicated on the idea that this group of patients will harbor a substantial number of mammographically occult cancers (comparable to the yields in the high risk MRI screening trials). If the ACRIN 6666 subgroup that underwent a single MRI is any indicator, the Dutch will generate a landmark study that, by the way, includes mortality reduction as one of the endpoints.

I began this blogatorial with the intent to focus on the paradox of long-term risks versus short-term incidence, but before I knew it, I had slipped into my chronic, ongoing rant about the treatment of mammographic density as some sort of isolated risk factor that “needs more research” in the 2007 MRI screening guidelines.

But there is so much more in the current guidelines to whine about – such as the disconnect between risk of breast cancer and risk of gene-positivity (addressed by Kevin Hughes, MD and his team in Cancer 2008; 113:3116-3120). Then, there’s the odd approach to tissue risks by the ACS wherein suddenly “lifetime risk” with ADH/ALH/LCIS is tossed out the window and the risks are described with short-term values, such as “12-year follow-up.” So, a young woman at 40% lifetime risk after a diagnosis of ADH does not qualify for MRI while the same woman at 21% lifetime risk due to family history will qualify. (Thankfully, peer reviewers don’t follow the letter of the law, and the ADH patient will usually qualify.) This is a good thing. We are now up to 52 MRI-discovered cancers, and ADH was the dominant risk factor in 12 of the 52.

Okay, I’m clearly rambling now, and the next thing you know, I’ll be discussing the exclusion of patients with prior breast cancer in the “needs more research” category, where I guess we’re supposed to be comfortable with 40% sensitivity for mammography.

The revised MRI screening guidelines – 2nd Edition – were targeted for release several years ago, and I’m not sure what happened. When the ACS released their revised mammography screening guidelines for the general population in 2015, it was stated that the high-risk guidelines would be next. I can’t complain that I didn’t get the opportunity to make my case. Although no one invited me to the party that will decide the new guidelines, I did have the opportunity at a committee meeting to chat with one of the key policymakers at the American Cancer Society who will be guiding the new recommendations. As I discussed the age discrimination problem, it was clear that all the ramifications of our current system had not been considered the first time around, especially as pertains to remaining lifetime risk.

So, we await the new guidelines. After what I’ve seen so far, given the precious few who have devoted careers to the nuances of risk assessment, I’ve got to raise this skeptical toast to the policy-makers: “May you not make things worse than they already are.”

 
