SNPs Sneak Into Breast Cancer Risk Assessment


They’ve been knocking at the door for nearly 20 years. Insulted as inadequate, misleading and of no clinical utility, the much-maligned SNPs stood outside waiting their turn until – finally – they were allowed to enter the world of breast cancer risk assessment.

What are they? SNPs (pronounced “Snips”) are arbitrarily defined as “single nucleotide polymorphisms” that occur in greater than 1% of the population. So, what would this same variation be if it occurs in less than 1%? Answer: a mutation.

Right off the bat, we have problems arising with ethnicity, in that one group’s SNP can be another group’s mutation. And that’s just for starters. (Note: I’m going to use the old terminology here that includes “polymorphism” and “mutation,” given that the recommendation to label everything in genetics as “variants” has not yet been completely adopted – see October 2017 blog.)

SNPs can be substitutions, deletions or insertions, and might occur in coding or non-coding areas. They might be silent (same amino acid in spite of the nucleotide change); they might yield a different amino acid; or might even generate a stop codon. They can be disease-causing, just like a mutation (ergo, the call to revise our terminology, as many reserve the term “polymorphism” for non-disease-causing alterations).

When studied in breast cancer risk assessment, however, SNPs are complements to traditional risk factors. And, individually, SNPs are weak. That is, the power of any single SNP is small, with RRs and ORs barely above 1.0 in most cases. So, the big question is: How do you combine SNPs in order to yield clinically important information? And the bigger question: How do those combinations of SNPs interact with established risk factors?
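One common way to combine many weak SNPs is a log-additive (multiplicative) model. Each SNP's per-allele odds ratio is raised to the number of risk alleles carried, and the results are multiplied together. The sketch below assumes that model. The function name and the odds ratios are hypothetical and chosen only for illustration. The model also assumes the SNPs act independently, with no interaction, which is exactly the assumption the questions above call into question.

```python
import math

def combined_relative_risk(snps):
    """Combine SNPs under a log-additive model with no interaction terms.

    snps: list of (per_allele_or, risk_allele_count) tuples, where
    risk_allele_count is 0, 1 or 2. Returns the RR relative to a woman
    who carries no risk alleles at all.
    """
    log_score = 0.0
    for per_allele_or, count in snps:
        if count not in (0, 1, 2):
            raise ValueError("risk allele count must be 0, 1 or 2")
        # Summing count * ln(OR) is the same as multiplying OR**count.
        log_score += count * math.log(per_allele_or)
    return math.exp(log_score)

# Hypothetical panel: ten weak SNPs, each with a per-allele OR of 1.05.
one_copy_each = [(1.05, 1)] * 10
two_copies_each = [(1.05, 2)] * 10
print(round(combined_relative_risk(one_copy_each), 2))
print(round(combined_relative_risk(two_copies_each), 2))
```

Ten SNPs that are individually trivial compound to an RR of roughly 1.6 (one copy each) or 2.7 (two copies each). Keep in mind that this comparison is against a woman with no risk alleles. Published scores are often rescaled against the population average instead.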

Although their presence has been known for many years under a variety of descriptors, the SNP acronym emerged in the mid-1990s, and then the concept of SNPs having a potential use in breast cancer risk assessment was on a roll by 2000. At the same time, home DNA kits were being developed using SNPs to determine if the user had an aptitude for playing the accordion or perhaps becoming an Olympic diver, prompting SNP-promoters to be labeled as “the used car salesmen of science.”

Early on, it was clear that no single SNP would do the trick for refining risk assessment. Groups of SNPs became the goal. But how do you define the proper groupings when you’re dealing with this tidbit: By 2001, a map of human genome sequence variation containing 1.42 million SNPs had been published in Nature (409:928-933). As for those SNPs that pertain to breast cancer risk, a massive project reported in Nature on Oct. 23, 2017 added 65 new SNPs to those known to affect risk levels, bringing the total to approximately 180 breast-related SNPs.

In Oklahoma City, a biotech company was established to improve breast cancer risk assessment by using key SNPs clustered into triplets, drawn from those SNPs that might be involved in hormonal or carcinogenic pathways. In keeping with the buzz phrase “personalized medicine,” this company actually did a nice job from the laboratory science standpoint, generating RRs for a very large number of specific triplet combos. While some SNP combos exceeded RR=3.0 (and a few hit RR=5.0), the number of volunteers with those specific combos was small, leaving very wide confidence intervals even when the results were statistically significant. For the vast majority of patients, the degree of risk imparted by the SNP triplet was subclinical (that is, an RR less than 2.0).
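Why do rare combinations produce such wide confidence intervals? The standard log-scale (Katz) interval for a risk ratio shows how much the width depends on the number of carriers, and a minimal sketch of it follows. The counts are invented for illustration and assume a simple cohort-style comparison. They are not the company's data or its study design.

```python
import math

def risk_ratio_ci(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Risk ratio with an approximate 95% confidence interval (log/Katz method)."""
    rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
    # Standard error of ln(RR); the 1/cases terms dominate when cases are few.
    se_log_rr = math.sqrt(1 / cases_exposed - 1 / n_exposed
                          + 1 / cases_unexposed - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: the same RR of 3.0 with 40 versus 400 carriers of a
# SNP combination, each compared with 10,000 non-carriers at 5% incidence.
rare = risk_ratio_ci(6, 40, 500, 10_000)
common = risk_ratio_ci(60, 400, 500, 10_000)
for label, (rr, lo, hi) in (("40 carriers", rare), ("400 carriers", common)):
    print(f"{label}: RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Both intervals exclude 1.0, so both results are “statistically significant.” The 40-carrier interval, however, is several times wider, spanning everything from a modest to a substantial increase in risk.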

But then comes a bigger problem – how do you blend SNPs into the standard mathematical models, when those models are already on shaky ground? In our current models, traditional risks are merged mathematically, with little regard for biologic interactions. Ideally, risks should be studied in couplets and triplets to better understand these interactions. Even then, however, studies can be conflicting.

Consider the interaction of atypical hyperplasia and family history. The Mayo Clinic data indicate that atypical hyperplasia is the primary driver for risk calculations, largely unaffected by family history. Yet the original Page and Dupont data showed an impressive degree of synergism (9-fold risk) with atypical hyperplasia and a first-degree relative with breast cancer, a feature that made it into both the Gail and Tyrer-Cuzick models. Now, into this confusion, including a broad range of calculated risks depending on the model used, we add SNPs?

Well, the company in Oklahoma City added its SNP results exclusively to the Gail model. That meant the conclusions were wrong whenever the Gail model was inappropriate in the first place. This is a not uncommon situation, yet one rarely recognized by those unfamiliar with how the Gail is constructed and when it should not be used. Patients with LCIS (not included in the Gail) were told they were at normal risk because of their “gene test” results, when in fact the SNP results had been neutral (RR=1.0) and it was the Gail model that had misinformed them. The same was true for women with extensive family histories of breast cancer confined to second-degree relatives, whom the Gail does not count. Results were not separated into the Gail component and the SNP component. Instead, the patient and her doctor received a single “score” to reflect risk. And if the Gail was wrong, the score was wrong, and the counseling was wrong.

Although the company is no longer in business, we are still working through its aftermath in that many women in Oklahoma (and in certain locations nationwide) still believe they have completed “genetic testing” as a result of their SNP results years ago. Imagine their surprise to learn that they have had no analysis whatsoever of the cancer predisposition genes.

We tend to forget that these mathematical models are at their best in predicting the number of cancers in a large cohort for clinical trials. Here, the Gail works nicely. At the individual level, though, the original Gail had a c-stat of 0.58 (barely better than flipping a coin). Attaching SNPs to the Gail is akin to putting lipstick on a pig if you are counseling individual patients.
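The c-statistic is the probability that a randomly chosen woman who goes on to develop breast cancer was assigned a higher predicted risk than a randomly chosen woman who does not. A value of 0.5 is the coin flip, and 1.0 is perfect discrimination. Below is a minimal sketch of that pairwise calculation. The predicted risks are made up and are not output from any real model.

```python
from itertools import product

def c_statistic(risks_cases, risks_noncases):
    """Concordance statistic: share of (case, non-case) pairs in which the
    case was assigned the higher predicted risk; ties count as half."""
    concordant = 0.0
    for r_case, r_noncase in product(risks_cases, risks_noncases):
        if r_case > r_noncase:
            concordant += 1.0
        elif r_case == r_noncase:
            concordant += 0.5
    return concordant / (len(risks_cases) * len(risks_noncases))

# Made-up predicted risks for women who did and did not develop cancer.
cases = [0.12, 0.09, 0.20]
noncases = [0.10, 0.08, 0.15, 0.09]
print(round(c_statistic(cases, noncases), 3))
```

In this toy example, 8.5 of the 12 case/non-case pairs are ranked correctly, which gives about 0.71. A model at 0.58 ranks the pair correctly only slightly more often than chance.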

Enter Tyrer-Cuzick. Based on western European whites, this model has a sharp restriction when it comes to using it to calculate risks in other ethnicities. And, it tends to overestimate probabilities when tissue risks are paired with family history. Its attractive features, of course, are the inclusion of 2nd and 3rd degree relatives, paternal history, reproductive risks, tissue risks and prior genetic testing results.

Version 6.0 of the Tyrer-Cuzick (T-C) model calculated lifetime risk through age 80. Version 7.0 extended the calculation five years further, through age 85, so when we moved to it, the lifetime risks came out slightly higher.
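The effect of lengthening the horizon can be sketched by summing annual hazards over a longer span. This is a simplified illustration, not the Tyrer-Cuzick calculation. The age-specific hazards are invented, and death from other causes is ignored, so the numbers only show the direction of the change.

```python
import math

def hypothetical_hazard(age):
    """Invented annual hazard that rises with age; NOT real incidence data."""
    if age < 50:
        return 0.002
    if age < 70:
        return 0.004
    return 0.005

def cumulative_risk(hazard, start_age, end_age):
    """Risk of diagnosis from start_age up to (not including) end_age,
    ignoring death from other causes."""
    total_hazard = sum(hazard(age) for age in range(start_age, end_age))
    return 1.0 - math.exp(-total_hazard)

to_80 = cumulative_risk(hypothetical_hazard, 40, 80)
to_85 = cumulative_risk(hypothetical_hazard, 40, 85)
print(f"to age 80: {to_80:.1%}, to age 85: {to_85:.1%}")
```

Because the extra five years fall in the highest-hazard ages, even this crude sketch adds a couple of percentage points to “lifetime” risk without any change in the patient's risk factors.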

And, now, with version 8.0, breast density has been added, which can work both ways, generating numbers higher or lower than v. 7.0. This introduces an element of gamesmanship, of course, as many of us are trying to reach the magic 20% lifetime risk threshold to justify screening with breast MRI (another subject entirely). Be that as it may, you’re going to get higher numbers by sticking with v. 7.0 if your patient has A or B density, but then using 8.0 if density levels are C or D. (I’m embarrassed to admit the hoops we jump through to overcome moth-eaten guidelines.)

If you ever toyed with the beta version of Tyrer-Cuzick 8.0, you might have discovered under TOOLS that you could apply SNP risk. That’s right. If you had access to SNP data, you could plug it into the T-C model on your own. The problem was obvious, at least in the U.S.: “Where do you get this free-standing SNP data?” The commercial entities would only provide their “final score,” which was a Gail model foundation with SNPs thrown in. How could you arrive at an independent SNP risk? And that’s the reason I’m writing. Myriad Genetics has started to do this for you (automatically, at no cost, with caveats to follow) by combining SNP results with the Tyrer-Cuzick model, version 7.0, resulting in the riskScore®.

Here are the key points about this semi-major development:

First of all, when I heard this was going to happen, I made the mistake of thinking it would apply to all women who had tested negative for the predisposition genes. This would have been the same errant step that prior companies had done with SNPs. Instead, I was relieved to learn that, at first, the application of SNPs will only occur in a select group, and using Tyrer-Cuzick rather than the Gail.

Because the T-C model is based on women of European ancestry, Myriad will not report SNPs in the form of riskScore® for other ethnicities (until data emerge). Further inclusion/exclusion criteria: patients must be under age 85 (the T-C model does not include calculations for immortality); no personal history of breast cancer; no LCIS; no atypical hyperplasia or proliferative change on a prior biopsy; and not even a prior biopsy with unknown results. Also, the riskScore® will not be calculated if a blood relative is known to carry a mutation in a breast cancer risk gene. Furthermore, breast density will not be included as an independent variable, as reflected in the choice to go with T-C version 7.0 rather than 8.0.

So, is there anyone left? Yes, quite a few actually. And I have to hand it to Myriad for being so careful with riskScore®, whereas prior companies bulldozed their way into risk assessment with their SNPs. Myriad is offering the service at no charge (at least for now) and will report the data only when the inclusion/exclusion criteria are met. But even when it doesn’t report SNP impact, it will be collecting data so that eventually the exclusions can be peeled away.

Why so many exclusions? Well, what if a particular SNP is responsible for proliferative change on a biopsy? To count the SNP risk and the biopsy risk would be counting the same risk twice, artificially raising risk above what it should be (as if T-C doesn’t already trend toward higher numbers). Or, what if certain SNPs work together to cause breast density? Again, we would be combining different manifestations of the same risk factor.

In practice, when one receives a riskScore® that includes 80-plus markers (mostly SNPs), the combined number by itself doesn’t tell you what the risk would have been using the Tyrer-Cuzick alone. However, the T-C-alone figure is also on the report, so you can appreciate the impact of the SNPs by comparing it to the riskScore® that includes the SNPs. The impact of the SNPs is usually going to be modest, especially when one considers the difference spread out over many remaining years. However, in this world of “20% lifetime risk” for determining insurance coverage of MRI screening, a few percentage points can make the difference.
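Here is how “a few percentage points” can move a patient across the 20% line. One simple way to layer a relative risk onto a baseline lifetime risk assumes proportional hazards and no competing mortality. Under those assumptions, the probability of remaining cancer-free is raised to the power of the RR. Nothing here describes how riskScore® actually combines the SNPs with Tyrer-Cuzick, so this is a hypothetical sketch. Both the 18% baseline and the 1.15 SNP relative risk are invented.

```python
MRI_THRESHOLD = 0.20  # the 20% lifetime-risk cutoff discussed above

def apply_relative_risk(baseline_lifetime_risk, relative_risk):
    """Scale a lifetime risk by a relative risk, assuming proportional hazards
    and no competing mortality: P(cancer-free) becomes P(cancer-free) ** RR."""
    if not 0.0 <= baseline_lifetime_risk < 1.0:
        raise ValueError("baseline risk must be in [0, 1)")
    return 1.0 - (1.0 - baseline_lifetime_risk) ** relative_risk

baseline = 0.18   # hypothetical model-alone lifetime risk
snp_rr = 1.15     # hypothetical combined SNP relative risk
combined = apply_relative_risk(baseline, snp_rr)
print(f"model alone: {baseline:.1%}, with SNPs: {combined:.1%}")
print("meets 20% threshold:", combined >= MRI_THRESHOLD)
```

A modest SNP effect nudges the hypothetical patient from 18% to just over 20%, which is clinically trivial in absolute terms but decisive for the MRI coverage rule.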

More importantly, from the scientific standpoint, it will allow Myriad to accumulate the SNP data and how it interrelates with the known risk factors.

With the recent discovery of 65 new breast cancer-related SNPs (reported shortly after Myriad’s announcement), the question arises: Where are we headed? Are we even going in the right direction? Are the data going to accumulate so fast that our risk calculations are obsolete within months after we document calculated risk?

And that brings me to the place where we older doctors congregate – back porch philosophy.

My risk assessment program predates the commonly used mathematical models. Having started one of the first such programs in the country, I had to rely on epidemiologic models that had not yet broken through to clinicians. For example, the Ottman tables were published in Lancet in 1983 by Ruth Ottman, Malcolm Pike, Mary-Claire King and Brian Henderson. Far more sophisticated than the Gail model (albeit focused only on family history), they took into account age at diagnosis in relatives, bilaterality in affected relatives, and specific age intervals of the unaffected proband. With them, one could counsel patients about their absolute risk for breast cancer over a defined period of time.

And…we had the Dupont tables published in 1989 (Stat Med 1989; 8:641-651) that converted relative risks to absolute risks over a defined period of time, preferably using the 20-year table rather than “lifetime.” You could estimate an overall RR from the literature available at the time, then select the patient’s age on the graph and follow the curves to the absolute risk for breast cancer.
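The general principle behind converting a relative risk into an absolute risk over a defined period can be sketched with a simple discrete-time life table. Each year, the relative risk multiplies a baseline incidence, and women who are diagnosed or who die of other causes drop out of the pool at risk. This is a hypothetical illustration, not a reproduction of Dupont’s published tables or graphs. The constant incidence and mortality rates are invented.

```python
def absolute_risk(relative_risk, baseline_incidence, other_mortality):
    """Probability of diagnosis over len(baseline_incidence) years.

    baseline_incidence[i] and other_mortality[i] are annual probabilities
    for year i of follow-up; the relative risk multiplies incidence only.
    """
    at_risk = 1.0  # fraction still alive and cancer-free
    risk = 0.0
    for incidence, mortality in zip(baseline_incidence, other_mortality):
        p_cancer = relative_risk * incidence
        if p_cancer + mortality > 1.0:
            raise ValueError("annual probabilities exceed 1")
        risk += at_risk * p_cancer
        at_risk *= 1.0 - p_cancer - mortality
    return risk

years = 20
incidence = [0.003] * years  # hypothetical baseline annual incidence
mortality = [0.005] * years  # hypothetical annual death rate from other causes
for rr in (1.0, 2.0, 4.0):
    print(f"RR {rr}: 20-year absolute risk {absolute_risk(rr, incidence, mortality):.1%}")
```

Quadrupling the relative risk less than quadruples the 20-year absolute risk, because diagnoses along the way shrink the pool still at risk. This is one reason an RR cannot simply be multiplied onto an absolute figure.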

Then came the Gail model, and with it a subtle shift away from science and toward a mathematical pragmatism that few considered awkward at the time. Risk factors were merged through math, not biology. Yes, there were articles that studied risk factors in couplets, and a few in triplets. But, of course, it was impossible to perform a defined cohort study on every possible combination of risks. Instead, we took a leap of faith toward mathematical models that were very good for predicting the number of cancer cases in a clinical trial (the overestimates largely neutralized the underestimates). Admittedly, though, at the individual level the models were marginal, at best, when it came to predicting future breast cancer.

Breast cancer risk assessment began as a fledgling science. Efforts were made to understand “Why?” Why did nulliparity increase risk? Why did a late first full-term pregnancy actually impart slightly greater risk than nulliparity? And on and on. If we could figure out the “why,” then we’d be that much closer to prevention strategies. We had theories to explain these things: the estrogen “window” theory, the ovulatory theory, the cellular differentiation theory, circulating hormone theories, etc.

But theories – no more. And this is where we depart from science and move toward technology. And when the dust settles, breast cancer risk assessment might end up as pure technology – that is, we know it works, even though we don’t know how or why.

When the announcement was made about the 300-site study of SNPs (the article never used the term “SNPs”) in which 65 new ones were proposed, I posted a brief comment on the website: we were going to generate so much data that we would exceed our ability to process it. Dr. Barry Rosen followed my comment with his own, suggesting the need for IBM’s Watson to intervene. Dr. Rosen is absolutely correct. As more and more SNP data emerge, the blending with known risk factors will progress and prediction should improve. But just so we remember: if we don’t understand the “why,” then we will be at the mercy of technology rather than revelatory science. If technology is to be our new master, I hope it treats us well.
