In the March 15, 2018, issue of The New England Journal of Medicine, an editorial from the Stanford School of Medicine (Danton Char, MD, et al.) offers a cautionary note about ethical concerns that will accompany artificial intelligence in medicine. Key points include:
–Data used to create algorithms can contain bias, with skewing of results depending on the motives of the programmers and whoever pays them.
–Physicians must understand how algorithms are created and how these models function in order to guard against becoming overly dependent upon them.
–Big Data becomes part of “collective knowledge” and might be used without regard for clinical experience and the human aspect of patient care.
–Machine-learning-based clinical guidelines might introduce a third-party “actor” into the doctor-patient relationship, challenging the dynamics of responsibility in the relationship.
As a bona fide Luddite, I’m relieved to see others are concerned about the bold promises of artificial intelligence (AI) in medicine. You are probably familiar with the Luddites, a group of English textile workers who, from 1811 to 1816, organized a rebellion in which the newfangled weaving machines were destroyed as a form of protest. Eventually, mill owners began shooting protestors, and the rebellion was quashed with military might.
Over the years, the term Luddite has been applied to any anti-progress position. But with the advent of the computer age, the moniker has enjoyed a resurgence and is commonly used in a pejorative sense to refer to those of us who are skeptical of Big Data, data mining, machine learning, AI and anything else that is potentially dehumanizing and threatens to manipulate and change the world beyond all recognition. Well, maybe I’ve slipped into hyperbole here, but you get the picture.
So, you might be surprised to learn that this Luddite-author’s name is on a recent paper with this Luddite-unfriendly title: “Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm,” published in Phys Med Biol, Jan. 30, 2018 (doi: 10.1088/1361-6560/aaa1ca), with co-authors M. Heidari, AZ Khuzani, G. Danala, S. Mirniaharikandehei, Y Qiu, H Liu, and B Zheng. Furthermore, I admit without shame that I am in the midst of reviewing 4,000 mammograms where the images have been converted into “risk score” numbers at the pleasure of algorithms, all through an R01 NCI grant that will last through June 2020.
The term “locality preserving projection algorithm” in our title above prompted me to visit Wikipedia to see if something had changed in the world of algorithms. I recommend you do the same. This is not your grandmother’s algorithm. The complexity is daunting, and I don’t think many of us clinicians are going to have a clue as to how these programmed algorithms work, in contrast to the admonition from the Stanford School of Medicine. (My contribution to the article, by the way, was strictly clinical, accepting the “numberized” mammograms at face value.)
And what exactly is machine learning? In simple terms (my only refuge) machine learning involves the construction of algorithms that can “learn” without being explicitly programmed and can thus make predictions based on incoming data.
My introduction to the term “machine learning” was entirely negative, as any good Luddite is proud to point out. It came a few years back when I was wearing my other hat – that of a novelist. In 2001, I had the good fortune of a successful novel, which you can read about on this web site or elsewhere. Amazon Customer Reviews were a new concept at the time, and it was quite unsettling that anyone could post anything about my work while I had no recourse whatsoever. Harsh criticism is difficult to swallow for the novelist who has spent many years on one manuscript, given that all responsibility is singular, that is, there is only one set of shoulders. If you rate a movie as bad, a myriad of individuals take responsibility, but a book – all the poundage settles on the author’s back.
But I was lucky in that regard. I had nearly all 5-star reviews, and a 4.5 Star average. Yes, one Angry Bird gave me a puny 1-star, adding the word “horrifying,” as my novel was apparently the worst thing he or she had ever read. Such an outlier, though, doesn’t carry much weight (or shouldn’t), and it barely affected the 4.5 average. The book was a bestseller, it stayed in film option for 15 years with 3 different groups, and I made 3 trips to Beverly Hills with each new option. No movie (yet), but it was all fun and successful well beyond my expectations.
After the hoopla died down, I re-visited the Amazon web site many years later, gearing up for the release of my new book, Killing Albert Berch, a true crime/memoir. Newly released books prompt backlist purchases, and I thought there might be renewed interest in my novels, Flatbellies and University Boulevard. I wanted to make sure copies were still available through online retailers. Barnes & Noble Online had both my novels with 100% 5-star reviews, and so far, Killing Albert Berch has had 100% 5-star reviews.
But when I went to the Amazon page for Flatbellies (by far, my most popular work), the ranking had dropped to 3.7 stars, the lowest of any of my writings. I assumed some mediocre reviews levied during the intervening years had prompted the drop, but there were no recent reviews. They were the same reviews I had seen years ago. The highly offended One-Star reviewer was still there, along with a single 3-star and a single 4-star. Yet, 11 other reviewers had given the book 5 stars. One doesn’t have to do the math in order to do the math. The average should be well above 3.7.
Then I read how the Amazon rating system had changed to machine learning from the old-fashioned mathematical law of averages that has worked so well for several thousand years. And this is how Amazon’s machine learning works, based on 3 parameters (with my comments in parentheses).
–The Age of a Review (Do opinions become wine or vinegar with age?)
–The “Helpfulness” votes by customers, created when you click on the button that asks, “Was this review helpful?” (Okay-y-y-y).
–Whether or not a review is accompanied by a “verified purchase” (In other words, did the reviewer buy their book on Amazon? You can fill in the blanks here.)
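Amazon has not published its formula, so the Python sketch below is only a guess at the general shape such a weighted rating might take. The `Review` fields, the `weight` function, and every multiplier in it are hypothetical, picked purely for illustration. A genuinely machine-learned model would derive its weights from data; here they are simply hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Review:
    stars: int
    age_years: float
    helpful_votes: int
    verified: bool


def weight(review):
    """Hypothetical weighting. This is NOT Amazon's formula, which is unpublished."""
    w = 1.0
    w *= 1.0 / (1.0 + review.age_years)    # assumption: older reviews count less
    w *= 1.0 + review.helpful_votes        # assumption: "helpful" votes add weight
    w *= 3.0 if review.verified else 1.0   # assumption: verified purchases count triple
    return w


def weighted_rating(reviews):
    """Weighted average of star ratings using the made-up weights above."""
    total_weight = sum(weight(r) for r in reviews)
    return sum(r.stars * weight(r) for r in reviews) / total_weight


# Two invented reviews of the same age with no helpful votes.
sample = [
    Review(stars=5, age_years=0.0, helpful_votes=0, verified=False),
    Review(stars=3, age_years=0.0, helpful_votes=0, verified=True),
]
plain = sum(r.stars for r in sample) / len(sample)
weighted = weighted_rating(sample)
print(plain, weighted)
```

With these invented weights, one verified 3-star review outvotes an unverified 5-star review: the plain average of the two is 4.0, while the weighted figure is 3.5.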
Mystery remained, however. The 1-Star Angry Bird could not have dragged my rating down as a single reviewer, as there was only one person who found the review helpful and this purchase was not verified. As it turns out, the 3-star review carried more dragging weight, as this was a verified purchase and a powerful component of the machine-learned algorithm. Still, it’s bizarre that the novel has a 3.7 star ranking, with only 2 reviews below that level, and 12 reviews well above that level. A mathematical average would be 4.5, the median would be 5.0, the mode would be 5.0, and the Olympic approach – drop the highest and lowest, then average – would have yielded 4.75.
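For anyone who would rather check the arithmetic than take my word for it, here is a short Python sketch using the review counts described above (one 1-star, one 3-star, one 4-star, eleven 5-star). The helper name `olympic_average` is my own label, not a standard function.

```python
from statistics import mean, median, mode

# Star ratings as described in the text: one 1-star, one 3-star,
# one 4-star, and eleven 5-star reviews (14 in total).
ratings = [1, 3, 4] + [5] * 11


def olympic_average(values):
    """Drop one highest and one lowest value, then average the rest."""
    trimmed = sorted(values)[1:-1]
    return mean(trimmed)


plain_mean = mean(ratings)           # 63 / 14
middle = median(ratings)             # midpoint of the sorted list
most_common = mode(ratings)          # most frequent rating
olympic = olympic_average(ratings)   # 57 / 12

print(plain_mean, middle, most_common, olympic)
```

All four figures land between 4.5 and 5.0, which is why a displayed 3.7 cannot come from any of the ordinary ways of summarizing these 14 reviews.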
But the Amazon machine knows better – it renders a lackluster 3.7. What does it matter? Not much when it comes to this particular situation, a novel so many years removed from its heyday. But these masterminds and their ilk are gearing up to dictate the future of medicine.
What will you do when AI tells your patient that she has no business going to the doctor when her “score” didn’t qualify her for the trip? Or, that her cancer surgery won’t be allowed because AI has determined that risks exceed benefits, in that her tumor biology scored a number too low to worry about? Or, that your patient should not be screened at your breast center because it has been determined that your facility deserves only a 1-Star ranking, and is, indeed, a “horrifying” location for mammography? And consider, too, that it will be the third-party payors, including our government, which will be most eager to program the algorithms.
We’re already getting obsessed with number-based guidelines, paving the way for the AI patrols to enter the scene with their truthiness based on positioning the data somewhere along the shadowy “training-validation-test” continuum. The next step will be relinquishing our brains at the altar of AI, simply because the stupefying complexity must warrant some form of worship.
But in contrast to all of the posturing above, here’s a Luddite dream-quote from Bloomberg.com 2016-11-10, in an article addressing machine learning: “Effective machine learning is difficult because finding patterns is hard and often not enough training data is available; as a result, machine-learning programs often fail to deliver.”
You undoubtedly have heard of the term “butterfly effect,” and thank goodness someone gave it that name because the ten-dollar word that preceded it was “concatenation.” Hollywood scriptwriters have had a field day with this concept, applying it to every plot ever conceived about time travel (although mathematical philosophers have dallied with the concept at least since 1800). But the story behind the name “butterfly effect” is a bit concerning when it comes to algorithm-driven AI and machine learning.
Edward Lorenz was a meteorologist and mathematician who was using computer models early on to predict the weather. In 1961, he decided to re-calculate outcomes from a prior study. Given the slow speed of computers at the time, he decided to start in the middle of the project as a shortcut, rather than go back to the beginning. He typed in the initial condition as 0.506 from the earlier printout, rather than the complete and precise value of 0.506127. (Honestly, how much difference could 0.000127 possibly make?) He went down the hall for a cup of coffee and returned an hour later, during which time the computer had simulated two months of weather. The conditions were completely different from the first time around. He thought of every possible malfunction before finally realizing that the tiny alteration in data entry had magnified exponentially over time, until the difference “dominated the solution.”
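The effect is easy to reproduce on a modern laptop. The Python sketch below is a stand-in, not a reconstruction: it uses the simpler three-variable system from Lorenz's 1963 paper (with the classic parameters 10, 28 and 8/3) rather than the weather model he was actually running in 1961. The starting values for y and z, the step size, and the run length are arbitrary assumptions, and the two x values are borrowed from the anecdote only for flavor.

```python
def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz (1963) system one step with classical RK4."""
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        # State s moved a distance h along the slope k.
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2))
    k3 = deriv(shifted(state, k2, dt / 2))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def max_separation(x_full, x_rounded, steps=4000, dt=0.01):
    """Run two trajectories that differ only in starting x.

    Returns the largest gap in x seen at any point during the run.
    """
    a = (x_full, 1.0, 1.0)     # y and z starting values are arbitrary assumptions
    b = (x_rounded, 1.0, 1.0)
    largest = abs(a[0] - b[0])
    for _ in range(steps):
        a = lorenz_step(a, dt)
        b = lorenz_step(b, dt)
        largest = max(largest, abs(a[0] - b[0]))
    return largest


gap = max_separation(0.506127, 0.506)
print(f"starting gap: {0.506127 - 0.506:.6f}, largest later gap in x: {gap:.2f}")
```

Two runs that begin 0.000127 apart end up on entirely different paths, while two runs with identical starting values stay identical forever. Nothing is random here; the rounding alone does the damage.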
In 1963, Lorenz published his seminal work as “Deterministic Nonperiodic Flow,” and in that first iteration, he used the metaphor of the “flap of a sea gull’s wing” to note the powerful impact based on chaos theory. In 1972, he failed to provide a title for a talk that he was giving at the American Association for the Advancement of Science, and a colleague suggested “Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” And a new phrase – butterfly effect – was added to our lexicon.
Note the distinction between a minor tweak in a chaotic universe versus direct causation. The butterfly’s wings do not cause the tornado, but the end result is dramatically altered with the tiniest of data input. So now, my question is this – if such tiny alterations early in algorithmic progression can cause such major differences in outcomes, do we really want to put the future of medicine on the wings of butterflies?
Oh, one other thing about the Luddites. As is frequently the case, convenient and capsulized summaries have distorted the original truth. It turns out that Luddites were not anti-machinery at all. Many were highly skilled machine operators in the textile industry, and furthermore, the technology was not new. In truth, Luddites were opposed to the way in which management was using the machines in what was called “a fraudulent and deceitful manner.” And, they were opposed to poor working conditions in general. In short, they wanted machines that made high-quality goods, and they wanted the operators of those machines to be well-trained and well-paid. Destruction of machines during the Industrial Revolution was a common method of general protest, established well before the Luddite rebellion, and it should in no way imply that the Luddites were anti-machinery or anti-progress.
As is often the case, it’s not the technology that’s the problem – it’s the users.