Thursday, June 26, 2008

Dangerous New Chris Anderson Theory: We No Longer Need Logic

While I do not know Chris Anderson, editor-in-chief of Wired Magazine, I would imagine he is a decent fellow. And by any measure, he is a smart fellow. But the purpose of this piece is to attempt to stop an idea that he is promoting that I consider to be neither decent, nor smart.

First, as many of you know, I have, in the past, taken Chris to task for what I called the voodoo economics of his theories around the idea that everything digital will be free, and how wonderful that will be. And yet, regardless of what I may say, there are many defenders of his theories around free.

But in the pages of this month's Wired Magazine, Chris proposes a new framework for thinking about science and logic that is so utterly irrational that I cannot imagine any serious thinkers flocking to his side. And yet, despite the ridiculousness of Chris's arguments, they must be addressed, because they may cause serious social harm.

Rather than summarize Chris's thesis, let me just quote him. First, Chris's accurate summary of a core tenet of scientific method:

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

And then Chris puts forward what I will call the Anderson Theory. It is based on the idea that with massive stores of data, most notably Google, we do not need such scientific methods any more. With such huge amounts of data we can establish much more detailed correlations, therefore making formal logic and scientific method irrelevant. Chris says:

There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

So there you have it. Correlation is enough. Causation is irrelevant.

My argument against The Anderson Theory is centered in formal logic. For those of you who have never thought about the difference between correlation and causation, think of it this way. If we discover that in certain situations A and B are always present, that may very well be interesting information. But it is critical to know the direction of the relationship, or causation. Does A cause B, does B cause A, or does C cause both A and B? Without knowing the answer to this question we cannot derive anything useful or actionable from the fact that both A and B are often present at the same time. For Chris Anderson to suggest otherwise is, in my view, exceedingly dangerous.
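
The third case, where C causes both A and B, can be made concrete with a minimal sketch. This is a toy simulation with invented numbers, not anything taken from Anderson's article: the function names, noise levels, and sample size are all assumptions chosen for illustration. A hidden factor C drives both A and B, so the two correlate strongly in observed data. When A is instead set independently of C, as in a randomized experiment, the correlation with B vanishes, which shows that A never caused B.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=10_000, seed=0, intervene=False):
    """Generate (A, B) samples where a hidden C causes both A and B.

    If intervene is True, A is assigned at random, independent of C,
    which mimics a controlled experiment on A. (Hypothetical toy model.)
    """
    rng = random.Random(seed)
    a_values, b_values = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)              # hidden common cause
        if intervene:
            a = rng.gauss(0, 1)          # A set by the experimenter, not by C
        else:
            a = c + rng.gauss(0, 0.5)    # A depends only on C plus noise
        b = c + rng.gauss(0, 0.5)        # B depends only on C plus noise
        a_values.append(a)
        b_values.append(b)
    return a_values, b_values


if __name__ == "__main__":
    a, b = simulate(intervene=False)
    print("observed correlation of A and B:", round(pearson(a, b), 2))

    a, b = simulate(intervene=True)
    print("correlation after setting A ourselves:", round(pearson(a, b), 2))
```

In the observational run the correlation is high, yet no amount of changing A would move B. The correlation alone cannot tell these situations apart; only the experiment, which encodes a hypothesis about mechanism, does.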

As I see it, the most significant danger here is not in the hard sciences, where wacky thoughts like this will be dismissed out of hand, but in social science, or perhaps what would best be referred to as "pop" social science.

In this context people are always trying to use statistics via correlation to prove socially harmful theories and to justify harmful strategies including, for example, the horrific concept of eugenics. One of the more famous examples of such correlation-based logic is the widely debunked 1994 book The Bell Curve, where authors Richard Herrnstein and Charles Murray theorized that Blacks have lower I.Q.s than whites, and that the discrepancy is likely genetically based.

What would flow from this kind of thinking, of course, is a framework that could justify all kinds of race-based social service and educational inequities, on the idea that the money could be better spent elsewhere. Why give everyone the same education if some are ultimately less capable than others? In fact The Bell Curve was nothing more than faux science being used to prop up a political agenda. I fear people with outlying agendas gaining credibility through a reduction in rationality and rigor.

The point is that wacky memes often take off on the Internet regardless of their validity, provability, or reasonableness. People often reason in a manner to prove their underlying world view or political framework. If a seemingly credible person like Chris Anderson proclaims it OK to loosen the rigor with which we analyze the world around us, it will give comfort to those with less noble intentions.

Armed with Google, the lesser, or less noble, among us will be "proving" all kinds of baseless social concepts, blessed with a patina of credibility from The Anderson Theory. But we cannot allow this to be. Requiring the establishment of not just correlation but causation in our arguments is one of the primary backstops against evil intent in public discourse.

16 comments:

Anonymous said...

bravo

glenn mcdonald said...

I think you're fundamentally right, but that you're also probably falling for Chris's rhetorical-exaggeration-baited trap. Logic isn't obsolete, but there have certainly been lots of places in the history of science, particularly applied science, where "why" was a means to finding out "what". And Chris is right that with more data and faster statistics we can sometimes find out "what" directly. That's sometimes cool and sometimes useful. Google's approach to translation is a good example of putting this idea to a startlingly effective practical use.

But he opts to leave a couple truly idiotic ideas unqualified in his article, presumably to provoke reactions (like yours). One is that statistics and "the scientific method" are somehow inherently disjunct, which of course isn't even vaguely true. The other is that we're only ever concerned with "what", and thus no longer need to care about causality. Tell that to a sarcoma patient after you Google "sarcoma cure" and it doesn't cure them.

But give his article to a decent editor for five minutes and it would be fine. "Correlation supersedes causation" is dopey, and should be rewritten to something like "Sometimes correlation is findable and usable where causality remains elusive." On the other hand, this line is pretty good: "But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world." That's basically true, and exciting, and powerful. And "What can science learn from Google?" is a fair question, however cutely phrased.

On the other hand, Chris is an editor, not a scientist, so while he isn't necessarily responsible for keeping track of how "the scientific method" has evolved since Newton, he ought to be able to craft a piece that does a better job of making its real point without exaggerating it for shock value.

[And, incidentally, The Bell Curve has been demonized much more than it deserved. H&M were actually extremely careful to precisely qualify their statistical work. Almost none of the shock-value exaggeration in the discussion of that book actually originates in its text, and most stems from not taking the time to read it. Which is understandable, mind you, because it's very long and very calm and very dry, especially if you come to it anticipating delightful outrage...]

John said...

I don't see why Anderson's idea is even buzz worthy, since it's not a new idea. It's basically a form of behavioralism.

Predictive modeling has been around for decades. Behavioral marketers (especially direct marketers) have built regression models to optimize their campaigns for years. These are the kinds of models that tell you that a person who has oil heating in their home is more likely to sign up for a new credit card offer, without explaining why this is true. Predictive modeling is also used in many other fields.

Perhaps where human nature is involved statistical guesses are the best we can come up with, perhaps not. As long as they work, they will be useful to some people. But that doesn't mean they should exhaust the full gamut of tools available to researchers and scientists.

Just another hypester trying to make a buck selling books.

Brian Yamabe said...

Your article leads me to the problem I have with the Global Warming hypothesis. Most people's understanding is based on correlation. The problem is that nobody understands the underlying model. I know of no computer model that describes the temperature for the past 100 years so why should I believe that the same model can do so going forward?

gregory said...

i think chris anderson is stuck with malcolm gladwell syndrome, and it won't be curable in this lifetime

relate his thoughts to nicholas carr's of last week, the google-makes-us-dumb meme, add a bit of triangulation, and the idea that data, when manipulated, leads to knowledge gains a huge amount of weight.

where are values in all of this? lost, buried, subsumed, unaddressed. but they were in traditional rigorous science as well. when i studied it i often felt that science was nothing but applied religion, so inseparable from the christian world view as to render them as twins separated at birth

science fails not by its fabulous attempts at method and objectivity, but by the questions it won't ask, cannot ask, refuses to ask ... and in the superficial understanding of reality that comes from an addiction to its extremely limited means for gaining knowledge.

examples? any scientific investigation into love, values, intuition, consciousness ...

we need more now, in this era, to deal with the human adventure

and brian yamabe, no one really knows what climate is, we only have a flawed model and limited understanding ... global warming is no problem at all for the earth, it is one big adjustment machine

and why the f do i have to twice enter so many blogger comments

Anonymous said...

I tried about five times to post. Those captchas are absolutely craptastic, Hank. Have you even tried using your own blog comment system? It sucks big time.

Anonymous said...

I wanted to say on topic though that this was a great and important post before I was totally sidetracked by such an ornery and frustrating comment system.

Dan Dorman said...

The idea Chris is talking about isn't exactly without merit. By way of example, a lot of the work in natural language processing has moved toward a stochastic approach rather than an algorithmic one. Anand Rajaraman wrote a couple of good posts around the premise that "more data usually beats better algorithms." This seems to support the idea that correlation, i.e. a sufficiently large data set, can--at least in some cases--outperform a fully developed causal model.

I don't disagree with you, Hank; in fact, I really appreciate the historical context you provided for your point of view. I just think you're overreacting to the article's sensationalism. Dude's got to sell magazines, after all!

Anonymous said...

I think you should open up your mind a little, instead of feeling threatened by your world losing a logical explanation.
What Anderson says is just another way of viewing things. Is not right nor wrong.
You're entitled to dislike it, but not at all to rant about it.

Che said...

@Anonymous

Glad someone knows what Hank is entitled to do.

Am I entitled to say this?

Anonymous said...

"You're entitled to dislike it, but not at all to rant about it."

Speaking of logic...

Dan Lewis said...

A similar argument to Anderson's applies to, as an example, robot localization and navigation. Ben Kuipers wrote a bunch of papers about ontological approaches to navigation. This is a good introduction.
http://www.cs.utexas.edu/users/qr/papers/Kuipers-NZ-08.html

His idea was that you can see localization at different scales. For instance, you have immediate sensors that a robot can use to localize in a small neighborhood, avoid obstacles and such. Higher up, the robot can use idealized models of the topology, with nodes at choice points for instance, and take a high-level approach to navigation. This is nice because you can run a much simpler program and the effects of sensor errors can be accounted for more gracefully.

However, in a world where the robot's sensors are powerful enough, or the environment is instrumented enough with RFID tags, the ontological stuff becomes superfluous. All navigation becomes local navigation. This would have been impossible scant years ago. Now it's almost inevitable.

To me that's what's interesting about Anderson's argument. It is an invitation to consider where all this classical, symbolic stuff becomes not incorrect, but superfluous. It's not that models aren't helpful. They're just not helpful enough.

Anonymous said...

@Brian Yamabe
"I know of no computer model that describes the temperature for the past 100 years so why should I believe that the same model can do so going forward?"

1) Would you actually believe a model if it existed?
2) Have you looked at all for such a model?

Check this out:
http://www.met.utah.edu/news/u-study-substantiates-computer-models-for-global-warming

Scientists have been developing computer models for years, and of course no model is worth it if it can't account for past data.

Socrates7 said...

I really don’t get this entire line of argument. What’s the big deal again?

First, Anderson is suggesting that since data sets are extraordinarily large, they are/may be impossible to model. Second, that this dumps the Scientific Method. Third, that ‘knowledge’ or ‘science’ can continue without either models or theories, and that statistical analysis might be sufficient in their place. I think that’s a fair summary, no? Your stated objection (I think there was really only one that was on point here) is an objection about the primacy of statistical correlation.

I'm sorry to be negative, but I think all of these points are rather trivial.

Starting with Anderson: data set size has nothing to do with modeling. I think there are two horns to this argument, and the first horn Anderson is targeting is ‘modeling’ specifically, but he relies on a weak understanding of some basic truths about Scientific Modeling, and how poorly models match up to the things being modeled. But in fact, that’s what a model is designed to do, to match, or more properly, “map” poorly. It *is* there to capture some refined truth, and at the same time, illuminate or highlight that truth by eliminating unnecessary or superfluous data. And it’s worth noting that the counter point that complexity entails a rejection of the definition of “superfluous” — everything is relevant — is valid as far as I can tell. That’s interesting. But not novel. This “truth” has been known for a very long time. Scientific Laws aren’t accurate depictions of reality — they’re simplifications. Abstractions. And as such, they’re false. All abstractions are missing part of the story, and doing so deliberately. This is pretty much the SOP for Science with a capital ‘S’. What the second horn of the argument fails on is that models and people somehow map. That is, models must be understandable. Which is, of course, not true. Most scientists would argue that models can range in complexity. The more elegant the model the more basic the truth, is perhaps how they’d argue. But not all systems are amenable to such analysis. “Consciousness” is one of those things. It’s hard. There’s lots of moving parts. And those parts may not be reducible to chunks that make modeling simple or even comprehensible. Does that make it less of a model?

To Anderson’s second point, that models are required for the Scientific Method, I have not much to add. If you hold that theories specify a class of models and the goal of the enterprise is to determine the truth conditions for the models, fine. Kick out the models and the whole structure collapses. Most would argue that this is a silly interpretation of science. But if this is your straw man, so be it. I would like to propose, however, that complex models might well be beyond our (as humans) ability to grasp, and in a rather trivial dimension — like, say the sheer number of variables specified by the model. But again, this doesn’t mean that the model isn’t there, or that the scientific engine is flawed. Just that we are.

To Anderson’s third point, and your point as well, correlation is not causation. Of course, this is a truism. But law and theory do not require a set of necessary and sufficient causes to claim “advance”. While this may not be intuitive, Karl Popper has pretty much ruined the hopes of scientists to uncover The Truth. “Pursuit” of the truth (as opposed to "attainment") is much more worthwhile if only because it’s the one that's feasible.

As Anderson acknowledges, Relativity and Newtonian mechanics are not True. They’re “true enough”. And in many cases, statistical correlation may be as close as we’re going to get. I think we need to get over the epistemic problem ("Truth" is unknowable) here and move on -- Science and its practitioners have.

What I’d like to say is that Science, while not about Truth per se, is about uncovering interesting regularities, predicting behavior, and such. Data mining is a very interesting avenue for scientific research, and as Anderson points out, Craig Venter has used it to interesting impact. And so long as the scientific enterprise is willing to be driven in whimsical directions as data-driven revelations are revealed, then great. Have at it. But never assume for a moment that that’s all that is happening. As soon as the quest becomes in any way directed, as soon as you say, “huh, that’s interesting”, you’re building theories, developing and testing models — you’re doing Science.

Just like always. Whoopee.

Socrates7 said...

Oh, and for the kind contributor who said the following:

"I think you should open up your mind a little, instead of feeling threatened by your world loosing a logical explanation.

What Anderson says is just another way of viewing things. Is not right nor wrong.

You're entitled to dislike it, but not at all to rant about it."

Rubbish. This is patently false on its face. You are not entitled to your opinions. You simply have them. Even when they're nonsensical, irrational, or contradictory. This does not in any way imply "entitlement". As people, you're welcome to be lazy. In which case, please do so and go find something else to stare blankly at. As a scientist, you are obligated to struggle with propositions. And when logic dictates a result, you are compelled to adopt it or reject it. It's not a matter of preference, whim or fancy.

And injecting an ad hominem attack in a dialectic engagement is foul play. Arguing isn't an acknowledgement of threat or fear about surrendering a world view. Please.

Not to pile on, but any claim about reality, or comment about such a claim, is in fact a matter that is capable of being judged "right" or "wrong", depending upon agreed upon criteria -- in this case, the Scientific Method.

Madd Scientist said...

Chris Anderson is a barnacle of mediocrity on Bill Gates' asshole.
