Chris Anderson of Wired magazine has published an interesting and provocative article about how “more is different”. It’s difficult to even visualise huge amounts of data, let alone analyze enormous data sets, but emerging technologies are giving us the tools to interact with bigger and bigger datasets. A petabyte is 2 to the power of 50 bytes (i.e. 1,125,899,906,842,624 bytes). This can be approximated to 10 to the power of 15 (1,000,000,000,000,000). Whilst this is truly a mind-bogglingly large number, Anderson reports that Google’s servers process this much information every 72 minutes! But wait, it gets even more amazing! There are bigger numbers. An “exabyte”, for example, is 1,024 petabytes, and a “zettabyte” is 1,024 exabytes. Let’s not even go there yet! We can process such vast amounts of information by using large networks of computers and algorithms which handle the datasets as “clouds”. I like the “cloud” idea. You might already be familiar with it through the tool known as “tag clouds”. However, let’s get back to Chris Anderson’s article.
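For anyone who wants to check the unit arithmetic above, here is a minimal Python sketch. The constant names and the relative_gap helper are my own, purely for illustration; the only facts used are the powers of two already quoted.

```python
# Binary (power-of-two) storage units, in bytes, as quoted in the post.
PETABYTE = 2 ** 50
EXABYTE = 1024 * PETABYTE      # 2 ** 60
ZETTABYTE = 1024 * EXABYTE     # 2 ** 70


def relative_gap(binary_value, decimal_value):
    """How far the binary unit overshoots its round decimal approximation."""
    return (binary_value - decimal_value) / decimal_value


print(f"{PETABYTE:,}")                            # 1,125,899,906,842,624
print(f"{relative_gap(PETABYTE, 10 ** 15):.1%}")  # 12.6%
```

So “10 to the power of 15” is a handy approximation, but it undershoots the binary petabyte by roughly an eighth.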
Anderson says that science has proceeded until now by making models and then testing to see how well the models fit the data: “hypothesize, model, test”. This enables scientists to uncover the links between events which show us how those events come about (causation) and then make predictions about the future. This is a powerful method and has greatly increased human understanding. However,
There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
In other words, the ability to handle such vast amounts of information directly allows us to uncover the correlations which exist, and thereby to see patterns emerge right out of the data without pre-selecting the data with a hypothesis and a model.
Anderson has pushed this idea provocatively, claiming that it means the end of science as we know it, and a lot of commentators have reacted with strong disagreement. The points made both by Anderson in his original article and by the commentators are stimulating and thought-provoking.
George Dyson says
The massively-distributed collective associative memory that constitutes the “Overmind” (or Kevin’s OneComputer) is already forming associations, recognizing patterns, and making predictions—though this does not mean thinking the way we do, or on any scale that we can comprehend. The sudden flood of large data sets and the opening of entirely new scientific territory promises a return to the excitement at the birth of (modern) Science in the 17th century, when, as Newton, Boyle, Hooke, Petty, and the rest of them saw it, it was “the Business of Natural Philosophy” to find things out. What Chris Anderson is hinting at is that Science will increasingly belong to a new generation of Natural Philosophers who are not only reading Nature directly, but are beginning to read the Overmind.
This feels right to me. These new methods are not the death of science but are the beginning of scientific methods which will change the way we understand the world. Kevin Kelly says more along this line of thought
My guess is that this emerging method will be one additional tool in the evolution of the scientific method. It will not replace any current methods (sorry, no end of science!) but will complement established theory-driven science. Let’s call this data intensive approach to problem solving Correlative Analytics. I think Chris squanders a unique opportunity by titling his thesis “The End of Theory” because this is a negation, the absence of something. Rather it is the beginning of something, and this is when you have a chance to accelerate that birth by giving it a positive name. A non-negative name will also help clarify the thesis. I am suggesting Correlative Analytics rather than No Theory because I am not entirely sure that these correlative systems are model-free. I think there is an emergent, unconscious, implicit model embedded in the system that generates answers.
Maybe the contribution I’ve enjoyed most, however, is that made by Bruce Sterling, which begins this way –
I’m as impressed by the prefixes “peta” and “exa” as the next guy. I’m also inclined to think that search engines are a bigger, better deal than Artificial Intelligence (even if Artificial Intelligence had ever managed to exist outside science fiction). I also love the idea of large, cloudy, yet deep relationships between seemingly unrelated phenomena—in literature, we call those gizmos “metaphors.” They’re great!
As is so often the case, Bruce Sterling puts his finger right on what’s interesting. He highlights the relationship between this way of viewing data sets and the way we use language. Metaphors are incredibly powerful tools. They can feel like a kind of magic, producing sudden, potentially profound insights, literally in moments. It’s exciting to think that the “petabyte age” will bring us similar tools to engage with a wide range of phenomena.
Finally, Oliver Morton brilliantly manages to make these mind-bogglingly large computations suddenly seem not so overwhelming at all by saying –
And I guess my other point is “petabytes—phwaah”. Sure, a petabyte is a big thing—but the number of ways one can ask questions is far bigger. I’m no mathematician, and will happily take correction on this, but as I see it one way of understanding a kilobit is as a resource that can be exhausted—or maybe a space that can be collapsed—with 10 yes or no questions: that’s what 2^10 is. For a kilobyte raise the number to 13. For a petabyte raise it to 53. Now in many cases 53 is a lot of questions. But in networks of thousands of genes, really not so much.
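The arithmetic in that quote holds up. Here is a rough Python sketch of it, assuming binary units (a kilobit as 2 to the power of 10 bits, a byte as 8 bits) and assuming each yes-or-no question halves the space that is left. The function names are mine, and the gene comparison at the end is only my own reading of the final sentence (each gene treated as simply “on” or “off”), not something spelled out in the quote.

```python
def questions_needed(size_in_bits):
    """Yes/no questions needed to single out one bit among size_in_bits,
    halving the remaining space each time (the ceiling of log base 2)."""
    return (size_in_bits - 1).bit_length()


KILOBIT_BITS = 2 ** 10        # 1,024 bits
KILOBYTE_BITS = 8 * 2 ** 10   # 1,024 bytes of 8 bits each = 2 ** 13 bits
PETABYTE_BITS = 8 * 2 ** 50   # 2 ** 50 bytes of 8 bits each = 2 ** 53 bits

for name, bits in [("kilobit", KILOBIT_BITS),
                   ("kilobyte", KILOBYTE_BITS),
                   ("petabyte", PETABYTE_BITS)]:
    print(name, questions_needed(bits))   # 10, then 13, then 53


def on_off_states(gene_count):
    """Hypothetical illustration: joint states of genes that are each on or off."""
    return 2 ** gene_count


# Under that assumption, just 53 on/off genes already have as many joint
# states as a petabyte has bits, and a thousand genes have vastly more.
print(on_off_states(53) == PETABYTE_BITS)    # True
print(on_off_states(1000) > PETABYTE_BITS)   # True
```

Seen this way, a petabyte is “only” 53 questions deep, which is why a network of thousands of genes can dwarf it.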
The complexities of life can seem overwhelming but I feel pretty excited by our human capacity to perceive patterns using all kinds of tools from “clouds” to “metaphors”. The drive to make sense of life, to find meaning and purpose, is a core human quality. Science, its new methods and its old ones, is one way of responding to this drive.