How Chaos Theory Is Like Insanity in Big Data

What could go wrong? Chaos theory. You may have heard the expression: the flap of a butterfly's wings in Brazil can set off a tornado in Texas. It comes from the title of a paper delivered in 1972 by MIT's Edward Lorenz. Chaos theory applies to systems in which both of two properties hold:

The systems are dynamic, meaning that the behavior of the system at one point in time influences its behavior in the future;

And they are nonlinear, meaning they abide by exponential rather than additive relationships.

Dynamic systems give analysts plenty of problems.
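The butterfly-effect sensitivity of a dynamic, nonlinear system can be sketched with the logistic map, a textbook chaotic system (my example, not one from the excerpt): two trajectories that begin a millionth apart end up nowhere near each other.

```python
# Logistic map x_{n+1} = r * x * (1 - x) with r = 4, its chaotic regime.
# A "trivial bug" of one part in a million in the starting value
# produces a completely different trajectory within a few dozen steps.

def logistic_trajectory(x0, r=4.0, steps=50):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-6)  # almost identical initial condition

divergence = max(abs(x - y) for x, y in zip(a, b))
print(f"maximum divergence over 50 steps: {divergence:.3f}")
```

The tiny initial error roughly doubles at every step, which is exactly why the most trivial bug in a model of such a system can have profound effects.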




The analysts know the flaws in the computer models. These flaws inevitably arise because, as a consequence of chaos theory, even the most trivial bug in the model can have potentially profound effects. The unique resource these analysts contribute is their eyesight. It is a valuable tool for analysts in any discipline: a visual inspection of a graphic showing the interaction between two variables is often a quicker and more reliable way to detect outliers in your data than a statistical test. It is also one of the areas where computers lag well behind the human brain. Humans, by contrast, out of pure evolutionary necessity, have very powerful visual cortexes. They rapidly parse through any distortions in the data to identify abstract qualities like pattern and organization, qualities that happen to be very important in many types of systems.




The best analysts need to think visually and abstractly while at the same time sorting through the abundance of information the computer provides them with. Moreover, they must understand the dynamic and nonlinear nature of the system they are trying to study. It is not an easy task, requiring vigorous use of both the left and right brain.




Economists can talk themselves into believing that other types of variables, anything with any semblance of economic meaning, are critical "leading indicators" foretelling a recession or recovery months in advance. One forecasting firm brags about how it looks at four hundred such variables, far more than the two or three dozen major ones that Hatzius says contain most of the economic substance.* Other analysts have touted the predictive power of such relatively obscure indicators as the ratio of bookings to billings at semiconductor companies. With so many variables to pick from, you're sure to find something that fits the noise in the past data well.
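The danger of four hundred candidate variables can be illustrated with a toy simulation (the data here is invented, not Hatzius's or any forecasting firm's): generate hundreds of purely random "indicators" and one purely random "economy" series, and some indicator will fit the past impressively by sheer chance.

```python
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation of two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
quarters = 40  # ten years of quarterly data
economy = [random.gauss(0, 1) for _ in range(quarters)]  # pure noise
indicators = [[random.gauss(0, 1) for _ in range(quarters)]
              for _ in range(400)]  # 400 equally meaningless series

best = max(abs(correlation(ind, economy)) for ind in indicators)
print(f"best in-sample |correlation| among 400 random series: {best:.2f}")
```

The winning indicator "predicts" nothing; it merely fit past noise, which is exactly how an analyst with enough variables can always find something that fits.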





It's much harder to find something that identifies the signal.



"Figuring out what's truly causal and what's correlation is very difficult to do."

Most of you will have heard the maxim "correlation does not imply causation." Just because two variables have a statistical relationship with each other does not mean that one is responsible for the other. For instance, ice cream sales and forest fires are correlated because both occur more often in the summer heat. But there is no causation; you don't light a patch of the Montana brush on fire when you buy a pint of Haagen-Dazs.
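The ice-cream-and-forest-fires pattern is easy to simulate: let a hidden "summer heat" variable drive both series, and they correlate strongly even though neither has any effect on the other (all numbers below are hypothetical).

```python
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation of two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
months = 120
heat = [random.uniform(0.0, 1.0) for _ in range(months)]  # hidden confounder

# Heat drives both series; neither series influences the other.
ice_cream_sales = [10 * h + random.gauss(0, 1) for h in heat]
forest_fires = [5 * h + random.gauss(0, 1) for h in heat]

r = correlation(ice_cream_sales, forest_fires)
print(f"ice cream vs. forest fires: r = {r:.2f}")
```

The correlation is real and reproducible, yet intervening on ice cream sales would do nothing to forest fires; only the shared cause links them.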

If this concept is easily expressed, however, it can be hard to apply in practice, particularly when it comes to understanding the causal relationships in business.





So analysts should not just look for patterns. Finding patterns is easy in any kind of data-rich environment; that's what mediocre gamblers do. The key is in determining whether the patterns represent noise or signal.

But although there isn't any one particular key, there is a particular type of thought process that helps govern these decisions. It is called Bayesian reasoning.







Bayes's much more famous work, "An Essay toward Solving a Problem in the Doctrine of Chances," concerned how we formulate probabilistic beliefs about the world when we encounter new data.


The argument made by Bayes is not that the world is intrinsically probabilistic or uncertain. Bayes was a believer in divine perfection; he was also an advocate of Isaac Newton's work, which had seemed to suggest that nature follows regular and predictable laws.




When there is an exponential increase in the number of hypotheses to investigate, and you want to test for relationships between every pair of some 45,000 statistics (is there a causal relationship between the bank prime loan rate and the unemployment rate in Alabama?), that gives you literally one billion hypotheses to test.*
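The "one billion hypotheses" is just the count of unordered pairs, n(n - 1)/2, using the 45,000 statistics assumed in the footnote:

```python
from math import comb

n_statistics = 45_000
pairs = comb(n_statistics, 2)  # n * (n - 1) / 2 unordered pairs
print(f"{pairs:,} pairwise hypotheses to test")  # 1,012,477,500
```

Doubling the number of statistics roughly quadruples the number of pairs, which is why the hypothesis count outruns the truth so quickly.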

But the number of meaningful relationships in the data—those that speak to causality rather than correlation and testify to how the world really works—is orders of magnitude smaller. Nor is it likely to be increasing at nearly so fast a rate as the information itself; there isn't any more truth in the world than there was before the Internet or the printing press. Most of the data is just noise, as most of the universe is filled with empty space.


Meanwhile, as we know from Bayes's theorem, when the underlying incidence of something in a population is low (breast cancer in young women; truth in the sea of data), false positives can dominate the results if we are not careful. Suppose that 80 percent of true scientific hypotheses are correctly deemed to be true, and that about 90 percent of false hypotheses are correctly rejected. And yet, because true findings are so rare, about two-thirds of the findings deemed to be true are actually false!
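That arithmetic can be checked directly. The 80 and 90 percent figures are from the text; the 5 percent base rate of true hypotheses below is my assumption, chosen to illustrate how a low base rate lets false positives dominate.

```python
base_rate = 0.05       # assumed share of tested hypotheses that are true
sensitivity = 0.80     # true hypotheses correctly deemed true (from the text)
specificity = 0.90     # false hypotheses correctly rejected (from the text)

true_positives = base_rate * sensitivity               # 0.04 of all tests
false_positives = (1 - base_rate) * (1 - specificity)  # 0.095 of all tests

false_share = false_positives / (true_positives + false_positives)
print(f"share of 'true' findings that are actually false: {false_share:.0%}")  # 70%
```

Even with a quite accurate test, the false positives outnumber the true ones whenever truth itself is rare enough.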


Unfortunately, the state of published research and analysis in most fields that conduct statistical testing has an error rate roughly this high. There are many reasons for it: some having to do with our psychological biases, some with common methodological errors, and some with misaligned incentives. Close to the root of the problem, however, is a flawed type of statistical thinking that these researchers are applying.

* The number of possible combinations is calculated as 45,000 times 44,999 divided by two, which is 1,012,477,500.

One difference is that the negative findings are probably kept in a file drawer rather than being published (about 90 percent of the papers published in academic journals today document positive findings rather than negative ones). However, that does not mask the problem of false positives in the findings that do make it to publication.






Computers are very, very fast at making calculations. Moreover, they can be counted on to calculate faithfully, without getting tired or emotional or changing their mode of analysis in midstream.

But this does not mean that computers produce perfect forecasts, or even necessarily good ones. The acronym GIGO ("garbage in, garbage out") sums up this problem. If you give a computer bad data, or devise a foolish set of instructions for it to analyze, it won't spin straw into gold. Meanwhile, computers are not very good at tasks that require creativity and imagination, like devising strategies or developing theories about the way the world works.

Computers are most useful to analysts, therefore, in fields like weather forecasting and chess where the system abides by relatively simple and well-understood laws, but where the equations that govern the system must be solved many times over in order to produce a good analysis. They seem to have helped very little in fields like economic or business forecasting where our understanding of root causes is blurrier and the data is noisier. In each of those fields, there were high hopes for computer-driven forecasting in the 1970s and 1980s when computers became more accessible to everyday academics and scientists, but little progress has been made since then.

Many fields lie somewhere in between these two poles. The data is often good but not great, and we have some understanding of the systems and processes that generate the numbers, but not a perfect one. In cases like these, it may be possible to improve predictions through the process that Decision Imaging allows. This is at the core of business strategy for the company we most commonly associate with Big Data today.





If you search for a term like "best new mexican restaurant," does that mean you are planning a trip to Albuquerque? That you are looking for a Mexican restaurant that opened recently? That you want a Mexican restaurant that serves Nuevo Latino cuisine? You probably should have formed a better search query, but since you didn't, Google could convene a panel of 1,000 people who made the same request, show them a wide variety of Web pages, and have them rate the utility of each one on a scale of 0 to 10. Then Google would display the pages to you in order of highest to lowest average rating.

Google cannot do this for every search request, of course, not when they receive hundreds of millions of search requests per day. But they do use human evaluators on a series of representative search queries. Then they see which statistical measurements are best correlated with these human judgments about relevance and usefulness. Google's best-known statistical measurement of a Web site is PageRank, a score based on how many other Web pages link to the one you might be seeking out. But PageRank is just one of two hundred signals that Google uses to approximate the human evaluators' judgment.
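The mechanism described, checking which statistical signals best track human relevance ratings, can be sketched in miniature. The pages, ratings, and both signals below are invented for illustration; Google's actual two hundred signals are proprietary.

```python
import statistics

# Hypothetical data: human ratings (0-10) for five pages on one test query,
# plus two candidate signals measured for each of those pages.
human_ratings = [9, 7, 5, 3, 1]
signals = {
    "inbound_links": [1200, 300, 400, 50, 10],  # a PageRank-like signal
    "keyword_count": [3, 9, 2, 8, 7],           # naive keyword stuffing
}

def correlation(xs, ys):
    """Pearson correlation of two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

for name, values in signals.items():
    r = correlation(values, human_ratings)
    print(f"{name}: correlation with human judgment = {r:+.2f}")
```

In this toy, the link-based signal tracks the evaluators' judgment and the keyword count does not; scaled up, the same kind of comparison decides which of the real signals earn weight in the ranking.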

Of course, this is not such an easy task: two hundred signals applied to an almost infinite array of potential search queries. This is why Google places so much emphasis on experimentation and testing. The product you know as Google search, as good as it is, will very probably be a little bit different by the time you read this.




What makes the company successful is the way it combines this rigorous commitment to testing with its freewheeling creative culture. Google's people are given every inducement to do what people do much better than computers: come up with ideas, a lot of ideas. Google then harnesses its immense data to put these ideas to the test. The majority of them are discarded very quickly, but the best ones survive.





Making predictions that work in the real world, rather than in the comfort of a statistical model, is probably the best way to accelerate the learning process.

Overcoming Our Technological Blind Spot

In many ways, we are our greatest technological constraint. The slow and steady march of human evolution has fallen out of step with technological progress: evolution occurs on millennial time scales, whereas processing power doubles roughly every other year.




Nowadays, in a fast-paced world awash in numbers and statistics, those same tendencies can get us into trouble: when presented with a series of random numbers, we see patterns where there aren't any.



With all the information in the world today, it's certainly helpful to have machines that can make calculations much faster than we can.

But if you get the sense that the analyst means this more literally, that he thinks of the computer as a sentient being or the model as having a mind of its own, it may be a sign that there isn't much thinking going on at all. Whatever biases and blind spots the analyst has are sure to be replicated in his computer program.

We have to view technology as what it always has been: a tool for the betterment of the human condition.


Excerpts from Nate Silver – The Signal and the Noise.
