If we took stock of everything that we know and compared it to what we don't know, we'd find that we know a lot about almost nothing.1 As we explore new things, we need tools that give us an idea of what we're working with even when we don't know what it is. In textual scholarship, we like to do close readings: understanding all the nuances of a text word by word so that we can tease out almost hidden meanings that rely on us understanding the text as well as its context.2 Sometimes, we don't have a text or a context, but only the effect of the text upon an audience. Or, to put it in more practical terms, we can't tell what goes on inside an author's mind, but we do have the resulting text. What can we learn about that mind from the text it produces?
In statistics, saying that something "almost never" happens and saying that it "has zero probability" are pretty much the same thing. If we counted all the things that we know and divided that count by the number of things that we don't know, the result would be almost zero. It is ironic that the more we study, the closer the ratio gets to zero. ↩
Last week, I wrote about how mobs might be predictable. One of the first tools that I mentioned was autocorrelation. This is a basic tool that we will use with the others in the list, so it's important to understand exactly what it does. That's what I want to explore this week.
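Before we build up the geometry, here is a minimal sketch of what autocorrelation computes: how strongly a signal correlates with a time-shifted copy of itself. The function name and the example signal are mine, not from the original post.

```python
# A minimal sketch of autocorrelation for a discrete signal,
# normalized so that a lag of zero always gives 1.0.

def autocorrelation(xs, lag):
    """Correlation of xs with a copy of itself shifted by `lag` steps."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var

# A periodic signal correlates strongly with itself at its own period:
signal = [0.0, 1.0, 0.0, -1.0] * 8   # repeats every 4 steps
print(autocorrelation(signal, 0))    # → 1.0
print(autocorrelation(signal, 4))    # → 0.875 (high: the signal repeats every 4 steps)
```

The peak at a lag equal to the signal's period is exactly the kind of structure we'd hope to find in data about crowds: a strong autocorrelation at some lag suggests the behavior repeats on that timescale.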
Let's go back to high school geometry. We can define several properties and operations in terms of the angles and sides of the parallelogram to the right, though we'll need to dip into the Cartesian coordinate system a bit to see how to take the next step toward autocorrelation.
We want to look at what it means to do mathematical operations on these line segments. We know that we can add numbers together to get new numbers, but what does it mean to add line segments? If we take the segment from D to E and add the segment from E to B, it's obvious that we end up with the segment from D to B. What's less obvious is that if we take D to E and add E to C, we end up with D to C.
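The head-to-tail rule above is easy to check numerically. The coordinates below are hypothetical stand-ins for the figure's corners (the original post's parallelogram doesn't give coordinates); the rule holds regardless of which points we pick.

```python
def vec(p, q):
    """The directed segment from point p to point q, as a displacement vector."""
    return (q[0] - p[0], q[1] - p[1])

def add(u, v):
    """Add two displacement vectors component by component."""
    return (u[0] + v[0], u[1] + v[1])

# Hypothetical coordinates for the parallelogram's corners:
D, E, B, C = (0, 0), (1, 0), (3, 1), (2, 1)

# Head-to-tail addition: DE + EB lands exactly on DB...
print(add(vec(D, E), vec(E, B)) == vec(D, B))  # True
# ...and DE + EC lands on DC, by the same telescoping.
print(add(vec(D, E), vec(E, C)) == vec(D, C))  # True
```

The design point is that a segment is really a displacement: once we treat "from D to E" as the pair of differences in x and y, segment addition is just coordinate-wise addition, which is the doorway from geometry into the arithmetic that autocorrelation needs.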
As a kid, I read Asimov's Foundation series, in which Hari Seldon develops a mathematical description of society called psychohistory. The science in the books is completely fictional, but it always sat at the back of my mind. What if there were a kernel of truth in the fiction? What if people could be predictable?
Psychohistory has two main axioms (taken from the Wikipedia entry):
that the population whose behaviour was modeled should be sufficiently large
that the population should remain in ignorance of the results of the application of psychohistorical analyses
The first axiom has an analogue in statistical physics: the number of particles must be sufficiently large. A single atom doesn't really have a temperature, because temperature measures how a system's disorder (entropy) changes as the system gains energy. A single atom can't meaningfully increase its disorder, but it can have an energy. It just happens that, for a large group of particles, temperature is proportional to the average energy per particle, so we equate temperature with energy and pretend that a single atom can have a temperature. The entropy-based definition of temperature is more general than the energy-based one: it even allows negative temperatures.
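The entropy-based definition alluded to here is the standard thermodynamic one (this formula is my addition, not from the original post):

\[
\frac{1}{T} = \frac{\partial S}{\partial E}
\]

where \(S\) is entropy and \(E\) is energy. A negative temperature is just the case where adding energy *decreases* the entropy, so \(\partial S / \partial E < 0\) — something that only makes sense for a collection of particles, never for a lone atom.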
The second axiom is similar to what you might expect for a psychology experiment: participants' knowledge of the experiment can affect the outcome. For example, a retailer might infer from purchasing data that someone is pregnant instead of asking them outright, because sometimes the contextually acceptable answer will trump the truth.
The important thing is that people are predictable in aggregate. This is what allows a political poll to predict an election outcome without asking everyone who will be voting. Polls aren't perfectly predictive, though, in part because people are more likely to tell a pollster what they think is socially acceptable, which may not reflect how they vote when they think no one is watching. That gap is exactly why the second axiom matters.