Last week, I wrote about how mobs might be predictable. One of the first tools that I mentioned was autocorrelation. This is a basic tool that we will use with the others in the list, so it's important to understand exactly what it does. That's what I want to explore this week.
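As a concrete sketch of what autocorrelation computes, here's a minimal Python example (the signal, its period, and the noise level are all invented for illustration). Autocorrelation compares a series with lagged copies of itself, so a periodic signal produces peaks at multiples of its period:

```python
import numpy as np

def autocorrelation(x):
    """Normalized autocorrelation of a 1-D series, for lags 0..len(x)-1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Correlate the series with itself and keep only non-negative lags.
    corr = np.correlate(x, x, mode="full")[len(x) - 1:]
    return corr / corr[0]  # normalize so lag 0 equals 1

# A noisy sine wave with period 25: the autocorrelation should peak
# near lag 25 and dip near the half-period.
t = np.arange(200)
noise = 0.3 * np.random.default_rng(0).normal(size=t.size)
signal = np.sin(2 * np.pi * t / 25) + noise

ac = autocorrelation(signal)
print(ac[0])            # 1.0 by construction
print(ac[25] > ac[12])  # on-period lag beats an off-period lag
```

The self-similarity at lag 25 is exactly the kind of hidden regularity this tool is meant to surface.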
Let's go back to high school geometry. We can define several properties and operations in terms of the angles and sides of the parallelogram to the right, though we'll need to dip into the Cartesian coordinate system a bit to see how to take the next step toward autocorrelation.
We want to look at what it means to perform mathematical operations on these line segments. We know we can add numbers together to get new numbers, but what does it mean to add line segments? If we take the segment from D to E and add the segment from E to B, it's obvious that we end up with the segment from D to B. What's less obvious is that if we take the segment from D to E and add the segment from E to C, we end up with the segment from D to C.
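This tip-to-tail rule can be checked numerically by treating each segment as a displacement vector between two points. The coordinates below are hypothetical, since the figure isn't reproduced here, but the identities hold for any choice of points:

```python
# Hypothetical coordinates for the labeled points (the original
# parallelogram figure is not reproduced here).
D = (0.0, 0.0)
E = (4.0, 0.0)
B = (6.0, 3.0)
C = (2.0, 3.0)

def segment(p, q):
    """Displacement vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])

def add(u, v):
    """Tip-to-tail vector addition."""
    return (u[0] + v[0], u[1] + v[1])

# D->E plus E->B is the same displacement as D->B...
assert add(segment(D, E), segment(E, B)) == segment(D, B)
# ...and likewise D->E plus E->C equals D->C.
assert add(segment(D, E), segment(E, C)) == segment(D, C)
print("both identities hold")
```

The intermediate point cancels out in the addition, which is why chaining segments tip-to-tail always lands you at the final endpoint.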
As a kid, I read Asimov's Foundation series, in which Hari Seldon develops a mathematical description of society called psychohistory. The science in the books is completely fictional, but it always sat at the back of my mind. What if there was a kernel of truth in the fiction? What if people could be predictable?
Psychohistory has two main axioms (taken from the Wikipedia entry):
1. that the population whose behaviour was modeled should be sufficiently large
2. that the population should remain in ignorance of the results of the application of psychohistorical analyses
The first axiom has an analogy in statistical physics: the number of particles should be sufficiently large. A single atom doesn't really have a temperature, because temperature describes how a system's disorder changes as its energy changes. A single atom can have an energy, but it has no disorder to speak of. For a large group of particles, though, the change in entropy tracks the average energy, so we casually equate temperature with energy and speak of a single atom as having a temperature. The entropy-based definition of temperature is more general than the energy-based one: it even allows negative temperatures.
The second axiom is similar to what you might expect for a psychology experiment: knowledge of the experiment by the participants can affect the outcome. For example, a retailer might infer from purchasing data whether someone is pregnant rather than asking outright, because the contextually acceptable answer can trump the truth.
The important thing is that people are predictable in aggregate. This is what allows a political poll to predict an election outcome without asking everyone who will be voting. Polls aren't perfectly predictive, though, in part because people are more likely to tell a pollster what they think is socially acceptable, which might not reflect how they vote when they think no one is watching. This is exactly why the second axiom is needed.
In the Narrative Statistics series of posts, I'm exploring different ways to characterize fiction using statistics. I'm recovering from a flu or cold as well as a nasty cough that followed, so instead of delving into deep math, I want to review what I see as the role of statistics, at least for this series. Many people consider statistics to be magical formulae that give questionable answers. In the humanities, there seems to be a lot of mistrust of statistics because people don't understand them.
I've been in the audience when someone presented statistical results and another attendee objected that, because the outliers disagreed with what they already believed to be true, the outliers must be mistakes and the statistical method must therefore be suspect. They then turned around and asked what statistics can provide beyond reinforcing what they already know. First they throw out any new information, then they ask what new information the methods can provide. The profound lack of logic mystifies me.
Last week, we explored the Poisson distribution as a possible distribution of sentence lengths. If you look at the figure for Hunter Crackdown, the Poisson seems reasonable, but it breaks down when looking at other works. In this post, I'd like to go back and try to derive a distribution that has the same qualitative features as the distributions we saw for each of the works. Then, I want to discuss a bit what we might want to do next.
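One quick diagnostic for why a Poisson model can break down: a Poisson distribution forces the variance to equal the mean, while sentence-length data is typically over-dispersed. A minimal sketch of that check (the lengths below are invented, not drawn from any of the works discussed):

```python
from statistics import mean, pvariance

# Hypothetical sentence lengths in words, standing in for a real text.
lengths = [3, 5, 8, 12, 15, 7, 9, 22, 4, 11, 6, 13, 30, 5, 10]

m = mean(lengths)       # a Poisson fit would use this as its rate
v = pvariance(lengths)  # for a true Poisson, variance ~ mean

print(f"mean = {m:.2f}, variance = {v:.2f}")
print("over-dispersed" if v > m else "Poisson-like")
```

When the variance is well above the mean, a distribution with an extra dispersion parameter fits the qualitative shape better than a one-parameter Poisson can.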
Thursdays are my research days. I have a couple of things cooking away that I'm not quite ready to write about yet, but I want to take a little time today to explore something I plan to do a lot more of once they're done.
I'm interested in studying narrative as a dynamic system. That is, there are several variables at play that determine the direction of a narrative. There are plot dynamics, character dynamics, and thematics that an author plays with to construct the story. They all interact in complex ways. A particular plot might require certain types of characters. A particular character might not fit certain types of plots. Some plots and characters don't illustrate certain themes well. The author has to select the right plots, characters, and themes (and write well) for the reader to enjoy the story.