As a kid, I read Asimov’s Foundation series, in which Hari Seldon develops a mathematical description of society called psychohistory. The science in the books is completely fictional, but it always sat at the back of my mind. What if there were a kernel of truth in the fiction? What if people could be predictable?

Psychohistory has two main axioms (taken from the Wikipedia entry):
- that the population whose behaviour was modeled should be sufficiently large
- that the population should remain in ignorance of the results of the application of psychohistorical analyses
The first axiom has an analogy in statistical physics: the number of particles should be sufficiently large. A single atom doesn’t really have a temperature because temperature is a measure of how quickly disorder is increasing in a system. A single atom can’t increase its disorder, but it can have an energy. It just happens that the rate of entropy increase is proportional to the average energy of a group of particles, so we equate temperature with energy and assume that a single atom can have a temperature. The entropy-based definition of temperature is more general than the energy-based definition: it allows negative temperatures.

The second axiom is similar to what you might expect for a psychology experiment: knowledge of the experiment by the participants can affect the outcome. For example, a retailer might use purchasing data instead of asking someone outright whether they are pregnant, because sometimes the contextually acceptable answer will trump the truth.

The important thing is that people are predictable in aggregate. This is what allows a political poll to predict an election outcome without asking everyone who will be voting. Polls aren’t perfectly predictive, though, in part because people are more likely to tell a pollster what they think is socially acceptable, which might not match how they vote when they think no one is watching. That reinforces the need for the second axiom.

## Models

Let’s jump to a different point and see how we can work our way back to the above axioms: models.

Models are constructs designed to mimic the thing being modeled. Typical models are simpler because they leave out nuances that complicate the model without adding any predictive power, or because they leave out aspects of the original system that aren’t interesting for the question being asked.
For example, if we want to model a dripping faucet, we may include the mass of the drop, the flow of water into the drop, and the drop velocity, but not the temperature of the water if we are more interested in how the drip rate changes with water flow.

Models are useful because they allow us to make predictions and, if those predictions turn out accurate, to gain some insight into how the system we’re modeling might work. This is the foundation of how experimental and theoretical physics work together: theorists create models that must match the results from experimentation. Theorists might come up with beautiful models, but if their predictions don’t match experiment, or they can’t be differentiated from each other through some well-designed experiment, then they don’t become theories. They remain curios.

The dripping faucet is a classic example of what modeling can do. We can build out a set of relationships between the various components: dm/dt = Q for the mass of the water droplet, dx/dt = v for the droplet’s position, and dv/dt = (mg - kx - bv)/m for the droplet’s acceleration, where surface tension acts like a spring (constant k) and b damps the motion. This assumes a constant influx of water (Q).
There are more constraints for when a droplet separates from the faucet, but the above equations capture the main spring-like motion involved.
How do we test this model? We need to figure out what we can measure in a physical system that we can also calculate from the model. We can’t easily test the model if we can only calculate the mass of the drops and can only measure when a drop drips.
Fortunately, we can record when the model predicts a drop will happen because one of the constraints is that when the forming drop gets far enough away from the faucet, it separates and becomes a drop falling through the air. We can also record when drops form in a real faucet. Comparing the two tells us if the model is sufficiently accurate.
It turns out that the model is surprisingly useful for understanding the qualitative aspects of dripping fluids. When the flow is small, drops form regularly, like clockwork. As the influx of water increases, the drops form more rapidly until they start forming beats with different repeating timing patterns between drops. At high enough flows, the drops fall without any clear regularity, forming a chaotic stream before the incoming flow overwhelms the water’s ability to form discrete drops.
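This behavior can be sketched with a toy numerical integration of the spring-like model. This is illustrative only: every number below (flow rate, spring constant, damping, detachment threshold, fraction of mass lost per drip) is an assumption chosen simply to produce drips, not a value fitted to a real faucet.

```python
# A toy integration of the spring-like dripping-faucet model.
# All parameter values here are illustrative assumptions.

def simulate_drips(q=0.5, k=10.0, b=1.0, g=9.8, x_crit=0.1,
                   loss=0.8, dt=1e-4, t_max=20.0):
    """Return the times at which the model predicts a drop falls."""
    m, x, v, t = 0.1, 0.0, 0.0, 0.0
    drip_times = []
    while t < t_max:
        m += q * dt                      # dm/dt = Q: constant inflow
        a = g - (k * x + b * v) / m      # gravity vs. spring-like tension
        v += a * dt
        x += v * dt
        if x > x_crit:                   # detachment constraint
            drip_times.append(t)
            m *= 1.0 - loss              # most of the mass leaves as the drop
            x, v = 0.0, 0.0              # the remaining water snaps back
        t += dt
    return drip_times

times = simulate_drips()
intervals = [t2 - t1 for t1, t2 in zip(times, times[1:])]
print(len(times), intervals[:3])
```

Sweeping the inflow rate `q` upward is the numerical analogue of opening the tap: the interval pattern is what gets compared against a real faucet.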
Now we’re in a place to tackle one remaining question before we circle back around to predictable mobs: how do we model a system when we can’t look at how the system works? With the dripping faucet, we could see the water forming, the water coming in, and the droplet pulling down. What if all we could see was the drop appearing and disappearing? More importantly for mobs, what if all we can see is some measure of group social behavior?
Fortunately, we can appeal to information theory to help us understand our options. The idea is that even if we can’t know everything with certainty, we can know something, and we can know the limits of our certainty.
Information theory helps us understand why compressing a compressed file doesn’t gain us much extra space. If we have ten megabytes of information spread across a gigabyte, we can expect about ten megabytes for the compressed version, but we can’t expect a smaller file because we’d be losing information. The amount of “real” information in a file gives us a lower bound on the size of the compressed version of the file, since the compressed version must hold all the information in the uncompressed file. This is the same reason that, outside of movies, we can’t enhance low-resolution raster images to reveal license plate numbers: you can’t create information that isn’t already there.
This means that we need to figure out what information we can gather, and the limits on the interpretation of that information. This gets us into statistics and probabilities. The tools that are available require large numbers for their interpretations to make sense. This is why we need a mob before we can start making predictions.
I’ll run through a few tools that I’ve used in the past. These all assume that the thing we’re studying is a black box: that we have limited ability to measure anything about the system or see into its inner workings. They also assume that we have a lot of observations to work with and that we are comfortable with probability-based forecasts. These tools have been around for a few decades now, emerging from physics and related fields in the latter half of the twentieth century.
Autocorrelation measures how well a given data series correlates with itself at different times. Typically, we use this to find a lag at which the data series is as dissimilar as possible from its lagged copy (for example, the first lag at which the autocorrelation drops to zero).
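As a sketch of how this works, the function below computes the correlation at a given lag and scans for the first lag where it vanishes, using a sine wave as stand-in data. The sampling rate and series length are arbitrary choices.

```python
import math

def autocorr(series, lag):
    """Pearson correlation between a series and itself shifted by `lag`."""
    n = len(series) - lag
    x, y = series[:n], series[lag:]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# A sine wave sampled 100 points per cycle: the correlation first
# vanishes near a quarter period (lag ~ 25), where the lagged copy
# carries the least linear resemblance to the original.
data = [math.sin(2 * math.pi * i / 100) for i in range(2000)]
first_zero = next(lag for lag in range(1, 200) if autocorr(data, lag) <= 0)
print(first_zero)
```

At half a period the correlation swings to -1 (perfectly anti-correlated), which is why the first zero crossing, not the minimum, marks maximal dissimilarity in the sense used here.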
Lyapunov exponents measure how error grows or shrinks through time. Typical physical systems that have some damping in them, such as springs whose vibrations fade through friction, will have negative Lyapunov exponents. A positive exponent means that errors grow exponentially in time, so a small error in measuring an initial condition will lead to a larger error as the model progresses. This has implications for the predictive ability of a model.
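For a concrete (if standard) illustration, the Lyapunov exponent of the logistic map can be estimated by averaging the logarithm of the map’s local stretching factor along a trajectory. The parameter choices and iteration counts below are illustrative.

```python
import math

def lyapunov_logistic(r, n=100_000, burn_in=1_000, x0=0.2):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x)
    by averaging log|f'(x)| = log|r*(1 - 2x)| along a trajectory."""
    x = x0
    for _ in range(burn_in):           # let transients die out first
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n

# r = 2.9: the map settles to a fixed point, so errors shrink (negative).
# r = 4.0: fully chaotic; the exponent is ln 2, so errors double per step.
print(lyapunov_logistic(2.9), lyapunov_logistic(4.0))
```

A positive exponent puts a hard horizon on prediction: after enough steps, any measurement error swells to the size of the attractor itself.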
Phase space gives a graphical depiction of the different states that a system can have, and how those states change over time. Each independent variable in a system gets its own axis, so the dripping faucet model above would have a three-dimensional phase space.
The attractor is the shape in phase space that shows the different states of a system. If the system starts with conditions that are not on the attractor, then the system will evolve to where its state is on the attractor. The attractor “attracts” the system’s state.
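When only one variable can be measured, a standard trick called delay embedding reconstructs a stand-in phase space from lagged copies of that single series, which is how an attractor can be drawn for a black-box system at all. A minimal sketch, with an arbitrary sine series and an arbitrary lag:

```python
import math

def delay_embed(series, dim, lag):
    """Build delay vectors (x[t], x[t-lag], ..., x[t-(dim-1)*lag])."""
    span = (dim - 1) * lag
    return [tuple(series[t - k * lag] for k in range(dim))
            for t in range(span, len(series))]

# One measured variable (a sine) unfolds into a closed loop in two
# delay coordinates, recovering the circular phase-space picture of
# an oscillator without ever measuring its second variable.
x = [math.sin(0.1 * t) for t in range(500)]
points = delay_embed(x, dim=2, lag=15)
print(points[0], len(points))
```

Plotting those pairs traces out a loop: the oscillator’s attractor, rebuilt from a single observable.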
The dimensionality of an object tells us how the object scales as it grows or shrinks. A one-dimensional object increases its enclosed size by a factor of two when its length measure doubles. A two-dimensional object increases its enclosed size by a factor of four when its length measure doubles (a circle of radius two encloses four times as much area as a circle of radius one). Likewise, a three-dimensional object increases its enclosed size by a factor of eight when its length measure doubles (a sphere of radius two encloses eight times as much volume as a sphere of radius one). The pattern is that the enclosed size grows by a factor of 2^d, with d being the dimensionality.
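That scaling pattern is also how dimensions get estimated in practice: count how many boxes of a given size an object occupies, shrink the boxes, and watch how the count grows. A minimal box-counting sketch, where the two box sizes and the sample point sets are arbitrary choices:

```python
import math

def box_count_dimension(points, eps1=0.1, eps2=0.05):
    """Estimate dimension from how the number of occupied boxes grows
    as the box size shrinks: d ~ log(N2/N1) / log(eps1/eps2)."""
    def count(eps):
        return len({tuple(int(c / eps) for c in p) for p in points})
    return math.log(count(eps2) / count(eps1)) / math.log(eps1 / eps2)

# Points filling a line segment occupy twice as many boxes when the
# boxes halve (d = 1); points filling a square occupy four times as
# many (d = 2).
line = [(i / 1000, 0.0) for i in range(1000)]
square = [(i / 100, j / 100) for i in range(100) for j in range(100)]
print(box_count_dimension(line), box_count_dimension(square))
```

The same counting applied to a reconstructed attractor gives a (possibly fractional) estimate of how many independent variables the underlying system needs.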
Finally, we can get back to predictable mobs.
Assume that the thing that we can measure isn’t time between drips or how many cars pass through an intersection. What if it’s the stock price of a particular company over a twenty-six-year period? That’s what I did in a conference paper over a decade ago. In the paper, I didn’t try to figure out how the stock worked. Instead, I focused on what I could figure out given what I had available: a single data set analogous to the times between drips from a faucet.
I ran a similar analysis of the S&P 500 recently and found qualitatively similar results, though I need to do some more work before I can be sure that the underlying model is reasonable.
Keep in mind that the S&P 500 is not a single stock, but a total value representing five hundred different stocks, all bought and sold by the millions each day by nominally independent investors.
It appears that the attractor is three- or four-dimensional. The attractor shows that the S&P is not all over the map, but has particular patterns and trends. It tends to stay in a region for a while before transitioning to a different region, where it seems to stick around for a while again.
Using something like a stock price that has high trade volume seems to satisfy the first axiom of psychohistory. How about the second axiom, that those governed by it must not know about it? Consider what would happen if stock prices were predictable: we would trade based on the prediction. This change in behavior might represent a fundamental change in the system, which means that it’s no longer the same system that generated the data being used to make the predictions.
Over the next few weeks, I’ll put together a series of blog posts covering each of the tools that can be used in this kind of situation: how the tool works, what it shows about the above data, and how it might apply in other areas, such as textual analysis.