Last week, I wrote about how mobs might be predictable. One of the first tools that I mentioned was autocorrelation. This is a basic tool that we will use with the others in the list, so it’s important to understand exactly what it does. That’s what I want to explore this week.



Parallelogram (Photo credit: Wikipedia)

Let’s go back to high school geometry. We can define several properties and operations in terms of the angles and sides of the parallelogram to the right, though we’ll need to dive into the Cartesian coordinate system a bit to see how to move on to the next step towards the autocorrelation.

We want to look at what it means to do mathematical operations on these line segments. We know that we can add numbers together to get new numbers, but what does it mean to add line segments? If we take the segment from D to E, and add the segment from E to B, it’s obvious that we end up with the segment from D to B. But what’s not as obvious is that if we take D to E and add from E to C, we end up with D to C.

The reason that addition works like this is because we aren’t really adding the line segments as they are drawn. Instead, we’re treating them like pointers: if you start at D and go to E, and then go to C, you end up at the same place you would be if you had started at D and gone to C. Another word for this pointer is “vector.”

Just as we can add vectors, we can also subtract them: if we ended up at C from D, and one part of our journey was from E to C, what was the other part of our walk?
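Here is a quick sketch of this pointer arithmetic in code. The coordinates below are made up for illustration, since the figure’s actual corner positions aren’t given:

```scala
// A minimal 2-D vector ("pointer") with addition and subtraction.
case class Vec2(x: Double, y: Double) {
  def +(that: Vec2): Vec2 = Vec2(x + that.x, y + that.y)
  def -(that: Vec2): Vec2 = Vec2(x - that.x, y - that.y)
}

// Illustrative displacements: walking D -> E and then E -> C...
val dToE = Vec2(2, 0)
val eToC = Vec2(1, 2)
val dToC = dToE + eToC // ...lands us where walking D -> C directly would: Vec2(3.0, 2.0)

// Subtraction answers "what was the other part of the walk?"
val otherLeg = dToC - eToC // recovers D -> E: Vec2(2.0, 0.0)
```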

Geometric interpretation of cross-product of two vectors (Photo credit: Wikipedia)

What about multiplication? This is where it gets tricky because there are two different ways to look at the problem. One is to ask how much area is swept out by one vector as it travels parallel to itself along another vector. We call this the cross product. It has the interesting but not surprising property that when two vectors are parallel with each other, the cross product is zero. The cross product reaches a maximum when the two vectors are perpendicular to each other. We won’t work with the cross product too much today.

Instead, we want the other way of multiplying two vectors, which asks: how long does one vector look to be “in profile” when we view it from along another vector? That is, if we took a line perpendicular to the line DC and moved it from D towards C until it hit A, how far from D would we be? This is called the dot product. It has the very important property that when two vectors are perpendicular to each other, the dot product is zero. The dot product reaches its maximum when the two vectors are parallel to each other.

The magnitude of the two products follows a pattern: the product of the lengths of the two vectors multiplied by a factor dependent on the angle between the two vectors. This factor is either the sine of the angle for the cross product, giving us the area of the parallelogram, or the cosine of the angle for the dot product, giving us the projection of one vector onto the other.

 \|\vec a \times \vec b\| = \|\vec a\| \|\vec b\| \sin \theta
 \|\vec a \cdot \vec b\| = \|\vec a\| \|\vec b\| \cos \theta
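We can check these relationships numerically for 2-D vectors. In two dimensions the cross product reduces to the signed parallelogram area a_x b_y - a_y b_x; the small class below is illustrative:

```scala
// Dot and cross products for 2-D vectors.
case class Vec2(x: Double, y: Double) {
  def dot(that: Vec2): Double = x * that.x + y * that.y   // |a||b|cos(theta)
  def cross(that: Vec2): Double = x * that.y - y * that.x // |a||b|sin(theta): the parallelogram's area
}

val a = Vec2(3, 0)
val parallel = Vec2(5, 0)      // parallel to a
val perpendicular = Vec2(0, 2) // perpendicular to a

println(a.cross(parallel))      // 0.0  -- parallel vectors sweep out no area
println(a.dot(perpendicular))   // 0.0  -- a perpendicular vector has no "profile" along a
println(a.dot(parallel))        // 15.0 -- |a||b|cos 0 = 3 * 5
println(a.cross(perpendicular)) // 6.0  -- |a||b|sin 90 degrees = 3 * 2
```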

If you remember the distance formula, you’ll notice that |\vec a \cdot \vec a| = |\vec a|^2, which means that the dot product of a vector with itself is just the square of the length of the vector. Likewise, we can find the angle between the two vectors with a little rearrangement:

\cos \theta = {{\|\vec a \cdot \vec b\|}\over{\|\vec a\| \, \|\vec b\|}}

When this measure is equal to one, the two vectors are parallel. When it is equal to zero, they are perpendicular. When two vectors are perpendicular to each other (e.g., the x- and y- axes in Cartesian coordinates), then you can move along one without affecting where you are along the other. The two vectors are orthogonal.1 We can also say that the vectors represent independent quantities, since changing the position along one vector doesn’t require a change in the position along the other.
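A small worked example of this rearrangement, using two arbitrary vectors chosen for illustration:

```scala
import scala.math.{acos, sqrt, toDegrees}

// cos(theta) = (a . b) / (|a| |b|) for a = (1, 1) and b = (1, 0)
val (ax, ay) = (1.0, 1.0)
val (bx, by) = (1.0, 0.0)

val dot = ax * bx + ay * by
val cosTheta = dot / (sqrt(ax * ax + ay * ay) * sqrt(bx * bx + by * by))

println(cosTheta)                  // 0.707... = 1/sqrt(2)
println(toDegrees(acos(cosTheta))) // ~45 degrees
```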

Cartesian Space

The next step to the autocorrelation is to transform our geometric definitions into ones using Cartesian coordinates. It’s easy to do this if we think of the x-axis as being built from a single vector of length one added to itself many times: eight of these vectors to get to a position of eight on the x-axis. Likewise with the y-axis. So any vector we want to talk about is created by adding together a number of y-direction unit vectors and another number of x-direction unit vectors. These numbers are the Cartesian coordinates of the vector we’re interested in.

In the end, we can represent our vectors \vec a and \vec b with coordinates  (a_x,a_y) and  (b_x,b_y). If we use \vec i to represent the x-axis vector and  \vec j to represent the y-axis vector,2 then we have \vec a = a_x\vec i + a_y\vec j and  \vec b = b_x\vec i + b_y\vec j. Because the x- and y- axis vectors are perpendicular to each other, their dot product is zero:  \vec i \cdot \vec j = 0. Because each of \vec i and \vec j are of length one, their dot product with themselves is one. This simplifies the dot product of the two vectors considerably.

\| \vec a \cdot \vec b\| = \|a_x b_x + a_y b_y\|
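Writing the expansion out term by term shows where this comes from: all the mixed \vec i \cdot \vec j terms vanish.

\vec a \cdot \vec b = (a_x\vec i + a_y\vec j) \cdot (b_x\vec i + b_y\vec j) = a_x b_x (\vec i \cdot \vec i) + (a_x b_y + a_y b_x)(\vec i \cdot \vec j) + a_y b_y (\vec j \cdot \vec j) = a_x b_x + a_y b_y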

If we add a third basis vector (e.g., a z-axis), we just tack on more terms for that vector (e.g.,  +a_z b_z). The same for a fourth, fifth, etc. There’s no limit to how many terms we can tack on in this way if they are all orthogonal to each other.

If we have a large number of dimensions, we can represent the dot product with the summation:

 \| \vec a \cdot \vec b\| = \left| \sum a_i b_i \right|

Here, we let i range over the full set of coordinates that make up the two vectors.

Infinite-Dimensional Spaces!

There’s one last thing we need to work through before we tackle autocorrelation: what happens when we have an infinite number of dimensions? We’ve defined a dot product for any number of dimensions, but only when we have a definite, bounded number of them. After all, we can’t go on writing down new terms forever; we’d never finish and do the calculation.

Fortunately, there’s a way around this using some ideas from calculus. Take a regular polygon as an example. If we keep adding sides, say going from a square to a hexagon to an octagon, etc., we get smaller edge lengths and wider angles. If we keep adding sides, we’ll eventually get a shape that resembles a circle, even if it is never a circle: there will always be a straight edge in a regular polygon regardless of how many sides it has. However, in the limit, the regular polygon with an unbounded number of edges takes on the characteristics of a circle, and we can say that, in the limit, the regular polygon is a circle.

This is how we handle this infinite number of dimensions: we look at the characteristics of a very large number of dimensions, see how those characteristics converge on particular behaviors, and then accept those behaviors and characteristics as true for the high-dimensionality case in the limit of an unbounded number of dimensions.

So what does such a beast look like? Simple: an object that takes an index of the dimension and returns the length along that dimension. In other words, an ordinary function that takes a number and returns a number.

As with all things when moving from finite algebraic expressions to calculus, we change things a bit. Assuming that we have two infinite-dimensional vectors  f(x) and  g(x) (with x representing our dimension index), then we can define the dot product as follows:

\| f \cdot g \| = \left| \int f(x)g(x)\,dx \right|

This runs over the full range of dimensions, whatever the proper range of values for x might be.

The fun starts when we substitute different functions for  f and  g. For example, if we substitute \sin and \cos and consider  x to range from  0 to  2\pi, then we see that the result is zero: the two functions \sin and \cos are orthogonal to each other. If we substitute either of the two functions for both f and g, we see that the result is not zero: it’s \pi. In fact, for any non-zero integer  n, the dot product of \sin n \theta with itself is equal to \pi.
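Both results follow from the double-angle identities:

\int_0^{2\pi} \sin x \cos x \, dx = {1\over 2}\int_0^{2\pi} \sin 2x \, dx = 0
\int_0^{2\pi} \sin^2 x \, dx = \int_0^{2\pi} {{1 - \cos 2x}\over 2} \, dx = \pi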

Now you know enough to understand what’s happening in a Fourier Transform, where a time-based signal is transformed into a frequency-based representation (e.g., a recording of a musical chord turned into a function showing which notes are being played).


With the last bit, we’re ready to tackle autocorrelation. The basic idea is to let the function or data represent an infinite-dimensional vector (or a close approximation of it if we have a discrete and limited data set). We want to find another function that is orthogonal to our first function, but with the requirement that the second function be exactly the same as our first except for a phase shift: g(x) = f(x + \Delta). What is the \Delta that minimizes the size of the dot product of f and g?

 \| f \cdot f_\Delta\| = \left| \int f(x) f(x+\Delta)\,dx \right|
 \|f \cdot f_\Delta\| = \left| \sum f_x f_{x+\Delta} \right|

We use whichever formula fits our needs: the integral for continuous functions and the summation for discrete, finite data. We’ll typically use the summation since most projects using autocorrelation in the way we plan to do are working with limited data sets.

In practice, the easiest way to find \Delta is to calculate the summation for all possible values of \Delta and then take the smallest \Delta that makes the autocorrelation hit the x-axis (or cross the x-axis if you remove the absolute value from the formulas).3

Interpreting and Using Autocorrelation

What does the autocorrelation mean and, more importantly, what does \Delta tell us?

Remember that \Delta is the value that takes f(x) and provides a new function f(x+\Delta) that has a similar relationship with f(x) as \sin has with \cos: they are orthogonal. This means that f(x) and f(x+\Delta) can serve a similar role as \sin and \cos play in the Fourier Transform.

Another way to look at it is that the value of f(x) is most correlated with itself (f(x)), and least correlated with f(x+\Delta). We will rely on this when we look at phase space and building an attractor to show us the qualitative aspects of the system’s behavior that generated our data.

Autocorrelation Algorithm

Finally, we can look at how we might write the code to calculate the autocorrelation of a dataset. The following is a Scala version that shows the basic process.4

We can test this with the sine:
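A minimal sketch of the routine and the sine test follows; the function names, the wrap-around indexing, and the zero-crossing tolerance here are illustrative choices:

```scala
import scala.math.{Pi, sin}

// Autocorrelation at a given lag: the dot product of the data with a copy of
// itself shifted by `lag`, wrapping around the end of the (assumed periodic) data.
def autocorrelation(data: Vector[Double], lag: Int): Double =
  data.indices.map(i => data(i) * data((i + lag) % data.length)).sum

// Smallest lag at which the autocorrelation falls to (effectively) zero --
// the point where the shifted copy becomes orthogonal to the original.
def firstZeroLag(data: Vector[Double], tolerance: Double = 1e-9): Option[Int] = {
  val scale = autocorrelation(data, 0) // the maximum, at zero lag
  (1 until data.length).find(lag => autocorrelation(data, lag) <= tolerance * scale)
}

// One full period of the sine sampled at 512 points, so a quarter period is 128 samples.
val samples = Vector.tabulate(512)(i => sin(i * Pi / 256))

println(firstZeroLag(samples)) // Some(128)
```

Computing the sum for every lag is quadratic in the size of the data, which is fine for the modest data sets we plan to work with.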

When we run this, we get 128 as our output, indicating that the \sin function first becomes uncorrelated with itself around a \Delta of 128 samples, or a quarter period (\pi\over 2), which just happens to be the phase difference between the sine and cosine.

The following figures show the original sine function and the resulting autocorrelation integral approximated by the summation in the code (courtesy of WolframAlpha).

Sine wave

Autocorrelation integral of the sine wave


Autocorrelation in the Humanities

There isn’t much direct use for the autocorrelation in the digital humanities. This is mainly because we use it to find reasonable parameters for other tools that are more obviously useful in the humanities. We’ll see more about these in future posts.

That said, there might be some uses for autocorrelation if the information being studied can be represented as a data sequence. For example, the melody of a song might make sense as a sequence of numbers that are correlated close in time, but not necessarily over long stretches of a piece of music. The autocorrelation would show us how much lag we would need to introduce into the music to avoid an obvious correlation between the notes. Of course, this ignores correlations between sections that arise from harmonies, such as we might see in a round where the second and third voices repeat the melody with a lag that results in harmonization.

Published by


James is a software developer and self-published author. He received his B.S. in Math and Physics and his M.A. in English from Texas A&M University. After spending almost two decades in academia, he now works in the Washington, DC, start-up world.
