Seeing what happens when you collide the humanities with the digital
James is a software developer and self-published author. He received his B.S. in Math and Physics and his M.A. in English from Texas A&M University. After spending almost two decades in academia, he now works in the Washington, DC, startup world.
Last week, I wrote about how mobs might be predictable. One of the first tools that I mentioned was autocorrelation. This is a basic tool that we will use with the others in the list, so it's important to understand exactly what it does. That's what I want to explore this week.
Let's go back to high school geometry. We can define several properties and operations in terms of the angles and sides of the parallelogram to the right, though we'll need to dive into the Cartesian coordinate system a bit to see how to take the next step towards the autocorrelation.
We want to look at what it means to do mathematical operations on these line segments. We know that we can add numbers together to get new numbers, but what does it mean to add line segments? If we take the segment from D to E, and add the segment from E to B, it's obvious that we end up with the segment from D to B. But what's not as obvious is that if we take D to E and add from E to C, we end up with D to C.
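One way to see why both sums work is to treat each segment as a displacement: the change in x and the change in y from its start to its end. The sketch below is only an illustration. The coordinates are made up (the original figure may place the points differently), with B chosen to lie on the line through D and E and C chosen to lie off it, and the helper names `segment` and `add` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Point(NamedTuple):
    x: float
    y: float


Displacement = Tuple[float, float]


def segment(start: Point, end: Point) -> Displacement:
    """Directed segment from start to end, as (change in x, change in y)."""
    return (end.x - start.x, end.y - start.y)


def add(u: Displacement, v: Displacement) -> Displacement:
    """Add two segments component by component."""
    return (u[0] + v[0], u[1] + v[1])


# Hypothetical coordinates, not taken from the original figure.
D = Point(0.0, 0.0)
E = Point(3.0, 0.0)
B = Point(5.0, 0.0)  # on the same line as D and E
C = Point(4.0, 2.0)  # off that line

# The "obvious" case: walking D to E, then E to B, is the same as D to B.
assert add(segment(D, E), segment(E, B)) == segment(D, B)

# The less obvious case works the same way, even though C is off the line.
assert add(segment(D, E), segment(E, C)) == segment(D, C)
```

In both cases the intermediate point E cancels out of the sum, which is why the direction of the second segment doesn't matter.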
As a kid, I read Asimov's Foundation series in which Hari Seldon develops a mathematical description of society called psychohistory. The science in the books is completely fictional, but it always sat at the back of my mind. What if there was a kernel of truth in the fiction? What if people could be predictable?
Psychohistory has two main axioms (taken from the Wikipedia entry):
that the population whose behaviour was modeled should be sufficiently large
that the population should remain in ignorance of the results of the application of psychohistorical analyses
The first axiom has an analogy in statistical physics: the number of particles should be sufficiently large. A single atom doesn't really have a temperature because temperature is a measure of how a system's disorder (its entropy) changes as its energy changes. A single atom can't increase its disorder, but it can have an energy. It just happens that, for everyday systems, temperature defined this way is proportional to the average energy of a group of particles, so we equate temperature with energy and assume that a single atom can have a temperature. The entropy-based definition of temperature is more general than the energy-based definition: it allows negative temperatures.
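For readers who want the symbols, here is a short sketch of the two definitions in standard textbook notation. The symbols are my additions rather than anything defined in the post: S is entropy, E is the system's energy, V and N are the fixed volume and particle number, and k_B is Boltzmann's constant. The energy-based line is the usual result for a monatomic ideal gas and is only meant as an example.

```latex
% Entropy-based definition: temperature is set by how entropy changes with energy.
\frac{1}{T} = \left( \frac{\partial S}{\partial E} \right)_{V,N}

% Energy-based shortcut (monatomic ideal gas): average kinetic energy per particle.
\langle E_{\mathrm{kin}} \rangle = \frac{3}{2} k_B T

% If adding energy lowers the entropy, the first definition gives a negative temperature.
\frac{\partial S}{\partial E} < 0 \quad \Longrightarrow \quad T < 0
```

The second line can never produce a negative temperature because kinetic energy is never negative, which is why the entropy-based definition is the more general one.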
The second axiom is similar to what you might expect for a psychology experiment: knowledge of the experiment by the participants can affect the outcome. For example, you might use purchasing data instead of asking someone outright whether they are pregnant, because sometimes the contextually acceptable answer will trump the truth.
The important thing is that people are predictable in aggregate. This is what allows a political poll to predict an election outcome without having to ask everyone who will be voting. Polls aren't perfectly predictive, in part because people are more likely to tell a pollster what they think is socially acceptable, which might not match how they vote when they think no one is watching. That reinforces the need for the second axiom.
I've had the e-book edition of my novel, Of Fish and Swimming Swords, available for Kindle and Smashwords for two years. Now, I have a print edition.
You can order a print copy from CreateSpace. Use the discount code XMXXKGKU to get 25% off.
The print cover is different from the digital, but I still tried to put together a cover that was somewhat connected to the novel. The digital cover reflects the role of fours and a virtual world tree. In the case of the print edition, the artifacts resemble meshing gears, cycles enmeshed with cycles, and discarded materials half buried in the sand, similar to the layers of conspiracy in the story feeding off of each other and only half emerging from the text.
The next step is to match up the print and digital editions on Amazon so that you can get a copy of the digital edition when you buy a copy of the print through Amazon's Kindle Matchbook program.
I'm slow writing novels. I've drafted the first half (70,000 words) of a new one with the working title Silent Rain (you can see how slow I've been if you've noticed the yellow progress bar in the sidebar that hasn't moved in almost a year). Now I'm going back and editing it down to refresh my memory of the story in preparation for starting a push through the second half in November for NaNoWriMo. I don't expect to have the editing finished over the next month and a half, but I do plan on releasing the first half as a standalone work in early spring while I wrap up the second half.
Meanwhile, I thought I'd share the beginning of the novel with you so you can see where it's going, or at least how it starts. This is after a first edit to get rid of much of the slow sections and tighten the dialogue. Other rounds will deal with other aspects of the text.
I stood in line for a few hours the day the iPhone came out in 2007. I had been using various cell phones before then, but the iPhone was revolutionary. I didn't have to wade through sales pitches and confusing marketing to figure out which features I needed to pay for. Everything was included for a single price, and the price only depended on how many minutes I needed each month.
Cell phone companies have recovered some ground. Monthly fees depend on how many minutes AND how much data you want, as well as whether or not you want to tether a laptop or other device to the phone (that was always off the table with the first iPhone). If you want to upgrade more often than every two years, that's another new monthly fee. Not quite as bad as before the iPhone, but getting more complicated so you don't realize just how much you're paying for spotty service. Until we have real competition in the cell market, this will be our future.
Today, let's build a set of tools that will help us create a concordance of a text. We'll have to make a lot of assumptions so that we can see the core pieces, so keep in mind that any real implementation will probably have different details.
We'll assume for now that we have a stream of characters representing the text. We haven't discussed where we get data or where we store it yet. That's for another time. For now, we're focused on what we do with the data between getting and storing it. If we can wrap our minds around what to do with the data, then we can plug in any data retrieval or storage we want onto our processing later.
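To make the starting point concrete, here is a minimal sketch in Python of going from a stream of characters to a concordance. It is not the implementation from this series. It assumes that a "word" is a run of letters and apostrophes, that case doesn't matter, and that a concordance entry is just the list of word positions where each word occurs. The function names `words` and `concordance` are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def words(chars: Iterable[str]) -> Iterator[str]:
    """Group a stream of characters into lowercase words.

    Assumption: a word is a run of letters and apostrophes.
    """
    current: List[str] = []
    for ch in chars:
        if ch.isalpha() or ch == "'":
            current.append(ch.lower())
        elif current:
            # Any other character ends the word in progress.
            yield "".join(current)
            current = []
    if current:
        # Flush the last word if the stream ends mid-word.
        yield "".join(current)


def concordance(chars: Iterable[str]) -> Dict[str, List[int]]:
    """Map each word to the positions (counted in words) where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, word in enumerate(words(chars)):
        index[word].append(position)
    return dict(index)


text = "The cat sat. The dog sat too."
index = concordance(iter(text))  # iter() stands in for a real character stream
print(index["the"])  # [0, 3]
print(index["sat"])  # [2, 5]
```

Because both functions only ever look at the next character or word, the source of the stream (a file, a network socket, a database) can be swapped without touching the processing.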
While I am trying to round out the content management aspects of OokOok this year, I'm starting to think ahead to next year's work on databases and processing. Part of the goal is to offer a platform that lets you take advantage of parallel processing without requiring that you be aware that you're doing so. Of course, any such platform will be less powerful than hand-coding parallel code in C or using your favorite Hadoop library. Less powerful is better than not available. I want OokOok to make available capabilities that would otherwise be hidden away.
Map/reduce seems like the simplest way to think about parallel processing. We have two kinds of operations: those that look at one item at a time (mappings), and those that have to see everything before they can finish their calculation (reductions). Reductions can get by seeing one item at a time if they can keep notes on a scratch pad. We could then put operations into two slightly different camps: those that need a scratch pad (reductions) and those that don't (mappings).
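The scratch-pad distinction can be sketched with Python's built-in `map` and `functools.reduce`. This is only an illustration of the idea, not OokOok code. The sample word list and the `tally` helper are made up for the example.

```python
from functools import reduce
from typing import Dict, List

sample: List[str] = ["a", "rose", "is", "a", "rose"]

# A mapping looks at one item at a time and remembers nothing between items,
# so each item could be handled by a different worker.
lengths = list(map(len, sample))
print(lengths)  # [1, 4, 2, 1, 4]


def tally(scratch: Dict[str, int], word: str) -> Dict[str, int]:
    """One reduction step: update the scratch pad with the next item."""
    scratch[word] = scratch.get(word, 0) + 1
    return scratch


# A reduction also sees one item at a time, but carries a scratch pad
# (here, a dictionary of counts) from one item to the next.
counts = reduce(tally, sample, {})
print(counts)  # {'a': 2, 'rose': 2, 'is': 1}
```

The mapping can be split across workers with no coordination at all. The reduction needs some way to combine scratch pads, which is the part a platform would have to manage on your behalf.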
I'm making rapid progress in getting OokOok to a stable programming scheme. I haven't made a lot of changes in its capabilities, though I did add the ability to archive themes and projects as BagIt files yesterday. Instead, I've been working on making the important stuff declarative. By hiding all the details behind a veneer of relationships, I can fiddle with how I manage those relationships without having to touch every relationship every time I make a change in the underlying schemas (and schemes).
For those used to an older style of Perl programming, this might come as a surprise. For those who have dealt with things like MooseX::Declare and CatalystX::Declare, you'll be shaking your head at my foolhardiness in jumping into making an OokOok::Declare that hides the details of how to construct certain types of classes.
Behind the scenes, OokOok consists of controllers, models, views, REST collections/resources, SQL result objects, a template engine, and tag libraries for the templates. Almost two hundred classes in all.
If I built all of these the usual Perl way, there'd be a lot of boilerplate code around. By moving to a declarative approach, I can isolate all the boilerplate in a few core meta-classes. When the boilerplate has to change, I only have to touch one place. Everything else comes along for the ride.
For the rest of this post, I want to walk through how I use some of these declarative constructions. I won't get into the scary details of how to make declarative constructions in Perl (at least, not in this post).
OokOok is coming along nicely. It's been a couple of months since the last update, so I'll outline a bit of what I've done since the last post. I'm nowhere near being able to throw up a demonstration server for anyone to play with, but I'm getting closer. With a little more testing, a reasonably decent administrative interface, some simple themes, and full authorization management, we'll be good to go on a first demo. I'm aiming for the end of the year. I'm trying to think about what a good, simple demonstration project might be that is just text online. Perhaps a curated collection of Creative Commons-licensed works on a subject?
OokOok isn't meant to do everything for everyone. I'm designing it with opinions. I think they are well researched and thought out opinions, but they are opinions. I hope the pros can outweigh the cons, but that's something you'll need to decide when considering which platform to use for your project.
I'm designing the system to enable citation, reproduction, sustainability, and description. You should be able to point someone at exactly the version of the page that you saw (citation), be able to see the same content each time you view that version of the page (reproduction), see that content "forever" (sustainability), and leverage computation through description (composing the rules) instead of prescription (composing the ways). I've based all the opinionated choices in the system on trying to meet the needs of those four "axioms."
Today, I want to explore what it means for something to be citable.
I come from the sciences, where citation is a shorthand for bringing in a body of work that you don't want to reproduce in your text. It's like linking in a library in a program. You're asserting that something is important to your argument and anyone can find out why they should believe it by following the citation. You don't have to explain the reasoning behind what you're referencing.
If you use citations to give shout-outs to people in your field, then you don't need what I'm thinking about. Readers understand that these citations are there to remind them about the other people and their body of work, not the particular passage pointed to in the citation. The details aren't important enough to look up.
I'm interested in the citations that people need to follow.