Distributed computing | James Gottlieb

In the last post (back in 2012—this post has been in draft for some time), we talked a bit about streams. Eventually, I want to see how we might use streams to write a parser that can parse the language we're developing to talk about streams, but let's start with something a little simpler. The textual digital humanities revolve around text, and the really fun stuff involves textual manipulation if not parsing.

Today, let's build a set of tools that will help us create a concordance of a text. We'll have to make a lot of assumptions so that we can see the core pieces, so keep in mind that any real implementation will probably have different details.

We'll assume for now that we have a stream of characters representing the text. We haven't discussed where we get data or where we store it yet. That's for another time. For now, we're focused on what we do with the data between getting and storing it. If we can wrap our minds around what to do with the data, then we can plug-in any data retrieval or storage we want onto our processing later.

Continue Reading Streams, Part II

English: Stream (Photo credit: Wikipedia)

While I am trying to round out the content management aspects of OokOok this year, I'm starting to think ahead to next year's work on databases and processing. Part of the goal is to offer a platform that lets you take advantage of parallel processing without requiring that you be aware that you're doing so. Of course, any such platform will be less powerful than hand-coding parallel code in C or using your favorite Hadoop library. Less powerful is better than not available. I want OokOok to make available capabilities that would otherwise be hidden away.

Map/reduce seem like the simplest way to think about parallel processing. We have two kinds of operations: those that look at one item at a time (mappings), or those that have to see everything before they can finish their calculation (reductions). Reductions can get by seeing one item at a time if they can keep notes on a scratch pad. We could put operations then into two slightly different camps: those that need a scratch pad (reductions) and those that don't (mappings).

Continue Reading CAR, CDR, CONS, Oh My!

James Gottlieb

Tag Archives: Distributed computing

Streams, Part II

CAR, CDR, CONS, Oh My!

Seeing what happens when you collide the humanities with the digital