Category Archives: Digital Humanities

Linked Open Code: Libraries

A while back, I talked about how linked open code complements linked open data if we consider the web to be a computer system. This time, I explore what a linked open code (LOC) library might look like. That is, how would a collection of functions be published and used?

The emerging standard for sharing linked open data (LOD or LD) is JSON-LD, mainly because it's easy to use and plays well with JSON-based REST APIs. That was by design. Consider JSON to be the modern XML, but for data rather than documents, and JSON-LD the modern RDF/XML. Everything I talk about in this post could be done with RDF/XML, but it's a lot easier with JSON-LD.

A library is just a collection of functions. A class is a collection of functions operating on a common data structure (or type). We should be able to do both with LOC since LOD has the concept of types.
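
To make that concrete, here's a minimal sketch in Python of how a single function and the library holding it might be described as JSON-LD. The vocabulary URL, the property names, and the word-count function itself are placeholders of my own, not a settled LOC design:

```python
import json

# A hypothetical JSON-LD description of one function in an LOC library.
# The vocabulary (https://example.org/loc#) and every property name below
# are invented for illustration; nothing here is a settled LOC standard.
word_count_fn = {
    "@context": {
        "loc": "https://example.org/loc#",
        "name": "loc:name",
        "argumentTypes": "loc:argumentTypes",
        "returnType": "loc:returnType",
    },
    "@id": "https://example.org/libraries/text-tools#word-count",
    "@type": "loc:Function",
    "name": "word-count",
    "argumentTypes": ["https://example.org/types#Text"],
    "returnType": "https://example.org/types#Integer",
}

# A library is then just a resource that points at a set of such functions;
# a class would add the common type its functions operate on.
text_tools_library = {
    "@context": {"loc": "https://example.org/loc#", "functions": "loc:functions"},
    "@id": "https://example.org/libraries/text-tools",
    "@type": "loc:Library",
    "functions": [word_count_fn["@id"]],
}

print(json.dumps(text_tools_library, indent=2))
```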

Continue Reading Linked Open Code: Libraries

Algorithmic Provenance of Data

It doesn't seem like it's been over four years since I joined MITH and started working with Project Bamboo. Just because I've moved on to a startup and the project's been mothballed doesn't mean we can't mine what was done.

The problems with Project Bamboo are numerous and documented in several places. One of the fundamental mistakes made early on was the waterfall approach: designing and developing an enterprise-style workspace that would encompass all humanities research activities rather than producing an agile environment that leveraged existing standards and tools. Top down rather than bottom up.

However, the idea that digital humanities projects share some common issues and could take advantage of shared solutions is important. This is part of the reporting aspect of research: when we learn something new, we not only report the new knowledge, but how we got there to help someone else do similar work with different data. If we discover a way to distinguish between two authors in a text, we not only publish what we think each author wrote, but the method by which we made that determination. Someone else can apply that same method to a different text.
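
As a sketch of what machine-readable reporting could look like for that author-attribution example, here's a small provenance record in Python. The field names are my own, loosely echoing the entity/activity/agent split of the W3C PROV model; they aren't anything Project Bamboo defined:

```python
import json

# A hypothetical provenance record attached to a published result.
# URLs and field names are illustrative placeholders, not a standard.
provenance = {
    "entity": "https://example.org/results/two-author-split.json",
    "activity": {
        "method": "https://example.org/methods/stylometric-attribution",
        "parameters": {"features": "function-words", "classifier": "delta"},
        "inputs": ["https://example.org/texts/disputed-text.txt"],
    },
    "agent": "https://example.org/people/researcher",
}

print(json.dumps(provenance, indent=2))
```

Someone working on a different text could dereference the method, swap the inputs, and apply the same determination to their own data.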

Continue Reading Algorithmic Provenance of Data

Linked Open Code

I've been working off and on over the last six months on a programming language that sits on top of linked open data. Think of it as linked open code.

von Neumann Architecture

Before von Neumann made his observations about code and data, computers typically had some memory dedicated to code, and other memory dedicated to data. The processing unit might have a bus for each, so code and data didn't have to compete for processor attention.

This was great if you were able to dedicate your machine to particular types of problems and knew how much data or code you would typically need.

Von Neumann questioned this assumption. Why should memory treat code and data as different things when they're all just sets of bits?

Continue Reading Linked Open Code

Fun with Functions

This image shows the 3-6-9 hexagram of the circle made of the digital root of Fibonacci numbers. (Photo credit: Wikipedia)

I'm building a language designed to be natural with linked data just as programs today are natural with local memory. The result is highly functional and data-relative in nature and reminds me of how XSLT works relative to XML nodes (e.g., a current node, child nodes, ancestor nodes). I have quite a few tests passing for the parser and core engine, so now I'm branching out into libraries and seeing what I can do with all of the pieces.
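
The language itself isn't in a shareable state yet, so here's only a rough Python analogy for what "data-relative" means, borrowing XSLT's vocabulary of current node, children, and ancestors. The Node class and its methods are mine, not the language's actual semantics:

```python
# A toy "data-relative" context, by analogy with XSLT's current node,
# child axis, and ancestor axis. Names and structure are illustrative only.
class Node:
    def __init__(self, name, value=None, children=None):
        self.name = name
        self.value = value
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

# Expressions are evaluated relative to a current node rather than
# against global state, the way XSLT templates are.
doc = Node("book", children=[
    Node("title", "Dracula"),
    Node("author", "Bram Stoker"),
])
current = doc.children[0]                       # <title> is the current node
print(current.value)                            # Dracula
print([n.name for n in current.ancestors()])    # ['book']
```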

Continue Reading Fun with Functions

My DH Agenda

A few months ago, I accepted a job outside the academy. This doesn't mean that I'm abandoning digital humanities. In this post, I lay out what I want to do in DH going forward. The common thread through all of it is that I believe linked open data is the best way to break down the silo walls that keep digital humanities projects from sharing and building on existing data.

Continue Reading My DH Agenda

Markov Chain Text Generation

With the recent postings elsewhere about Markov chains and text production, I figured I'd take a stab at it. I based my code on the Lovebible.pl code at the latter link. Instead of the King James Bible and Lovecraft, I combined two million words from sources found on Project Gutenberg (a minimal sketch of the technique follows the list):

  • The Mystery of Edwin Drood, by Charles Dickens
  • The Secret Agent, by Joseph Conrad
  • Superstition in All Ages (1732), by Jean Meslier
  • The Works of Edgar Allan Poe, by Edgar Allan Poe (five volumes)
  • The New Pun Book, by Thomas A. Brown and Thomas Joseph Carey
  • The Superstitions of Witchcraft, by Howard Williams
  • The House-Boat on the Styx, by John Kendrick Bangs
  • Bulfinch's Mythology: the Age of Fable, by Thomas Bulfinch
  • Dracula, by Bram Stoker
  • The World English Bible (WEB)
  • Frankenstein, by Mary Shelley
  • The Vision of Paradise, by Dante
  • The Devil's Dictionary, by Ambrose Bierce

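If you'd rather not dig up the Perl, here's a minimal word-level Markov chain in Python. It isn't the Lovebible.pl code, just the same basic idea: record which words follow each pair of words, then wander those choices at random. The corpus filename is a placeholder for the combined texts above:

```python
import random
from collections import defaultdict

def build_chain(words, order=2):
    """Map each tuple of `order` consecutive words to the words that follow it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length=50):
    """Random-walk the chain to produce roughly `length` words of text."""
    key = random.choice(list(chain.keys()))
    output = list(key)
    for _ in range(length - len(key)):
        followers = chain.get(key)
        if not followers:
            break
        output.append(random.choice(followers))
        key = tuple(output[-len(key):])
    return " ".join(output)

# Usage: feed it the combined corpus as one long list of words.
# (The filename is a placeholder for the two-million-word corpus.)
corpus = open("combined_corpus.txt", encoding="utf-8").read().split()
print(generate(build_chain(corpus)))
```
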
Continue Reading Markov Chain Text Generation

Streams, Part II

In the last post (back in 2012—this post has been in draft for some time), we talked a bit about streams. Eventually, I want to see how we might use streams to write a parser that can parse the language we're developing to talk about streams, but let's start with something a little simpler. The textual digital humanities revolve around text, and the really fun stuff involves textual manipulation if not parsing.

Today, let's build a set of tools that will help us create a concordance of a text. We'll have to make a lot of assumptions so that we can see the core pieces, so keep in mind that any real implementation will probably have different details.

We'll assume for now that we have a stream of characters representing the text. We haven't discussed where we get data or where we store it yet. That's for another time. For now, we're focused on what we do with the data between getting and storing it. If we can wrap our minds around what to do with the data, then we can plug any data retrieval or storage we want into our processing later.
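
Here's a rough sketch of that separation in Python. The function names and the shape of the pipeline are stand-ins for whatever the stream tools end up looking like; the point is that the processing consumes a character stream without caring where it came from:

```python
from collections import defaultdict

def words(chars):
    """Turn a stream of characters into a stream of lowercase words."""
    word = []
    for ch in chars:
        if ch.isalpha():
            word.append(ch.lower())
        elif word:
            yield "".join(word)
            word = []
    if word:
        yield "".join(word)

def concordance(word_stream):
    """Map each word to the positions at which it appears."""
    positions = defaultdict(list)
    for index, word in enumerate(word_stream):
        positions[word].append(index)
    return positions

# The character stream could come from a file, a URL, or a database;
# the processing doesn't care. Here it's just a string.
text = "Call me Ishmael. Some years ago, never mind how long..."
print(concordance(words(iter(text)))["ishmael"])  # [2]
```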

Continue Reading Streams, Part II

CAR, CDR, CONS, Oh My!

Stream (Photo credit: Wikipedia)

While I am trying to round out the content management aspects of OokOok this year, I'm starting to think ahead to next year's work on databases and processing. Part of the goal is to offer a platform that lets you take advantage of parallel processing without requiring you to be aware that you're doing so. Of course, any such platform will be less powerful than hand-coding parallel code in C or using your favorite Hadoop library. Less powerful is better than not available. I want OokOok to make capabilities available that would otherwise be hidden away.

Map/reduce seems like the simplest way to think about parallel processing. We have two kinds of operations: those that look at one item at a time (mappings) and those that have to see everything before they can finish their calculation (reductions). Reductions can get by seeing one item at a time if they can keep notes on a scratch pad. We could then put operations into two slightly different camps: those that need a scratch pad (reductions) and those that don't (mappings).
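
Here's a small illustration of that distinction in plain Python (not OokOok's eventual API): the mapping never needs more than the item in front of it, while the reduction carries its scratch pad forward as a running total.

```python
from functools import reduce

word_counts = [3, 7, 2, 11, 5]

# Mapping: looks at one item at a time, with no memory of the others,
# so each item could be handled on a different worker.
doubled = map(lambda n: n * 2, word_counts)

# Reduction: sees one item at a time but keeps notes on a scratch pad
# (the running total), so it has, in effect, seen everything by the end.
total = reduce(lambda running_total, n: running_total + n, word_counts, 0)

print(list(doubled))  # [6, 14, 4, 22, 10]
print(total)          # 28
```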

Continue Reading CAR, CDR, CONS, Oh My!