In the last post (back in 2012—this post has been in draft for some time), we talked a bit about streams. Eventually, I want to see how we might use streams to write a parser that can parse the language we're developing to talk about streams, but let's start with something a little simpler. The textual digital humanities revolve around text, and the really fun stuff involves textual manipulation if not parsing.
Today, let's build a set of tools that will help us create a concordance of a text. We'll have to make a lot of assumptions so that we can see the core pieces, so keep in mind that any real implementation will probably have different details.
We'll assume for now that we have a stream of characters representing the text. We haven't discussed where we get data or where we store it yet. That's for another time. For now, we're focused on what we do with the data between getting and storing it. If we can wrap our minds around what to do with the data, then we can plug any data retrieval or storage we want into our processing later.
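To make that separation concrete, here is a minimal sketch in Python (my choice for illustration, not necessarily the language this series settles on). It assumes a "stream of characters" is simply any iterable that yields one character at a time, and it takes a concordance in its simplest form: a map from each word to the positions where it occurs. The names `words` and `concordance` are hypothetical, and the rule that a word is a run of alphabetic characters is one of the simplifying assumptions mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def words(chars: Iterable[str]) -> Iterator[str]:
    """Group a stream of characters into lowercased words.

    Assumption: a word is a run of alphabetic characters; anything
    else (spaces, punctuation, digits) ends the current word.
    """
    current: List[str] = []
    for ch in chars:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            yield "".join(current)
            current = []
    if current:
        # Emit the last word if the stream ends without a separator.
        yield "".join(current)


def concordance(chars: Iterable[str]) -> Dict[str, List[int]]:
    """Map each word to the 0-based word positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, word in enumerate(words(chars)):
        index[word].append(position)
    return dict(index)


# Any iterable of characters can be the source; a string is the simplest.
result = concordance("The cat saw the other cat.")
```

Neither function knows where the characters come from or where the result goes. A string works here, but so would a generator reading a file one character at a time, which is the kind of plugging in we want to be able to do later.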