All posts by James

James is a software developer and self-published author. He received his B.S. in Math and Physics and his M.A. in English from Texas A&M University. After spending almost two decades in academia, he now works in the Washington, DC, startup world.

Extension functions can be written as expressions

I’m working on some functions that can be useful for a concordance.  Right now, that’s a function to give me the frequency of each word in a text.

I’ve successfully defined it as follows:

This allows functions to be written in terms of previously defined functions, using all of the expressiveness of the Fabulator expression language.

For example, if I defined the XML prefix ‘c’ to correspond to the namespace in which the ‘count-words’ function is defined, then I could run the following expression:

And then ‘$i/of’ would equal 3 because the word ‘of’ appears three times.
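The word-frequency idea can be sketched outside of Fabulator as well. The Python below is only an illustration of the technique, not the Fabulator definition: the function name `count_words`, the tokenizing rule, and the sample sentence are all assumptions chosen so that 'of' appears three times.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Return a case-insensitive frequency table of the words in text."""
    # Assumption: a word is a run of letters, optionally with internal apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Hypothetical sample text in which "of" appears three times.
sample = "The life of the mind, of books and of letters"
freq = count_words(sample)
print(freq["of"])
```

A concordance would need more than this (for example, the positions of each word), but a frequency table is a natural first building block.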

Fabulator XML functions on GitHub

I created a GitHub repository for the XML extensions to Fabulator.  This is the first step towards a set of general-purpose functions for managing TEI documents.  With this and recent changes to the core Fabulator Radiant extension, we can browse TEI documents in the CMS and extract information from within the document.

These two functions move us much closer to being able to extract geographic information from documents for use in the Digital Concord project.

Fabulator sees Radiant

The Fabulator is a way to build interactive applications within Radiant.  I’m using it as the framework for building several DH projects this semester that work with transcriptions of textual artifacts.  I just pushed a set of changes that allow Fabulator applications to traverse the page hierarchy and access page content (for example, ‘radiant::/@title’ gives the title of the home page).  This means that we can manage TEI documents as pages in the CMS and still process them to extract information for a database, all within the CMS.

Scholarly Software Editions

The NEH and other U.S. federal government agencies are pushing digital humanities projects to result in something that can be shared. If the result is an application that people can use, especially one that resides on a central server, then the NEH also wants provisions for long-term maintenance. Ultimately, digital humanities projects should seek to be a resource that other scholarly work can build on. In this post, I want to explore what this might mean for web-based applications.


XML Transformation Creation

I was looking around the web for references about EAD, an XML vocabulary mentioned in a Digital Humanities Working Group meeting Monday. I could see cases where people would want to have documents marked up with both TEI and EAD.

XSLTs basically describe a function that is applied to an XML document, resulting in another document (not necessarily XML): D = f(X), where X and D are sets of documents and X is a subset of D (for a particular document, I'd say: d = f(x)). We are usually given X and f and asked for D, but I'm wondering whether we could be given D and X and find f.

This is definitely a pure computer science problem, but it has digital humanities applications. A web search shows some work in this direction, but it usually relies on people manually mapping elements between the two document sets to generate the XSLT.
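To make the manual-mapping approach concrete, here is a minimal Python sketch that turns a hand-written element map into an XSLT 1.0 stylesheet. It does not infer f from D and X; it only automates the mechanical last step once a person has supplied the map. The function name `build_rename_xslt` and the example pairs (TEI `p` to DocBook `para`, TEI `head` to DocBook `title`) are illustrative assumptions, and namespaces on the source documents are ignored for simplicity.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def build_rename_xslt(element_map: dict) -> str:
    """Build an XSLT 1.0 stylesheet that renames elements per element_map."""
    ET.register_namespace("xsl", XSL_NS)
    sheet = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})

    # Identity template: copy anything that has no explicit mapping.
    identity = ET.SubElement(sheet, f"{{{XSL_NS}}}template", {"match": "@*|node()"})
    copy = ET.SubElement(identity, f"{{{XSL_NS}}}copy")
    ET.SubElement(copy, f"{{{XSL_NS}}}apply-templates", {"select": "@*|node()"})

    # One template per mapped element: emit the target name, keep the content.
    for source, target in element_map.items():
        tmpl = ET.SubElement(sheet, f"{{{XSL_NS}}}template", {"match": source})
        out = ET.SubElement(tmpl, f"{{{XSL_NS}}}element", {"name": target})
        ET.SubElement(out, f"{{{XSL_NS}}}apply-templates", {"select": "@*|node()"})

    return ET.tostring(sheet, encoding="unicode")


# Hypothetical hand-made map from two TEI element names to DocBook ones.
stylesheet = build_rename_xslt({"p": "para", "head": "title"})
print(stylesheet)
```

A one-to-one rename like this is the easy case. The harder and more interesting problem is discovering the map, and any restructuring beyond renaming, from example documents alone.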

Another thing that would come from this is a way to rank XML vocabularies based on their expressive range. Suppose we have two sets of documents (A and B) based on two different XML vocabularies. If an XSLT exists that maps A -> B, but no XSLT exists that maps B -> A, then the vocabulary for A could be seen as having a larger expressive range than that used for B. That would give us a more solid foundation for saying that TEI is more expressive than DocBook (which I believe it is, but I don't have good data to base that belief on at the moment).

I can manually create XSLTs to go from TEI to DocBook to HTML because I believe there's a loss of information from one format to the next (ignoring the pushing of that information into CSS at the final HTML stage) and because DocBook is a publishing vocabulary and HTML is, with CSS, a de facto typesetting vocabulary. The information isn't so much lost as transformed from semantic to presentational, with the person reading the resulting document adding back the semantic information based on the presentation. The semantic information, though, is removed from a readily computer-understood form: it has gone from a context-free to a context-dependent form.

Of Fish and Dreams

I've given the novel I'm writing for my thesis the working title Of Fish and Swimming Swords. I don't have names for the second or third novel yet, but ideas are beginning to come together. They'll complete the arc begun in the thesis. The last two nights, I've woken with fairly vivid dreams. Dreams aren't useful in their raw state. If you actually transcribe a dream, it won't make much sense because dream logic isn't sufficiently realistic. But dreams can provide interesting settings and plot pointers. That's what these two dreams have done.
