The last month has been spent traveling. We presented some aspects of the Fabulator engine at the Digital Humanities Summer Institute with somewhat positive reviews and a lot of questions. There’s still a lot of work to do before the benefits of a system like Radiant+Fabulator are immediately obvious. Hopefully we can have the core libraries packaged as gems. Radiant 0.9.0 will be able to use extensions that are installed as gems. That will make installation and management much easier.
The current problem we’re grappling with is how to manage files that are uploaded through a web form. We need this in both a digital humanities context and the WLC.
What we’re looking at for now is a library that requires certain functions to be defined by the framework using the engine: mainly file saving, loading, removal, and modification.
The extension then provides the following functions:
- asset:store(tag, context)
- asset:rename(old, new)
There is also a global attribute (attr:scope) that defines the scope or namespace for the tags. This allows applications to access any file/asset that is saved, but also lets applications define a particular pool of files that they are focused on.
We also define an asset:asset object type that results in the content of the file when converted to a string. This lets us do lazy loading of file contents. If all you want is the metadata about a file, then you don’t have to worry about the system loading everything into memory first.
File uploads are provided to the engine as asset:asset objects. The metadata is available for constraint checking and filtering. If you need the content, then that will be provided as a string. Files are not saved to any storage unless explicitly stored through a call to ‘asset:store’ or an equivalent. Otherwise, they are cleaned up after any transition is run.
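To make the lazy-loading idea concrete, here is a minimal Ruby sketch. Everything in it is an assumption for illustration: the class names (LazyAsset, MemoryStore), the method names, and the in-memory store standing in for the framework-provided functions are not the extension's actual API.

```ruby
# Hypothetical sketch: none of these class or method names come from Fabulator.
class LazyAsset
  attr_reader :name, :content_type, :size

  # The block is any code that returns the file contents on demand.
  def initialize(name:, content_type:, size:, &loader)
    @name = name
    @content_type = content_type
    @size = size
    @loader = loader
    @content = nil
    @loaded = false
  end

  def loaded?
    @loaded
  end

  # Converting to a string triggers the (single) load of the contents.
  def to_s
    unless @loaded
      @content = @loader.call.to_s
      @loaded = true
    end
    @content
  end
end

# In-memory stand-in for the storage functions the framework would define.
class MemoryStore
  def initialize
    @files = {}
  end

  def store(tag, asset)
    @files[tag] = asset.to_s
  end

  def rename(old_tag, new_tag)
    @files[new_tag] = @files.delete(old_tag) if @files.key?(old_tag)
  end

  def load(tag)
    @files[tag]
  end
end

reads = 0
upload = LazyAsset.new(name: "page1.xml", content_type: "text/xml", size: 11) do
  reads += 1
  "<TEI></TEI>"
end

# Constraint checking only touches metadata, so nothing has been read yet.
acceptable = upload.content_type == "text/xml" && upload.size < 1024
loaded_during_check = upload.loaded?

store = MemoryStore.new
store.store("transcripts/page1", upload) if acceptable
store.rename("transcripts/page1", "transcripts/p001")
```

The point of the design is that the metadata check never calls the loader; only the explicit store does.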
I’m slowly getting into the presentation side of the Fabulator system. While it’s relatively easy to include data in a page, I didn’t have a good way to produce something more dynamic, such as an Exhibit timeline, until now. While the extensions aren’t quite finished, a first run at the problem is available through the Fabulator Exhibit extension and the Fabulator Exhibit Radiant extension. The first provides the core data management while the latter ties it in with persistent storage and the Radiant environment.
The two together allow us to do the following in an application:
<ex:item ex:id="f:name(.)" ex:label="f:name(.)" ex:type="Word">
  <ex:value ex:name="count" f:select="." />
  <ex:value ex:name="location" f:select="./path" />
</ex:item>
<ex:type ex:name="Word" ex:pluralLabel="Words" />
<ex:property ex:name="count" ex:valueType="numeric" />
The structure should be familiar to anyone who has worked with JSON-encoded Exhibit data sources. It captures all of the relevant information without necessarily imposing the structure of the JSON. You can add items, types, and properties anywhere within the <ex:database/> element.
Speaking of which, I would like to get rid of the <ex:database/> element eventually. It wraps its body so that the Exhibit database is loaded and saved when needed. I’d prefer the framework auto-magically figuring out what needs to be loaded and saved.
The next step is to provide the Radiant page tags that help embed the exhibit in the page with a reference to the exhibit database.
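For comparison, here is a rough Ruby sketch of how the item, type, and property declarations might be collected and serialized into the familiar items/types/properties JSON layout. The ExhibitDatabase class and its methods are hypothetical, not the extension's real interface, and the sample word and locations are made up.

```ruby
require "json"

# Hypothetical collector; not the Fabulator Exhibit extension's actual API.
class ExhibitDatabase
  def initialize
    @items = {}
    @types = {}
    @properties = {}
  end

  # Items with the same id are merged, so they can be added from anywhere.
  def add_item(id:, label:, type:, **values)
    item = (@items[id] ||= { "id" => id })
    item["label"] = label
    item["type"] = type
    values.each { |name, value| item[name.to_s] = value }
    self
  end

  def add_type(name, plural_label:)
    @types[name] = { "pluralLabel" => plural_label }
    self
  end

  def add_property(name, value_type:)
    @properties[name] = { "valueType" => value_type }
    self
  end

  def to_h
    { "items" => @items.values, "types" => @types, "properties" => @properties }
  end
end

db = ExhibitDatabase.new
db.add_type("Word", plural_label: "Words")
db.add_property("count", value_type: "numeric")
db.add_item(id: "fabulous", label: "fabulous", type: "Word",
            count: 3, location: ["12:4", "40:17"])

exhibit_json = JSON.pretty_generate(db.to_h)
puts exhibit_json
```

Because declarations are keyed by id or name, the order in which items, types, and properties arrive does not matter.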
There are two kinds of inheritance that are useful for applications in the Fabulator universe: is-a and has-a. There are also other kinds of inheritance in object oriented programming (OOP): mixins, interfaces, and other ways of tweaking an object class. We aren’t using those in our Fabulator applications though.
The ‘is-a’ inheritance is handled by using a clone of the inherited application to compile the XML description of the inheriting application. Thus, ‘is-a’ inheritance is managed by the framework embedding the Fabulator engine and not by the engine itself. Since the storage of application definitions is defined by the embedding framework, the engine does not try to find the parent application from which it needs to inherit.
The current way to declare inheritance is through the ‘@is-a’ attribute on the ‘application’ root element. With the Fabulator embedded in Radiant, this should point to the path of the page in Radiant that holds the parent application.
The current code (not yet pushed to Github) manages actions and the ability to call the actions in the parent application at will using the new ‘super’ element. This element takes a ‘select’ attribute that provides the initial context for the parent actions. The parent application’s actions are only accessible using an action element. They are not accessible within an expression. This is because of when the needed information about the parent actions is available. If the engine were implemented slightly differently (and this might be the case eventually), then it might be possible to allow access to the parent application’s actions from within expressions.
This kind of inheritance is required before we can create a simple skeleton application for some common digital humanities task and then customize it for a particular project.
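As a way of thinking about it, here is a small Ruby model of the clone-then-compile approach. It is only a sketch under my own assumptions: the Application class, the lambda-based actions, and the parent callable standing in for the ‘super’ element are illustrative and do not mirror the engine's internals.

```ruby
# Hypothetical model of 'is-a' inheritance; names are illustrative only.
class Application
  attr_reader :actions

  def initialize(actions = {})
    @actions = actions
  end

  # Copy the action table so compiling a child never alters the parent.
  def clone_for_child
    Application.new(@actions.dup)
  end

  # Compile child definitions on top of a clone of the parent.
  # Each child action receives the context and a `parent` callable,
  # which plays the role of the 'super' element; the argument passed to
  # it corresponds to the context chosen by the 'select' attribute.
  def self.compile(parent, definitions)
    app = parent.clone_for_child
    definitions.each do |name, body|
      inherited = app.actions[name]
      parent_call = lambda do |context|
        raise NotImplementedError, "no parent action #{name}" unless inherited
        inherited.call(context)
      end
      app.actions[name] = ->(context) { body.call(context, parent_call) }
    end
    app
  end

  def run(name, context)
    @actions.fetch(name).call(context)
  end
end

skeleton = Application.new({
  "save"  => ->(ctx) { "saved #{ctx}" },
  "index" => ->(ctx) { "listing #{ctx}" }
})

# The project overrides 'save' but still delegates to the skeleton's version.
project = Application.compile(skeleton, {
  "save" => ->(ctx, parent) { parent.call(ctx.strip) + " (validated)" }
})

saved = project.run("save", " letter 12 ")
listed = project.run("index", "letters")
```

Actions the child does not override are simply inherited from the clone, which is what would let a skeleton application be customized per project.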
Has-a inheritance is a bit trickier and will have to wait until a bit later.
I’ve been making a lot of small bug fixes as I’ve worked through calculating the data for a simple concordance. My current test is with 522 transcriptions (around 20-25 lines each) resulting in a word list with 7677 entries showing how often the word appears and on which pages. It takes about 2.5 minutes to run. I’m focusing on correctness right now. I’ll worry about optimization later.
One of the problems I ran into was the document parser throwing an occasional error. The engine can’t handle that right now, and the result is not always pretty or helpful. With that in mind, I’m adding some structural components that should help.
Any context within which you can have a list of actions will automatically create an exception handling scope. Within that scope, you can use the ‘ensure’ element to specify code that should get run even if there is an error (it will run before the exception is handled, and any exceptions it raises will not be caught by the sibling ‘catch’ elements). The ‘catch’ elements can have a ‘@test’ associated with them that, if true, will cause the enclosed actions to be run. All ‘catch’ elements which have a true ‘@test’ will be run. The result of the last ‘catch’ will be the result returned to the parent action. If no ‘@test’ is available, then it will be assumed true. I’m not sure yet how the enclosed actions will access the exception, but probably through a variable.
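Here is how I picture those semantics, modeled in plain Ruby. This is a sketch, not engine code: run_scope and the Catch struct are hypothetical, passing the exception as an argument is just one option (the access mechanism is still undecided), and re-raising when no ‘catch’ matches is my assumption.

```ruby
# Hypothetical model of the proposed ensure/catch semantics.
Catch = Struct.new(:test, :actions)

# body:    callable holding the scope's ordinary actions
# ensures: callables that always run, error or not
# catches: Catch structs; a nil test is treated as true
def run_scope(body:, ensures: [], catches: [])
  begin
    result = body.call
    error = nil
  rescue StandardError => e
    result = nil
    error = e
  end

  # Ensure actions run before any catch; errors raised here propagate
  # to the enclosing scope rather than to the sibling catch elements.
  ensures.each(&:call)

  return result if error.nil?

  matching = catches.select { |c| c.test.nil? || c.test.call(error) }
  raise error if matching.empty? # assumption: unhandled errors propagate

  # Every matching catch runs; the last one supplies the result.
  matching.map { |c| c.actions.call(error) }.last
end

log = []
outcome = run_scope(
  body: -> { raise ArgumentError, "bad document" },
  ensures: [-> { log << :ensure }],
  catches: [
    Catch.new(->(e) { e.is_a?(ArgumentError) }, ->(e) { log << :first; "recovered: #{e.message}" }),
    Catch.new(->(e) { e.is_a?(IOError) }, ->(_e) { log << :skipped; "io" }),
    Catch.new(nil, ->(_e) { log << :default; "fallback" })
  ]
)
```

Note that two of the three catches run here, and only the last one's value is returned.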
There should be an identity action that returns the result of the last contained action. This would make it easier to scope exception handling. I’m not sure what to call that element yet, but it will be there eventually. Candidates include ‘nop’, ‘div’, ‘scope’, ‘identity’.
User management is one of the more difficult things, surprisingly. Authentication methods vary across sites and institutions depending on policies and available security on the server and between the server and browser. Nor are users really part of the digital humanities problem. Usually, the most a DH project needs is to know that not just anyone can modify, add, or remove information.
Even though I don’t have a good idea yet on how to manage users from the point of view of the Fabulator engine (e.g., it shouldn’t depend on the authentication method), I am exposing a little of the Radiant user model by introducing the ‘current-user’ function in the Radiant lib namespace. I’ve also added a ‘go-to’ action that transitions the application to the stated view. A test can be added so that the transition to the stated view happens only if the test succeeds.
With these changes, I can add the following snippet as a guard against unauthorized use of an application:
<f:go-to f:test="not(radiant:current-user()/@admin)" f:view="unauthorized" />
With this, I can put into production some simple data management application suites.
I have two projects this semester that are turning out to be more alike than I expected. One has a concordance built from transcripts of manuscripts. The other is putting together a map-based browser for a set of documents.
I’ve already done some work on the concordance front. I can compile a list of words, their frequencies, and which locations they appear in. It’s just a matter of tacking on a browsing interface to the data and I have a fairly full-blown concordance. I also need to finish the transcription browser, but that’s coming along as well, built on the AJAX pager plugin to jQuery.
The map-based browser needs to be able to associate places with documents, but more importantly, documents with places. I can also, now, walk through a set of documents with locations tagged, pull out those locations, and attach a latitude and longitude. All I need to be able to do now is take two sets of codings and consolidate them while preserving the document location annotations.
The two processes are nearly identical, so the code should be nearly identical as well. My goal today is to get all of the different pieces in the Fabulator engine environment in place so both processes work as expected. This will likely introduce a new function ‘consolidate’ that does for a collection of nodes what ‘sum’ does with numbers.
Consolidation of two nodes happens when the nodes are considered equal except for any accompanying annotations (attributes or children that are not themselves part of the value of the node as determined by the node type). The two nodes are combined (for example, concordance word nodes have their counts summed) and attributes and children are combined, allowing location annotations to bubble up through the consolidations and remain in the overall concordance or database.
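A minimal Ruby sketch of what I have in mind for consolidation follows. The Node struct and its fields are hypothetical stand-ins for engine nodes, and equality is simplified to comparing the value field.

```ruby
# Hypothetical node shape; the engine's real node class is not shown here.
Node = Struct.new(:value, :frequency, :annotations)

# Fold a collection of nodes into one node per distinct value, the way
# 'sum' folds a collection of numbers into one number.
def consolidate(nodes)
  nodes.group_by(&:value).map do |value, group|
    merged = {}
    group.each do |node|
      node.annotations.each do |key, items|
        # Union keeps every annotation once, in first-seen order.
        merged[key] = (merged[key] || []) | Array(items)
      end
    end
    Node.new(value, group.sum(&:frequency), merged)
  end
end

page_a = [Node.new("fair", 2, { "location" => ["1:3", "1:9"] }),
          Node.new("sun", 1, { "location" => ["1:4"] })]
page_b = [Node.new("fair", 1, { "location" => ["2:1"] })]

words = consolidate(page_a + page_b)
fair = words.find { |node| node.value == "fair" }
```

The same function should serve the map browser if the value is a place and the annotations are document references.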
The Geo extension is turning into a general purpose GIS extension. This will provide us with the tools we need for the Digital Concord project.
I’m currently integrating the extension with the GeoRuby gem, which will be a dependency.
In addition to the geo:coding type, I’ll be introducing the geo:point, geo:envelope, and a few other types, along with some operations that make sense for them.
Two geo:envelopes add together to make the smallest envelope that contains both. Likewise, an envelope added to a point should return the smallest envelope containing both. Adding two points can result in a line (but three points should result in an envelope, since a line plus a point not on the line should result in an envelope), while multiplying two points can result in an envelope (think cross product of vectors). The difference of two points should result in a distance (in meters, assuming a perfectly spherical earth, which isn’t always the best assumption).
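To pin down some of these operations, here is a rough Ruby sketch. The Point and Envelope structs are my own illustrative types, not GeoRuby classes; the distance uses the haversine formula with a mean earth radius of 6,371 km; envelopes that cross the date line are ignored, and point addition (lines) is left out.

```ruby
# Hypothetical types; GeoRuby's own classes are deliberately not used here.
EARTH_RADIUS_M = 6_371_000.0 # mean radius, spherical-earth assumption

Point = Struct.new(:lat, :lon) do
  # point - point: great-circle distance in meters (haversine formula)
  def -(other)
    to_rad = Math::PI / 180.0
    dlat = (lat - other.lat) * to_rad
    dlon = (lon - other.lon) * to_rad
    a = Math.sin(dlat / 2)**2 +
        Math.cos(lat * to_rad) * Math.cos(other.lat * to_rad) * Math.sin(dlon / 2)**2
    # Clamp to 1.0 to guard against floating-point overshoot.
    2 * EARTH_RADIUS_M * Math.asin(Math.sqrt([a, 1.0].min))
  end

  # point * point: the envelope with the two points as opposite corners
  def *(other)
    Envelope.new([lat, other.lat].min, [lon, other.lon].min,
                 [lat, other.lat].max, [lon, other.lon].max)
  end
end

Envelope = Struct.new(:south, :west, :north, :east) do
  # envelope + envelope or envelope + point: smallest envelope holding both
  def +(other)
    o = other.is_a?(Point) ? other * other : other
    Envelope.new([south, o.south].min, [west, o.west].min,
                 [north, o.north].max, [east, o.east].max)
  end
end

a = Point.new(10.0, 20.0)
b = Point.new(12.0, 18.0)

box = a * b                        # corners at a and b
wider = box + Point.new(9.0, 19.0) # grows south to include the new point
both = box + (Point.new(0.0, 0.0) * Point.new(1.0, 1.0))
one_degree = Point.new(0.0, 0.0) - Point.new(0.0, 1.0)
```

Treating a point as a zero-area envelope keeps the addition rule in one place.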
The GeoRuby gem provides tools for managing GIS data in a database, so I’ll be working on how to get the RDF storage engine flexible enough that I can extend it to store and operate on GIS data using the database representation. I consider that an optimization though, so that’s secondary to getting the semantics working right.
That’s today’s work, subject to change.
I just created a new project for geo-coding addresses: http://github.com/jgsmith/ruby-fabulator-geo.
Combined with the ability to walk through the TEI documents housed on a Radiant site and execute XPath queries against them, we can extract address information and get the latitude and longitude from Yahoo! Maps (only works for US and Canadian addresses — a limitation in the Yahoo! Maps API).
If we encode an address using:
let $a := geo:coding("some address")
then we can access the information as child nodes of $a:
$a/latitude (: the latitude :)
$a/precision (: the scope of the location: address, city, state, country :)
The string equivalent of $a will be the original address.
I just checked in some changes that show ‘with’ working, at least to some degree. More use cases will help flesh it out.
The ‘with’ keyword in expressions is used to add information to nodes without changing which nodes are being returned by an expression. This is useful for annotating data while passing the data on to another function for further processing. In the context of concordances, this means we can annotate the words and then pass the list of words on to a function that combines the lists into a larger list. This lets us break the concordance process up into smaller steps that can work on a particular line or page of a manuscript. We can attach information to each word as to which line or page it was found on and then retain that information as we combine concordances of pages into concordances of manuscripts.
This lets us do the following:
for $page in donne:document(radiant:find($manuscript)/*)
return for $line in $page/*
with ./location = f:join( ($page/@page, $line/@line), ":")
That should (hopefully) build the concordance data for a particular manuscript (given as the URL in $manuscript). Note that the c:concordance function only compiles a list of word frequencies for the given text. There’s not a lot of other magic going on.
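In plain Ruby terms, the per-line step might look something like the sketch below. The concordance and with helpers only approximate c:concordance and the ‘with’ keyword, the Word struct is hypothetical, and the sample text is invented. It needs Ruby 2.7 or later for tally.

```ruby
# Hypothetical helpers; c:concordance and 'with' are only approximated here.
Word = Struct.new(:text, :frequency, :annotations)

# Rough analogue of c:concordance: word frequencies for one piece of text.
def concordance(text)
  text.downcase.scan(/[a-z']+/).tally.map { |word, n| Word.new(word, n, {}) }
end

# Rough analogue of 'with': add an annotation to each node and hand back
# the same nodes, so the selection itself is unchanged.
def with(nodes, key, value)
  nodes.each { |node| node.annotations[key] = value }
end

# page number => { line number => text }
pages = {
  "12" => { "1" => "the sun and the moon", "2" => "the stars" }
}

words = pages.flat_map do |page, lines|
  lines.flat_map do |line, text|
    with(concordance(text), "location", [page, line].join(":"))
  end
end
```

Each word node comes out carrying its "page:line" location, ready to be combined into a larger concordance in a later step.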
I’m making good progress on the concordance front. I can now do the following:
This will convert the string to a concordance object (implicitly compiling the concordance) and give the frequency/count of the word “some”.
My goal now is to get the “with ./*/foo := bar” fragment working with an internal Ruby representation of a concordance. This will allow me to annotate the words in a concordance. The internal object already preserves annotations when combining concordances.
Once I have a good serialization format for the concordance, I will be able to persist the concordance in some form — perhaps RDF, but not necessarily so. That combined with annotations of where a word appears will let me do searches of words:
to give me the lines on which a word appears.
to give me the lines on which a word appears beginning with the letter “f”.
At that point, the challenge will be to optimize these idioms so they don’t take forever to run.