Presentations: Manuscript Viewer

As we wrap up the development of the core data transformation engine, we are starting to think about the presentation layer. How can we specify our presentations so that we don’t have to modify them when the JavaScript library APIs, HTML standards, etc., change?

One of the things we’ve worked on is a manuscript viewer that shows an image of a page alongside a transcription. This is a common need in textual digital humanities.

I’m thinking something along the lines of the following based on the work we’ve done for the Donne project (using Radiant to build up an XML description that can be transformed via XSLT):

This is based on the following skeleton that we’re using now:

Using the XML description frees us to use a different implementation without having to revisit the project as long as we are using the same semantics. We’ll be looking to a few more projects to figure out some patterns before we start making design decisions, but this is a good indication of how we’re planning on approaching the presentation layer of DH projects.
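To illustrate the idea of describing the viewer by its semantics and letting the implementation change underneath, here is a rough sketch. It swaps the XML description and XSLT for plain Ruby objects so it stays self-contained; ViewerSpec, TableRenderer, and FigureRenderer are hypothetical names, not part of Fabulator or Radiant.

```ruby
# Hypothetical sketch: the viewer is described by what it means (a page image
# plus its transcription); renderers are interchangeable implementations.
ViewerSpec = Struct.new(:page_image, :transcription, keyword_init: true)

module Html
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  # Escape text before interpolating it into markup.
  def self.escape(text)
    text.to_s.gsub(/[&<>"]/, ESCAPES)
  end
end

# One implementation: an old-style two-column table.
class TableRenderer
  def render(spec)
    "<table><tr><td><img src=\"#{Html.escape(spec.page_image)}\"/></td>" \
      "<td>#{Html.escape(spec.transcription)}</td></tr></table>"
  end
end

# Another implementation of the same semantics with different markup.
class FigureRenderer
  def render(spec)
    "<div class=\"viewer\"><img src=\"#{Html.escape(spec.page_image)}\"/>" \
      "<div class=\"transcription\">#{Html.escape(spec.transcription)}</div></div>"
  end
end

spec = ViewerSpec.new(page_image: "f1r.jpg", transcription: "Go and catch a falling star")
renderers = [TableRenderer.new, FigureRenderer.new]
pages = renderers.map { |renderer| renderer.render(spec) }
```

The project description (spec) never changes when we replace one renderer with the other; only the renderer knows about tables or divs.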

Junctions

One of the fundamental concepts I’ve been working with in the Fabulator expression engine is that a set of items should be as easy to work with as a single item. However, if we know (or think) we are working with sets of items, then we can start asking different questions as soon as that set has more than one element.

Junctions are variables that act as if they are set to multiple values at the same time. The default junction (and the only one supported for now in the engine) is ‘any()’.

Any

Instead of saying something like ‘/a[. = 1 or . = 2 or . = 3]’, a junction allows ‘/a[. = any(1,2,3)]’, which is more readable and maintainable.

You could also do something like ‘/a/*[f:name(.) = any($list-of-names)]’, though I might like ‘/a/{any($list-of-names)}’ better.

Given two sets of values, you could do comparisons such as ‘any($list-a) > any($list-b)’ to see if there’s a value in $list-a that is larger than a value in $list-b.
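A minimal sketch of how an ‘any()’ junction could behave in comparisons, assuming "any" simply means "the comparison holds for at least one value". The Any struct, the any helper, and jcmp are illustrative names, not the Fabulator engine's code.

```ruby
# Sketch only: an any() junction as a small wrapper around a list of values.
Any = Struct.new(:values)

# Build a junction from scalars or arrays, e.g. any(1, 2, 3) or any(list).
def any(*values)
  Any.new(values.flatten)
end

# Compare two operands, either of which may be an Any junction:
# true if some left value and some right value satisfy the operator.
def jcmp(left, op, right)
  lefts  = left.is_a?(Any)  ? left.values  : [left]
  rights = right.is_a?(Any) ? right.values : [right]
  lefts.any? { |l| rights.any? { |r| l.public_send(op, r) } }
end

# /a[. = any(1,2,3)]: keep the items equal to 1, 2, or 3
items = [1, 5, 2, 7, 3]
matches = items.select { |x| jcmp(x, :==, any(1, 2, 3)) }

# any($list-a) > any($list-b): some value in list_a beats some value in list_b
list_a = [3, 9]
list_b = [4, 10]
some_bigger = jcmp(any(list_a), :>, any(list_b))
```

Both operands expand into nested any? calls, so ‘any($list-a) > any($list-b)’ is true exactly when the largest value in $list-a exceeds the smallest value in $list-b.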

All

Another junction is ‘all()’. It requires every value to satisfy the comparison. For example, ‘all($list-a) > any($list-b)’ would mean that every value in $list-a has to exceed some value in $list-b (possibly a different value in $list-b for each value in $list-a). ‘all($list-a) > all($list-b)’ would be true only if the minimum value of $list-a exceeded the maximum value of $list-b.
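One way to check these readings is to write the comparisons as nested quantifiers, expanding the left-hand junction first, and compare them with min/max shortcuts on random lists. The helper names below are illustrative, not engine functions.

```ruby
# all($list-a) > any($list-b): every x in a exceeds at least one y in b
def all_gt_any(a, b)
  a.all? { |x| b.any? { |y| x > y } }
end

# all($list-a) > all($list-b): every x in a exceeds every y in b
def all_gt_all(a, b)
  a.all? { |x| b.all? { |y| x > y } }
end

# Compare the quantifier readings with min/max shortcuts on random,
# non-empty lists (a fixed seed keeps the run repeatable).
rng = Random.new(42)
500.times do
  a = Array.new(rng.rand(1..5)) { rng.rand(-10..10) }
  b = Array.new(rng.rand(1..5)) { rng.rand(-10..10) }
  raise "all > any mismatch" unless all_gt_any(a, b) == (a.min > b.min)
  raise "all > all mismatch" unless all_gt_all(a, b) == (a.min > b.max)
end
```

So ‘all($list-a) > any($list-b)’ reduces to comparing the two minimums, and ‘all($list-a) > all($list-b)’ to comparing the minimum of $list-a with the maximum of $list-b.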

The construct ‘/a/{all($list-of-names)}’ wouldn’t be very useful unless $list-of-names was single-valued.

None

The ‘none()’ junction is something of the opposite of ‘all()’. Instead of requiring complete agreement from all values, it requires complete non-agreement.

The expression ‘none($list-a) > none($list-b)’ would mean that no value from $list-a could exceed a value not in $list-b. That isn’t very useful, given how many negative numbers are less than the minimum value that could be in $list-b. Replacing one of them with an ‘all()’ or ‘any()’ would work, though: ‘none($list-a) > any($list-b)’ would be true if no value in $list-a exceeded a value in $list-b (equivalent to ‘all($list-a) <= all($list-b)’).
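The equivalence between ‘none($list-a) > any($list-b)’ and ‘all($list-a) <= all($list-b)’ can be checked the same way, reading each junction as a quantifier. This is a sketch of that reading only; it doesn't attempt the ‘none() > none()’ case, whose semantics are still being worked out.

```ruby
# none($list-a) > any($list-b): no x in a exceeds any y in b
def none_gt_any(a, b)
  a.none? { |x| b.any? { |y| x > y } }
end

# all($list-a) <= all($list-b): every x in a is <= every y in b
def all_le_all(a, b)
  a.all? { |x| b.all? { |y| x <= y } }
end

# The two readings should agree on every pair of non-empty lists.
rng = Random.new(7)
500.times do
  a = Array.new(rng.rand(1..5)) { rng.rand(-10..10) }
  b = Array.new(rng.rand(1..5)) { rng.rand(-10..10) }
  unless none_gt_any(a, b) == all_le_all(a, b)
    raise "equivalence failed for #{a.inspect} / #{b.inspect}"
  end
end
```

Both amount to "the largest value in $list-a is no bigger than the smallest value in $list-b".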

The path ‘/a/{none($list-of-names)}’ would select all of the children of ‘/a’ that didn’t have a name in the $list-of-names.
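A sketch of how a junction in a path step might select children by name, assuming ‘/a/{any(...)}’, ‘/a/{all(...)}’, and ‘/a/{none(...)}’ each test a child's name against the whole list. Element and select_children are hypothetical names used only for illustration.

```ruby
# Hypothetical node type: a name plus child nodes.
Element = Struct.new(:name, :children)

# Select the children of parent whose names satisfy the junction.
def select_children(parent, junction, names)
  parent.children.select do |child|
    case junction
    when :any  then names.any?  { |n| child.name == n }
    when :all  then names.all?  { |n| child.name == n }
    when :none then names.none? { |n| child.name == n }
    else raise ArgumentError, "unknown junction #{junction}"
    end
  end
end

a = Element.new("a", %w[title date author date].map { |n| Element.new(n, []) })
```

With this reading, the ‘all()’ form only selects anything when every name in the list is the same, which is why it isn't very useful unless the list is single-valued.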


We’re still figuring out how these will be worked into the language. They might be limited to predicates, tests, and other ephemeral run-time processing. This is the beginning of the thinking-out-loud design phase.

Moving Towards Stability

I’ve been working with a rapid release cycle, but I think we’re getting close to a stable core that can support new work soon.

The current list of Ruby gems:

  • fabulator 0.0.5
  • fabulator-exhibit 0.0.2
  • fabulator-grammar 0.0.1
  • fabulator-xml 0.0.3
  • radiant-fabulator-extension 0.0.3
  • radiant-fabulator_exhibit-extension 0.0.1

With this set, you can make a mistake in the XML defining an application and Radiant will report a validation error and give you an opportunity to edit the page again. Before, it would just throw an error and give you an otherwise blank page with no options other than hitting the back button.

Predicates should be a bit more DWIMy (Do What I Mean). If the expression evaluates to a number, it matches against the item’s position in the set. If it’s a string, it’s true if non-blank. If it’s a boolean, it’s used as is. The f:position(), f:first(), and f:last() functions should now work in predicates, and we will be working to extend them to other iterative contexts.
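A rough sketch of these predicate rules in Ruby. It assumes positions are 1-based as in XPath and that "blank" means empty or whitespace-only; predicate_true? and apply_predicate are illustrative names, and the fallback for other value types is my assumption.

```ruby
# Decide whether a predicate result selects the item at this position.
def predicate_true?(result, position)
  case result
  when Numeric     then result == position    # number: match the position
  when String      then !result.strip.empty?  # string: true if non-blank
  when true, false then result                # boolean: used as is
  else !result.nil?                           # other values: an assumption
  end
end

# Filter a set, handing the predicate the item, its position, and the size
# of the set (enough to model f:position(), f:first(), and f:last()).
def apply_predicate(items)
  last = items.size
  selected = items.each_with_index.select do |item, index|
    predicate_true?(yield(item, index + 1, last), index + 1)
  end
  selected.map(&:first)
end

apply_predicate(%w[a b c]) { 2 }                           # ["b"]
apply_predicate(["x", " ", ""]) { |item| item }            # ["x"]
apply_predicate(%w[a b c]) { |_, pos, last| pos == last }  # ["c"], like f:last()
```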

The next week or two should be continued bug fixes as we come to the close of the project season and launch into the next. We’ll post more later on what might come starting in September.

We’re also working on developing some web-based resources that will explain how to use all of these libraries to build a project site.

Regular Expressions

I’ve released another gem: fabulator-grammar. This will be developed into a grammar engine modeled loosely on the Perl 6 grammars. Right now, it just provides the ‘g:match($regex, $string)’ function that returns a boolean.

I’m continuing to work on the Donne concordance and needed a way to pick out just the transcription pages from a manuscript so that I could put various views of the manuscript under the same parent page without mistaking them for a transcription. With the ‘g:match()’ function, I can sift them out when building the concordance database (or doing other work with the list of transcription pages).
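Here is roughly what the sifting looks like, using Ruby's Regexp#match? as a stand-in for ‘g:match()’ since both return a plain boolean. The page slugs and the folio-plus-"-transcription" naming convention are made up for the example; they aren't the Donne project's actual page names.

```ruby
# Hypothetical page slugs under one manuscript parent page.
pages = %w[
  manuscript-a/introduction
  manuscript-a/f1r-transcription
  manuscript-a/f1r-image
  manuscript-a/f1v-transcription
  manuscript-a/f1v-image
]

# A folio (f + number + r/v) followed by "-transcription" at the end of the
# slug. \z anchors at the end of the string.
transcription = /\/f\d+[rv]-transcription\z/
transcription_pages = pages.select { |slug| transcription.match?(slug) }
```

Only the two transcription pages survive, so image pages and introductions can live under the same parent without polluting the concordance.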

Right now, the regular expressions support the basics: ( ) for grouping, [ ] for character classes, . for matching any character, and the basic counters +, *, ? and the minimizer ? (after a counter). There’s also support for the anchors ^ (beginning) and $ (end). Most character sequences will match themselves.
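To make the supported subset concrete, here are one or two cases per construct, checked with Ruby's Regexp on the assumption that ‘g:match()’ agrees with it on this basic subset. For single-line strings, Ruby's ^ and $ behave like the beginning and end anchors described above.

```ruby
# [pattern, text, expected boolean result]
checks = [
  [/^(ab)+$/,    "ababab", true ],  # ( ) grouping with a counter
  [/^[aeiou]+$/, "aei",    true ],  # [ ] character class
  [/^c.t$/,      "cut",    true ],  # . matches any character
  [/^colou?r$/,  "color",  true ],  # ? : zero or one
  [/^ab*c$/,     "ac",     true ],  # * : zero or more
  [/^ab+c$/,     "ac",     false],  # + : one or more
]
checks.each do |re, text, expected|
  raise "#{re.inspect} on #{text.inspect}" unless re.match?(text) == expected
end

# The minimizer changes how much text a match covers, not whether it matches.
html = "<i>a</i><i>b</i>"
greedy  = html[/<i>.+<\/i>/]   # "<i>a</i><i>b</i>"
minimal = html[/<i>.+?<\/i>/]  # "<i>a</i>"
```

Since ‘g:match()’ only returns a boolean, the minimizer won't make a visible difference yet; it will matter once matches carry captured text into actions.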

The long-term goal is to provide support for building grammars of rules and tokens that can have actions associated with them when they match. This would let us write parsers in the Fabulator environment without having to break out into Ruby. Since how we parse text can be an important part of an electronic editorial statement, we need to capture this kind of processing in the Fabulator environment instead of hiding it in Ruby.
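A toy sketch of "tokens with actions" built on Ruby's StringScanner. TinyGrammar and Token are hypothetical names, nothing like the eventual Perl 6-style grammar engine; the point is only that each token pairs a pattern with an action that runs when it matches. Patterns must not match the empty string, or the scan loop would never advance.

```ruby
require "strscan"

# A token: a name, a pattern, and an optional action run on the matched text.
Token = Struct.new(:name, :pattern, :action)

class TinyGrammar
  def initialize(tokens)
    @tokens = tokens
  end

  # Returns the action results in order, or nil if some part of the text
  # matches none of the tokens.
  def parse(text)
    scanner = StringScanner.new(text)
    results = []
    until scanner.eos?
      token = @tokens.find { |t| scanner.scan(t.pattern) }
      return nil if token.nil?
      results << token.action.call(scanner.matched) if token.action
    end
    results
  end
end

# A concordance-style tokenizer: words are normalized, separators skipped.
words = TinyGrammar.new([
  Token.new(:word, /[A-Za-z']+/, ->(w) { w.downcase }),
  Token.new(:separator, /[\s[:punct:]]+/, nil),
])
```

Calling words.parse("Go and catch a falling star,") yields the lowercase words in order; text containing digits fails to parse because no token accepts them.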

Ruby Gems Galore

We have gems falling out all over the place. There are still a couple of minor hiccups, but for the most part, these should work now.

In your Radiant config/environment.rb, you can use the following at the bottom to include the Fabulator extension as well as the Exhibit extension to Fabulator:

I’m not sure if that is all that is required to install the gems — I installed them by calling ‘gem install’ manually — but once they are installed, that’s all you need for them to be available in Radiant.

You will want to run something like the following to get the database up to speed:

You can leave off the word ‘production’ if you want to use your development database instead.

The following gems are available now:

  • fabulator – the core state machine and expression engine
  • fabulator-exhibit – the language extensions for managing Exhibit databases
  • fabulator-xml – language extensions for managing XML documents
  • radiant-fabulator-extension – ties the Fabulator engine into Radiant
  • radiant-fabulator_exhibit-extension – ties the Exhibit extension into Radiant

The current fabulator-exhibit gem doesn’t record the dependence on the uuid gem. That should be corrected in the next push.

Mappings, Reductions, and Consolidations

The fabulator-0.0.2 gem will have a slight change in the expression language. We’re also removing a function from the namespace. Right now, I’d say that the tag lib schemas are subject to change until 0.1.0, at least. The 0.0.x releases are early alphas that are just trying to get all of the main points of functionality figured out.

For functions that operate on one type of argument, but possibly an array of that type, we have a way to define their scaling behavior: mappings (one-to-one domain to range), reductions (many-to-one domain to range), and consolidations (many-to-one range to range). Consolidations have the same idempotence requirements as reductions in map/reduce frameworks such as OpenCL.

There’s no difference in how mappings and reductions are invoked in expressions. Simply use their name and supply the data. Consolidations are a little different and are tied explicitly to the reduction that they consolidate.

For a reduction ‘foo’ that takes data from a domain of X to a range of Y, the corresponding consolidation ‘foo*’ takes results of ‘foo’ (values in Y) and combines them into a single value in Y. A consolidation is its own consolidation, though we don’t allow ‘foo**’.

This results in things like ‘f:count*’ == ‘f:sum’ and ‘f:sum*’ == ‘f:sum’ as well. But ‘f:avg*’ doesn’t exist because we lose required information when computing an average (but we can return to consolidations by working with the sums and counts that compose the average).
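A small sketch of why ‘f:count*’ is ‘f:sum’ and why there's no ‘f:avg*’. The lambdas are stand-ins for the Fabulator functions, not their implementation, and the data is split into arbitrary chunks (say, one per manuscript).

```ruby
# Stand-ins for f:count, f:sum, and their consolidations.
count = ->(xs) { xs.size }
sum   = ->(xs) { xs.sum }
count_star = sum  # f:count* == f:sum
sum_star   = sum  # f:sum*   == f:sum

chunks = [[3, 1, 4], [1, 5], [9, 2, 6]]
everything = chunks.flatten

# Consolidating per-chunk results matches reducing everything at once.
raise unless count_star.(chunks.map(&count)) == count.(everything)  # 8
raise unless sum_star.(chunks.map(&sum)) == sum.(everything)        # 31

# An average of averages is not the average, so there is no f:avg* ...
avg = ->(xs) { Rational(xs.sum, xs.size) }
avg_of_avgs = avg.(chunks.map(&avg))      # (8/3 + 3 + 17/3) / 3 = 34/9
raise if avg_of_avgs == avg.(everything)  # the true average is 31/8

# ... but carrying [sum, count] pairs keeps enough information to consolidate.
pairs = chunks.map { |xs| [sum.(xs), count.(xs)] }
consolidated = Rational(pairs.sum(&:first), pairs.sum(&:last))
raise unless consolidated == avg.(everything)
```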

Anything that was working with histograms will need some changes. Instead of the ‘f:consolidate’ function, we now have the ‘f:histogram*’ function.
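For histograms, the consolidation just adds counts value by value. In this sketch, histogram and histogram_star are hypothetical stand-ins for a histogram reduction and ‘f:histogram*’, with a histogram represented as a hash from value to count.

```ruby
# Reduction: a list of values becomes one histogram.
def histogram(values)
  values.each_with_object(Hash.new(0)) { |v, counts| counts[v] += 1 }
end

# Consolidation: any number of histograms become one histogram.
def histogram_star(histograms)
  histograms.each_with_object(Hash.new(0)) do |h, merged|
    h.each { |value, n| merged[value] += n }
  end
end

h1 = histogram(%w[love death love])
h2 = histogram(%w[death soul])
h3 = histogram(%w[love])
```

Because histogram_star takes histograms to a histogram, it can be applied to its own output, which is what lets partial results be merged in any grouping.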

The ‘*’ (star) was chosen because it is reminiscent of how some algebras denote closure.

We Have a Gem!

I finally got some stuff cleaned up, tested, and committed to github. This is mainly the template support, but I also made sure all of the code is using the same XML libraries. In this case, the GNOME LibXML suite. It’s fast and common across language platforms.

Fabulator-0.0.1 is now available on rubygems.org. You should be able to install it simply with ‘gem install fabulator’. I’ll be working on the other parts of the system over the next few weeks: the XML parsing extension, then the Exhibit database extension, and the wedges to make everything work in Radiant.

Templates

I’m working a bit on integrating the Fabulator engine into the Writing and Learning Communities software (source code is on github). WLC manages assignments as sequences of timed modules that manage student interaction. Except for the peer messaging, informational, and rubric modules, a module is a simple state machine with accompanying views and other metadata. This is exactly what the Fabulator engine is designed for. In fact, the WLC software was an early exploration of the idea in Ruby.

Using the Fabulator engine in WLC means we will have two application frameworks with different needs using the same engine and able to share the same extensions. Some framework-specific glue might be needed, but it should be minimal and behind the scenes. An extension that allows a DH project to access a remote resource, for example, will also allow that same resource to be used in designing an assignment.

One thing that the DH side of the shop has that WLC doesn’t really have is a template engine. Since I expect DH to need a custom template system someday as well, as I explore presentation layer designs for the long-term preservation of projects, I’m going ahead and building a template engine for the Fabulator system. I’ll use it to render the forms for assignment modules.

The logic elements of the template system are based on Radius, the XML-like template system used in Radiant. Right now, we have a element that passes our tests. The element is coded but hasn’t been tested yet.

The complete parsing process has several stages. The initial stage is to execute all of the tags. This is what is done in Radiant. Nothing new in this part. However, we have several additional steps available.

I use a form markup language that is not HTML. This allows me to focus on the logical structure of the form instead of how it is rendered in HTML. For example, the difference between radio buttons and checkboxes is a matter of how many choices can be made at once. The form language reflects this.
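A sketch of what "how many choices can be made at once" might look like when rendered. Selection, escape_html, and render_selection are hypothetical, not the template library's API: the form description only records a limit, and the HTML renderer picks radio buttons or checkboxes from it.

```ruby
# A logical selection: an id, the choices, and how many may be chosen
# (nil means no limit).
Selection = Struct.new(:id, :choices, :max_choices)

ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_html(text)
  text.to_s.gsub(/[&<>"]/, ESCAPES)
end

# The rendering decision lives here, not in the form description.
def render_selection(selection)
  type = selection.max_choices == 1 ? "radio" : "checkbox"
  name = escape_html(selection.id)
  labels = selection.choices.map do |choice|
    value = escape_html(choice)
    %(<label><input type="#{type}" name="#{name}" value="#{value}"/> #{value}</label>)
  end
  labels.join("\n")
end

meter  = Selection.new("meter", %w[iambic trochaic], 1)
themes = Selection.new("themes", %w[love death faith], nil)
```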

Once a template is parsed and run, additional methods allow setting default values for form elements (useful for rendering a form when editing an existing piece of information), setting the captions for form elements (needed when using a module in an assignment), and setting error messages and missing-information markers.

After all of this information is added, the form can be output as HTML. The library uses an XSLT stylesheet internally, since the template is already parsed into a DOM.

A typical use of the template system might be:

The ‘context’ here is a Fabulator::Expr::Context object. This would typically come from the state machine object. As a result, only the namespaces declared in the root element of the state machine definition will be available in any expressions run in the template.

Refactoring Done for Now

The refactoring work for the context management has pretty much been done and checked in. We have it running on three production systems now, though I do need to investigate a possible bug in one of the programs I’ve been working on. The tests aren’t exhaustive.

My next step now will be to release 0.0.1 version gems for the Fabulator engine, the XML extension, and the Exhibit extension (non-Radiant parts). These will be named ruby-fabulator-…. I hope to have these done by next week or when rubygems.org comes back, whichever is later.

The Radiant extensions will be radiant-fabulator-…. The eventual WLC extensions will be wlc-fabulator-…. These are framework-specific glues that make the Fabulator engine and extensions available in the specific frameworks.

Still Refactoring

We’re down to 7 scenarios not completing, and it looks like the problems are in some of the transition management code. All of the tests for the Exhibit extension are passing with the new context system. The XML extension looks like it should pass as well as soon as the 7 get fixed.

I’ve added a new class: Fabulator::Action. This provides basic support for the common use cases where we gather a bit of information at compile time and then do something with it at run time. The class manages the compile time side of things. All you have to do to create a new action is declare your default namespace, a list of attributes you want to handle at compile time, and what you want to do at run time.
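To show the shape of the compile-time/run-time split, here is a hypothetical sketch. SketchAction and its class-level DSL (namespace, attributes, at_runtime) are illustrative only and are not the actual Fabulator::Action API; the namespace URI is a placeholder.

```ruby
# Illustrative base class: declarations happen at class level, attribute
# extraction happens once at compile time, and only the body runs later.
class SketchAction
  class << self
    attr_reader :namespace_uri, :attribute_names, :runtime_body

    def namespace(uri)
      @namespace_uri = uri
    end

    def attributes(*names)
      @attribute_names = names
    end

    def at_runtime(&body)
      @runtime_body = body
    end
  end

  # Compile time: pick the declared attributes out of the element once.
  def initialize(element_attributes)
    @compiled = self.class.attribute_names.to_h { |name| [name, element_attributes[name]] }
  end

  # Run time: no attribute parsing, just the action body.
  def run(context)
    self.class.runtime_body.call(@compiled, context)
  end
end

# Hypothetical action: add a value to an item being accumulated.
class AddValue < SketchAction
  namespace "http://example.org/ns/exhibit"  # placeholder URI
  attributes "item", "key"
  at_runtime do |attrs, context|
    values = (context[:items][attrs["item"]] ||= {})
    values[attrs["key"]] = context[:value]
  end
end

action = AddValue.new({ "item" => "poem-1", "key" => "author", "ignored" => "x" })
context = { items: {}, value: "Donne" }
action.run(context)
```

Writing into a shared context like this is exactly the kind of thing that would need care under multi-threaded execution.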

For example, from the Exhibit extension, we have the following for adding a value to the set of values we’re building for an item:

We still need to do a better job than the Fabulator::Exhibit::Actions::Lib.add_item_to_accumulator call if we ever want to support multi-threaded execution, but we’re getting there.

Some other examples (without the run time code):

Inherited attributes can be placed on any of the ancestor XML nodes as well as the current node. The closest attribute to the current node wins. This allows us to declare a global default Exhibit database and override it only when we need to. Static attributes are expected to be constant at compile time, so we don’t re-evaluate anything at run-time. Some attributes, such as the name of a view, are static. We try to avoid them if we can because they create inflexibility.
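A sketch of inherited-attribute lookup: walk from the current node up through its ancestors and take the first declaration found, so the closest one wins. ScopeNode and inherited_attribute are hypothetical stand-ins for nodes of the parsed application.

```ruby
# A node with its own attributes and a link to its parent (nil at the root).
ScopeNode = Struct.new(:attributes, :parent)

# Return the closest declaration of name, or nil if no ancestor declares it.
def inherited_attribute(node, name)
  while node
    return node.attributes[name] if node.attributes.key?(name)
    node = node.parent
  end
  nil
end

# A global default database, overridden only where needed.
root   = ScopeNode.new({ "database" => "main" }, nil)
view   = ScopeNode.new({}, root)
drafts = ScopeNode.new({ "database" => "drafts" }, view)
```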