Regular Expressions

I’ve released another gem: fabulator-grammar. This will be developed into a grammar engine modeled loosely on the Perl 6 grammars. Right now, it just provides the ‘g:match($regex, $string)’ function that returns a boolean.

I’m continuing to work on the Donne concordance and needed a way to pick out just the transcription pages from a manuscript so that I could put various views of the manuscript under the same parent page without mistaking them for a transcription. With the ‘g:match()’ function, I can sift them out when building the concordance database (or doing other work with the list of transcription pages).

Right now, the regular expressions support the basics: ( ) for grouping, [ ] for character classes, . for matching any character, and the basic counters +, *, ? and the minimizer ? (after a counter). There’s also support for the anchors ^ (beginning) and $ (end). Most character sequences will match themselves.
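As a rough illustration of those constructs, here is a sketch in plain Ruby. It uses Ruby's built-in Regexp rather than the fabulator-grammar engine, so it only shows the kind of filtering that ‘g:match()’ is meant for. The page names are made up for the example.

```ruby
# Hypothetical page slugs under one manuscript's parent page.
pages = %w[transcription-001 transcription-002 facsimile-001 notes]

# ^ and $ anchor the match (per line, in Ruby), ( ) groups,
# [ ] is a character class, and + is a counter meaning "one or more".
transcription = /^(transcription)-[0-9]+$/

# Keep only the slugs that match the pattern.
transcription_pages = pages.select { |slug| slug =~ transcription }
puts transcription_pages.inspect

# The minimizer ? after a counter makes it match as little as possible.
greedy = "<a><b>"[/<.+>/]   # counter alone: "<a><b>"
lazy   = "<a><b>"[/<.+?>/]  # counter plus minimizer: "<a>"
puts greedy
puts lazy
```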

The long-term goal is to provide support for building grammars of rules and tokens that can have actions associated with them when they match. This enables parsers in the Fabulator environment without having to break out into Ruby. Since how we parse text can be an important part of an electronic editorial statement, it’s important that we capture this kind of processing in the Fabulator environment instead of hiding it in Ruby.

Ruby Gems Galore

We have gems falling out all over the place. There are still a couple of minor hiccups, but for the most part, these should work now.

In your Radiant config/environment.rb, you can use the following at the bottom to include the Fabulator extension as well as the Exhibit extension to Fabulator:

I’m not sure if that is all that is required to install the gems — I installed them by calling ‘gem install’ manually — but once they are installed, that’s all you need for them to be available in Radiant.

You will want to run something like the following to get the database up to speed:

You can leave off the word ‘production’ if you want to use your development database instead.

The following gems are available now:

  • fabulator – the core state machine and expression engine
  • fabulator-exhibit – the language extensions for managing Exhibit databases
  • fabulator-xml – language extensions for managing XML documents
  • radiant-fabulator-extension – ties the Fabulator engine into Radiant
  • radiant-fabulator_exhibit-extension – ties the Exhibit extension into Radiant

The current fabulator-exhibit gem doesn’t record the dependence on the uuid gem. That should be corrected in the next push.

Mappings, Reductions, and Consolidations

The fabulator-0.0.2 gem will have a slight change in the expression language. We’re also removing a function from the namespace. Right now, I’d say that the tag lib schemas are subject to change until 0.1.0, at least. The 0.0.x releases are early alphas that are just trying to get all of the main points of functionality figured out.

For functions that operate on one type of argument, but possibly an array of that type, we have a way to define their scaling behavior: mappings (one-to-one domain to range), reductions (many-to-one domain to range), and consolidations (many-to-one range to range). Consolidations have the same idempotence requirements as reductions in map/reduce frameworks such as OpenCL.

There’s no difference in how mappings and reductions are invoked in expressions. Simply use their name and supply the data. Consolidations are a little different and are tied explicitly to the reduction that they consolidate.

For a reduction ‘foo’ that takes data from a domain of X to a range of Y, the corresponding consolidation ‘foo*’ will take the result of ‘foo’ (the range of Y) and take it to a range of Y. A consolidation is its own consolidation, though we don’t allow ‘foo**’.

This results in things like ‘f:count*’ == ‘f:sum’ and ‘f:sum*’ == ‘f:sum’ as well. But ‘f:avg*’ doesn’t exist because we lose required information when computing an average (but we can return to consolidations by working with the sums and counts that compose the average).
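The relationships above can be checked with a small sketch in plain Ruby. The lambdas are stand-ins for the idea, not the Fabulator functions, and the chunked data is invented for the example.

```ruby
# Plain-Ruby stand-ins; these are not the Fabulator functions themselves.
count = ->(xs) { xs.length }                     # reduction: many X -> one Y
sum   = ->(xs) { xs.inject(0) { |a, b| a + b } }  # reduction: many X -> one Y

# Data split into chunks, as if each chunk were reduced in a separate pass.
chunks = [[1, 2, 3], [4, 5], [6]]
all    = chunks.flatten

# count* behaves like sum: adding the partial counts gives the full count.
partial_counts = chunks.map { |c| count.call(c) }
count_star     = sum.call(partial_counts)

# sum* behaves like sum: adding the partial sums gives the full sum.
partial_sums = chunks.map { |c| sum.call(c) }
sum_star     = sum.call(partial_sums)

# There is no avg*: averaging the partial averages is wrong when the
# chunks differ in size...
partial_avgs = chunks.map { |c| sum.call(c).to_f / count.call(c) }
naive_avg    = sum.call(partial_avgs) / partial_avgs.length

# ...but the consolidated sum and count recover the true average.
true_avg = sum_star.to_f / count_star

puts count_star == count.call(all)
puts sum_star == sum.call(all)
puts naive_avg == true_avg
```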

Anything that was working with histograms will need some changes. Instead of the ‘f:consolidate’ function, we now have the ‘f:histogram*’ function.
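To show what consolidating histograms means, here is a minimal sketch in plain Ruby. The method names are hypothetical stand-ins for ‘f:histogram’ and ‘f:histogram*’, and the real implementation may differ.

```ruby
# Hypothetical stand-ins, not the Fabulator implementation.

# Reduction: many values -> one histogram (value => number of occurrences).
def histogram(values)
  values.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }
end

# Consolidation: many histograms -> one histogram, adding counts per key.
def histogram_star(histograms)
  histograms.each_with_object(Hash.new(0)) do |h, merged|
    h.each { |key, n| merged[key] += n }
  end
end

# Two partial data sets, reduced separately and then consolidated.
parts  = [%w[a b a], %w[b c]]
merged = histogram_star(parts.map { |p| histogram(p) })

# Consolidating the partial histograms matches reducing all the data at once.
puts merged == histogram(parts.flatten)
```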

The ‘*’ (star) was chosen because it is reminiscent of how some algebras denote closure.

We Have a Gem!

I finally got some stuff cleaned up, tested, and committed to GitHub. This is mainly the template support, but I also made sure all of the code is using the same XML libraries: in this case, the GNOME LibXML suite. It’s fast and common across language platforms.

Fabulator-0.0.1 is now available on rubygems.org. You should be able to install it simply with ‘gem install fabulator’. I’ll be working on the other parts of the system over the next few weeks — namely the XML parsing extension, then Exhibit database extension, and the wedges to make everything work in Radiant.

Templates

I’m working a bit on integrating the Fabulator engine into the Writing and Learning Communities software (source code is on github). WLC manages assignments as sequences of timed modules that manage student interaction. Except for the peer messaging, informational, and rubric modules, a module is a simple state machine with accompanying views and other metadata. This is exactly what the Fabulator engine is designed for. In fact, the WLC software was an early exploration of the idea in Ruby.

Using the Fabulator engine in WLC means we will have two application frameworks with different needs using the same engine and able to share the same extensions. Some framework-specific glue might be needed, but it should be minimal and behind the scenes. An extension that allows a DH project to access a remote resource, for example, will also allow that same resource to be used in designing an assignment.

One thing that the DH side of the shop has that the WLC doesn’t really have is a template engine. I also envision a custom template system for DH someday, as I explore the presentation layer design for long-term preservation of projects, so I’m going ahead and building a template engine for the Fabulator system. I’ll use this to render the forms for assignment modules.

The logic elements of the template system are based on Radius, the XML-like template system used in Radiant. Right now, we have one element that passes our tests. Another element is coded but hasn’t been tested yet.

The complete parsing process has several stages. The initial stage is to execute all of the tags. This is what is done in Radiant. Nothing new in this part. However, we have several additional steps available.

I use a form markup language that is not HTML. This allows me to focus on the logical structure of the form instead of how it is rendered in HTML. For example, the difference between radio buttons and checkboxes is a matter of how many choices can be made at once. The form language reflects this.

Once a template is parsed and run, additional methods allow setting default values for form elements (useful for rendering a form when editing an existing piece of information), setting the captions for form elements (needed when using a module in an assignment), and setting error messages and missing-information markers.

After all of this information is added, the form can be output as HTML. The library uses an XSLT stylesheet internally since the template is already parsed into a DOM.

A typical use of the template system might be:

The ‘context’ here is a Fabulator::Expr::Context object. This would typically come from the state machine object. As a result, only the namespaces declared in the root element of the state machine definition will be available in any expressions run in the template.

Refactoring Done for Now

The refactoring work for the context management has pretty much been done and checked in. We have it running on three production systems now, though I do need to investigate a possible bug in one of the programs I’ve been working on. The tests aren’t exhaustive.

My next step now will be to release 0.0.1 version gems for the Fabulator engine, the XML extension, and the Exhibit extension (non-Radiant parts). These will be named ruby-fabulator-…. I hope to have these done by next week or when rubygems.org comes back, whichever is later.

The Radiant extensions will be radiant-fabulator-…. The eventual WLC extensions will be wlc-fabulator-…. These are framework-specific glues that make the Fabulator engine and extensions available in the specific frameworks.

Still Refactoring

We’re down to 7 scenarios not completing, and it looks like the problems are in some of the transition management code. All of the tests for the Exhibit extension are passing with the new context system. The XML extension looks like it should pass as well as soon as the 7 get fixed.

I’ve added a new class: Fabulator::Action. This provides basic support for the common use cases where we gather a bit of information at compile time and then do something with it at run time. The class manages the compile time side of things. All you have to do to create a new action is declare your default namespace, a list of attributes you want to handle at compile time, and what you want to do at run time.
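The general shape of that pattern can be sketched in plain Ruby. Everything here is an assumption made for illustration: the class names, the ‘namespace’ and ‘attribute’ declarations, and the placeholder namespace URI are invented and are not the Fabulator::Action API.

```ruby
# A hypothetical miniature of the pattern; not the Fabulator::Action API.
class SketchAction
  class << self
    # Declare (or read back) the default namespace for the action.
    def namespace(ns = nil)
      @namespace = ns if ns
      @namespace
    end

    # Declare an attribute to be picked up at compile time.
    def attribute(name)
      attributes << name.to_s
    end

    def attributes
      @attributes ||= []
    end

    # Compile time: gather the declared attributes from the parsed element,
    # represented here as a plain Hash of attribute name => value.
    def compile(element)
      action = new
      attributes.each { |a| action.settings[a] = element[a] }
      action
    end
  end

  def settings
    @settings ||= {}
  end
end

# A new action declares what it needs and what to do at run time.
class AddValue < SketchAction
  namespace 'http://example.com/ns/sketch' # placeholder URI
  attribute :name

  # Run time: use what was gathered at compile time.
  def run(accumulator, value)
    (accumulator[settings['name']] ||= []) << value
    accumulator
  end
end

action = AddValue.compile('name' => 'title', 'ignored' => 'x')
item   = action.run({}, 'Holy Sonnets')
puts item.inspect
```

The base class owns the compile-time bookkeeping, so a subclass is only a few declarations plus a run method.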

For example, from the Exhibit extension, we have the following for adding a value to the set of values we’re building for an item:

We still need to do a better job than the Fabulator::Exhibit::Actions::Lib.add_item_to_accumulator call if we ever want to support multi-threaded execution, but we’re getting there.

Some other examples (without the run time code):

Inherited attributes can be placed on any of the ancestor XML nodes as well as the current node. The closest attribute to the current node wins. This allows us to declare a global default Exhibit database and override it only when we need to. Static attributes are expected to be constant at compile time, so we don’t re-evaluate anything at run-time. Some attributes, such as the name of a view, are static. We try to avoid them if we can because they create inflexibility.
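The "closest attribute wins" rule can be sketched as a walk from the current node out through its ancestors. This is a minimal illustration in plain Ruby with nodes modeled as Hashes; the attribute name and database names are hypothetical.

```ruby
# Hypothetical sketch: each node is a Hash with its own attributes and a parent.
def inherited_attribute(node, name)
  while node
    # The first node that carries the attribute is the closest one, so it wins.
    return node[:attrs][name] if node[:attrs].key?(name)
    node = node[:parent] # walk outward to the enclosing element
  end
  nil
end

root  = { :attrs => { 'database' => 'global-db' }, :parent => nil }
view  = { :attrs => {}, :parent => root }
local = { :attrs => { 'database' => 'override-db' }, :parent => view }

puts inherited_attribute(view, 'database')  # falls back to the global default
puts inherited_attribute(local, 'database') # the local override wins
```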

Refactoring Context

There are a few times I’ve needed some compile time information at run time inside a function. The current Fabulator engine doesn’t expose this information to a function definition, which can be problematic when you want a function to automagically use the right database, for example, without having to repeat yourself.

The actual use case that is driving this is building an Exhibit database by accumulating information over time and allowing editing of individual entries. I want to be able to specify the Exhibit database at a global level in the application using an @ex:database and have a function such as ex:item($id) retrieve the indicated item from the indicated database.

There’s no way to do that with the current code on Github, at least not without resorting to ugly globals that would break any kind of threaded approach in the future.

Instead, I’ve spent yesterday afternoon and evening, and today, ripping out how we handle execution context and slowly moving to a separation between context and data tree. Think of a ribosome walking an RNA strand. In our case, the ribosome is a Fabulator::Expr::Context object and the RNA strand is the Fabulator::Expr::Node tree.

There are two types of information we want to access at run time: information available at compile time, and information that depends on what we’ve done at run time. Things like the namespace prefix mappings and ‘global’ attributes are compile time information and don’t change at run time. The current topical node and set of variables are run time scoped and depend on how we got to where we are.

The result is a Context object that can have two parents: the compile time parent context that corresponds to the enclosing XML element, and the run time parent context that corresponds to the calling action (usually the XML element). The first has information that is constant across executions of the state machine. The second has information peculiar to this particular invocation. Before execution, every context has only a compile time parent. When executing, the compile time context is merged with the run time context to produce a new working context. I still need to tease a few things apart, but this should put us on the path to approximately proper support for multi-threaded execution.
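A minimal sketch of the two-parent idea in plain Ruby follows. The class and method names are invented for illustration and are not the Fabulator::Expr::Context implementation; the namespace URI is a placeholder.

```ruby
# Hypothetical sketch; not the Fabulator::Expr::Context implementation.
class SketchContext
  attr_reader :namespaces, :variables

  def initialize(compile_parent = nil, run_parent = nil)
    @compile_parent = compile_parent
    @run_parent     = run_parent
    @namespaces     = {} # compile time: prefix => namespace URI
    @variables      = {} # run time: variable name => value
  end

  # Compile-time lookup follows the enclosing XML elements.
  def namespace(prefix)
    return @namespaces[prefix] if @namespaces.key?(prefix)
    @compile_parent && @compile_parent.namespace(prefix)
  end

  # Run-time lookup follows the chain of calling actions.
  def variable(name)
    return @variables[name] if @variables.key?(name)
    @run_parent && @run_parent.variable(name)
  end

  # Combine this compiled context with the caller's run-time context into
  # a fresh working context; neither parent is modified.
  def with_run_parent(run_parent)
    SketchContext.new(self, run_parent)
  end
end

root = SketchContext.new
root.namespaces['ex'] = 'http://example.com/ns/exhibit' # placeholder URI
compiled = SketchContext.new(root) # before execution: compile-time parent only

caller_ctx = SketchContext.new
caller_ctx.variables['id'] = 'item-1'

working = compiled.with_run_parent(caller_ctx)
puts working.namespace('ex') # found through the compile-time chain
puts working.variable('id')  # found through the run-time chain
```

Because the working context is a new object, the compiled contexts stay unchanged between invocations, which is the property that matters for threading.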

Once we have all of our current behavior tests passing (currently at 64/73 scenarios passing), we’ll push our changes to Github and start fixing up the various extensions to use the new context system. This change will break compatibility with all previous versions of the extensions.

This is a major change in the backend, but it feels proper. I think once this is done, we’ll be well on our way to a proper gem cutting. I should be able to have that done by the end of August.

On a side note, I’m moving the regex support into an extension. I didn’t really have any good regex support yet anyway, and I like the way Perl 6 thinks about regexes as grammars with rules and tokens. My plan is to allow grammars to be like libraries that sit outside the code but can be referenced by it.

Radiant 0.9.0

The new version of Radiant was released a week or so ago while I was in Los Angeles. It is the version of Radiant that I have been targeting with the Fabulator extension.

This version of Radiant allows extensions to be loaded from gems, so I plan on releasing the entire set of Fabulator libraries and extensions as gems over the next few weeks. They will be early versions (0.0.x) and somewhat incomplete (some of the functions in the core namespace aren’t implemented), but they should install as gems. This will make installation and updating much easier.

I’m already managing several websites built on the Fabulator system. Right now, I have the extensions and gems in the vendor/extensions and vendor/plugins directories for each site. My own life will be made much easier by having the extensions and libraries installed as gems so that I don’t have to update several git clones for each site every time I have an update.

I expect the 0.0.x versions of the gems to be updated fairly often. We’re still going through quite a bit of debugging and development. But now with the gems, updating to the new versions will be trivial.

I’ll post here when the gems are published.

Asset management

The last month has been spent traveling. We presented some aspects of the Fabulator engine at the Digital Humanities Summer Institute with somewhat positive reviews and a lot of questions. There’s still a lot of work to do before it is immediately obvious what the benefits of a system like Radiant+Fabulator are. Hopefully we can have the core libraries packaged as gems. Radiant 0.9.0 will be able to use extensions that are installed as gems. That will make installation and management much easier.

The current problem we’re grappling with is how to manage files that are uploaded through a web form. We need this in both a digital humanities context as well as the WLC.

What we’re looking at for now is a library that requires certain functions to be defined by the framework using the engine: mainly file saving, loading, removal, and modification.

The extension then provides the following functions:

  • asset:store(tag, context)
  • asset:fetch(tag)
  • asset:remove(tag)
  • asset:rename(old, new)

There is also a global attribute (attr:scope) that defines the scope or namespace for the tags. This allows applications to access any file/asset that is saved, but also lets applications define a particular pool of files that they are focused on.
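The split between framework-supplied storage and scoped tags might look something like this sketch in plain Ruby. All class and method names are hypothetical, and the in-memory backend stands in for whatever the framework actually provides.

```ruby
# Hypothetical sketch; none of these names come from the real extension.

# What the framework would supply: saving, loading, removal, and renaming.
class MemoryBackend
  def initialize
    @files = {}
  end

  def save(key, content)
    @files[key] = content
  end

  def load(key)
    @files[key]
  end

  def remove(key)
    @files.delete(key)
  end

  def rename(old_key, new_key)
    @files[new_key] = @files.delete(old_key) if @files.key?(old_key)
  end
end

# What the library would add: the four functions, with tags kept in a scope.
class ScopedAssets
  def initialize(backend, scope)
    @backend = backend
    @scope   = scope
  end

  def store(tag, content)
    @backend.save(key(tag), content)
  end

  def fetch(tag)
    @backend.load(key(tag))
  end

  def remove(tag)
    @backend.remove(key(tag))
  end

  def rename(old_tag, new_tag)
    @backend.rename(key(old_tag), key(new_tag))
  end

  private

  # The scope acts as a namespace for tags.
  def key(tag)
    "#{@scope}/#{tag}"
  end
end

backend = MemoryBackend.new
essays  = ScopedAssets.new(backend, 'essays')
images  = ScopedAssets.new(backend, 'images')

essays.store('draft', 'first draft text')
essays.rename('draft', 'final')
puts essays.fetch('final')
puts images.fetch('final').inspect # a different scope sees nothing under this tag
```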

We also define an asset:asset object type that results in the content of the file when converted to a string. This lets us do lazy loading of file contents. If all you want is the metadata about a file, then you don’t have to worry about the system loading everything into memory first.
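Lazy loading of this kind can be sketched in plain Ruby by deferring the read until the object is converted to a string. The class below is a hypothetical stand-in, not the real asset:asset type, and the file name and content are invented.

```ruby
# Hypothetical sketch of lazy loading; not the real asset:asset type.
class LazyAsset
  attr_reader :metadata, :loads

  # The loader block is supplied by whoever knows how to read the file.
  def initialize(metadata, &loader)
    @metadata = metadata
    @loader   = loader
    @loads    = 0 # how many times the content was actually read
  end

  # Converting to a string triggers the read and caches the result.
  def to_s
    @content ||= begin
      @loads += 1
      @loader.call
    end
  end
end

upload = LazyAsset.new('filename' => 'poem.txt', 'size' => 19) do
  'Death, be not proud'
end

# Metadata checks do not touch the content.
small_enough  = upload.metadata['size'] < 1024
loads_so_far  = upload.loads

# String interpolation calls to_s, which loads the content once.
text = "#{upload}"
puts text
```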

File uploads are provided to the engine as asset:asset objects. The metadata is available for constraint checking and filtering. If you need the content, then that will be provided as a string. Files are not saved to any storage unless that is done explicitly through a call to ‘asset:store’ or an equivalent. Otherwise, they are cleaned up after any transition is run.