Category Archives: Fabulator

Page creation

February 12, 2012 James

I just pushed Fabulator 0.0.12 and Radiant Fabulator Extension 0.0.9 to RubyGems. The former adds the template element to libraries. The latter adds a page and page-part action for creating pages in Radiant. This will eventually enable editing of existing pages, but for now it’s aimed at creating new ones. That’s the functionality we need, so that’s what we wrote.

Fabulator

Templates

February 12, 2012 James

Templates mean a lot of things. This time, it’s building up strings within the Fabulator engine instead of building strings in the client presentation.

I’ve not checked them into GitHub yet, but I’ve coded a fifth type of function definition in Fabulator libraries: the template type. They work like this:

Continue Reading Templates

Fabulator

Presentation Layout

February 9, 2012 James

I’m working on a set of presentation capabilities inspired by MIT’s Exhibit widget set. A lot of the data management core is identical, but I’m trying to play around with the presentation and interaction a bit more. My goal is to have something sufficiently flexible that I can implement a simple game with it.

The standard Exhibit practice if you have multiple views into a data set is to show the views with what is essentially a tabbed interface. You can select which view you want, but you usually don’t get both at the same time.

I’m working on a second way of doing presentations of multiple views: an arranged set of views in a layer. Additional views might be done as windows above the layer, but the primary working set of views will be in a layer.

I could make a system in which the author would have to lay out each view by hand specifying heights and widths and placements on a grid, but that strikes me as being too close to the typesetting and too far from the idea. I want the author to specify a general idea of what they want and then have the system do a reasonably good job of presenting that idea.

What I’m playing with at the moment is a system by which the author specifies how many units they want the presentation to have for height and width. This is an arbitrary number that is used to scale other measurements so they fit in the display space. This lets the system work with computer displays, printed pages, or other devices that might not work in pixels or have the same pixel density.

Then, for each widget in the layout, the author specifies the stretchability and compressibility of bits of glue that tie the widget to other widgets and the frame. Glue is also used to specify the sizing of the inside part of the widget. The system can work with the glues to find a reasonable set of heights, widths, and distances to make everything reasonably visible. This models a bit of how (La)TeX manages text on a page.

The benefit of this system is that users can not only resize elements when they need to and have everything respond, but they can also add new widgets to the layout and everything will automatically reshuffle to accommodate the new widget.

I’m still working on a simple implementation that will let me test this. I think it’s a reasonable first pass at the problem. I know I’m using the word ‘reasonable’ a lot, but I’m not going for perfect. I’m going for something that is good enough, at least for now.

My initial algorithm will try to lay out the widgets as points on a frame using the glue relationships to determine relative ordering. Once laid out, I’ll use the tensions in the glue to correct the centers. Then, I can do some slight adjustments to get the widget frames in place. It isn’t quite as simple as taking the distance between centers: the tension is between edges, so we take vertical or horizontal distance, but not diagonal distance.
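To make the glue idea a bit more concrete, here is a minimal one-dimensional sketch, assuming TeX-style glue with a natural size, a stretchability, and a shrinkability. The names `Glue` and `distribute` are made up for this illustration and are not part of Fabulator; the real layout works in two dimensions and uses the tensions between widget edges.

```ruby
# Hypothetical sketch: distribute a fixed span among glue items, TeX-style.
Glue = Struct.new(:natural, :stretch, :shrink)

# Returns the final size of each glue item. The sizes sum to `span`
# whenever the total stretchability or shrinkability allows it.
def distribute(glues, span)
  slack = span - glues.sum(&:natural)
  return glues.map { |g| g.natural.to_f } if slack.zero?

  if slack.positive?
    total = glues.sum(&:stretch)
    return glues.map { |g| g.natural.to_f } if total.zero?
    # Share the extra space in proportion to each item's stretchability.
    glues.map { |g| g.natural + slack * g.stretch.to_f / total }
  else
    total = glues.sum(&:shrink)
    return glues.map { |g| g.natural.to_f } if total.zero?
    # Never shrink an item by more than its declared shrinkability.
    ratio = [slack.abs.to_f / total, 1.0].min
    glues.map { |g| g.natural - ratio * g.shrink }
  end
end

row = [Glue.new(10, 1, 0), Glue.new(20, 3, 5), Glue.new(10, 0, 5)]
wide   = distribute(row, 48)  # 8 extra units, shared 1:3:0
narrow = distribute(row, 35)  # 5 units short, shared 0:5:5
```

The same proportional sharing would be applied once horizontally and once vertically, which matches the point above that tension acts along edges rather than diagonally.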

Fabulator

Presentation Management

February 9, 2012 James

Fabulator 0.0.9 is going to be full of changes. Some are relatively minor internal changes that only matter to people who have written tag libraries (i.e., me). The biggest change that will have an impact for you, the user, is the presentation layer management.

XML is at the heart of the Fabulator system. Applications are written in XML, libraries are written in XML. The views are written in whatever markup is comfortable, but the interactive elements (the forms) are easiest in XML.

Tag libraries will have two ways they can influence the system beyond defining types, functions, and actions. Tag libraries will be able to specify two different XSLTs: one that can be applied to the application to transform it, and one that can be applied in a presentation context to provide new presentation elements. The first can be used as a macro language for the applications. The second can be thought of as similar for presentations.

I’m considering using Fluid Infusion as the foundational JavaScript library for presentation. I’m not settled yet on which gem will get it, though it might be easiest to bundle it with the Radiant extension.

To add presentation transformations, you do something like the following in your Ruby tag library. This example is taken from the core action library.

presentations do
  transformations_into.html do
    xslt_from_file File.join(File.dirname(__FILE__), "..", "..", "..", "xslt", "form.xsl")
  end

  interactive :text
  interactive :asset
  interactive :selection
  interactive :password
  interactive :submission

  structural :form
  structural :container
  structural :group
  structural :option
end

Now the system knows which XSLT file to use when transforming the markup to HTML, it knows which elements in the tag library’s namespace should automatically be given default values and captions, and it knows which elements contribute their ‘id’ attribute to the path pointing to the default value in the application’s memory (and thus contribute to the name of the HTML input element as well).

You can have, for example, the following form in Radiant:

<r:form base="account">
  <text id="name"><caption>Name</caption></text>
  <text id="email"><caption>Email</caption></text>
  <submission id="go"><caption>Submit</caption></submission>
</r:form>

The name and email fields will be populated by the data in ‘/account/name’ and ‘/account/email’ if they exist. The HTML input fields will have the names ‘account.name’ and ‘account.email’.

The Radiant tag (<r:form />) automatically surrounds the enclosed XML in a <form xmlns="http://…" /> element with the appropriate namespace declaration to allow simple form markup without prefixes.

Because different tag libraries may have structural elements that contribute to the path, the system will also inject a ‘path’ attribute into interactive elements so they know how to name the HTML elements. The tag library doesn’t have to know what other tag libraries are contributing.
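As a rough illustration of how the ids combine, here is a small sketch. The helper `field_names` is hypothetical and only mirrors the naming described above (slashes for the memory path, dots for the HTML input name); it is not the extension’s actual code.

```ruby
# Hypothetical helper: build the memory path and the HTML input name for an
# interactive element from the ids contributed by its structural ancestors.
def field_names(ancestor_ids, id)
  parts = ancestor_ids.flat_map { |a| a.split('/') }.reject(&:empty?) + [id]
  { path: '/' + parts.join('/'), input_name: parts.join('.') }
end

# The form above has base="account" and a text element with id="email".
names = field_names(['account'], 'email')
```

With this sketch, `names[:path]` is the path used to look up the default value and `names[:input_name]` is the name given to the HTML input.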

It makes sense for one library to use another, so a library’s XSLT will be run before the XSLT of any library whose namespace is mentioned in the first library’s root element (for libraries defined in XML) or in the root element of its XSLT (for libraries defined in Ruby). This ordering applies to both presentation and application XML transformations.
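One way to picture the ordering rule is as a topological sort over “mentions the namespace of” relationships. The sketch below uses Ruby’s standard `tsort` library; the function `run_order` and the example namespaces are assumptions made for this illustration, not Fabulator’s implementation.

```ruby
require 'tsort'

# Hypothetical sketch: `uses` maps a library namespace to the namespaces
# mentioned in its root element. The result lists the namespaces in the
# order their XSLTs would run: a library first, then the libraries it uses.
def run_order(uses)
  each_node  = ->(&b) { uses.each_key(&b) }
  each_child = ->(ns, &b) { uses.fetch(ns, []).each(&b) }
  # TSort lists the used libraries first, so reverse the result.
  TSort.tsort(each_node, each_child).reverse
end

order = run_order({
  'urn:example:survey' => ['urn:example:forms'],
  'urn:example:forms'  => []
})
```

Running the using library first means the elements it emits in the other library’s namespace are still there to be transformed by that library’s XSLT afterwards. A cycle between libraries would raise `TSort::Cyclic` in this sketch.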

This is the start of a push away from explicit HTML in projects. We’ve done a lot to make the algorithms transportable through time — we don’t have to revisit projects to update code when we update the framework (or won’t once we hit 1.0), but there’s more to a project than just the algorithms. There’s also the HTML and CSS that tends to grow stale over time. By creating a presentation layer, we can capture the presentation intent of the project while providing a (hopefully) robust path forward.

Ultimately, a project may be able to expose the presentation and algorithm layers to advanced researchers and let them create their own algorithmic presentations built from humanities data. That’s a number of years off, but that’s the direction we’re headed.

Next on the todo list: come up with a reasonable set of presentation tags for building Exhibit displays. For those thinking this looks similar to things like Cocoon, you’re not too far off from the general idea as far as the implementation goes. Some details are a bit different, of course.

Fabulator

Workflows

February 9, 2012 James

I’ve been working a little here and there on getting a workflow extension together. I still have a bit to do, but the skeleton is shaping up and passing some simple tests.

The plan right now is for workflows to act like grammars in that they augment a library instead of standing alone. Workflow definitions will attach themselves to the namespace of a library and have a name that can be used (when qualified by the library’s namespace) to reference the workflow definition.

Workflows are very similar to applications in that they are a collection of states and state transitions. The important differences are that you can add data over time to the workflow and test which transitions (actions) can be taken at any given time without automatically following a transition.

This allows an application to add data to a workflow and give feedback to the user as to which actions can be taken on the workflow. For example, I’m considering building a workflow that can manage the process from proposal submission to final approval. Such a system would allow anyone to upload a project proposal to a queue (resulting in a workflow in the ‘pending’ state), allow the advisory committee to review pending proposals and possibly approve them (moving them to the ‘approved’ state) or request more information (moving them to the ‘more-info-needed’ state).

The resulting workflow might look something like the following:

<w:workflow xmlns:w="http://dh.tamu.edu/ns/fabulator/workflow/1.0#"
    xmlns:f="http://dh.tamu.edu/ns/fabulator/1.0#"
>
  <w:state w:name="start">
    <w:action w:name="submit" w:goes-to="pending">
      <f:params>
        <f:param f:name="proposal" f:required="yes" />
      </f:params>
    </w:action>
  </w:state>
  <w:state w:name="pending">
    <w:action w:name="approve" w:goes-to="approved" />
    <w:action w:name="request-more-info" w:goes-to="more-info-needed" />
  </w:state>
  <w:state w:name="approved" />
  <w:state w:name="more-info-needed" />
</w:workflow>

We still need ways to manage authentication and authorization. The final version of this will also require some framework-specific code to manage workflow storage. There’s a lot still to be figured out to try and get this close to right, but we’re getting there.
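To show the difference between testing a transition and following it, here is a minimal in-memory model of the proposal workflow above. The `Workflow` class, its method names, and the rule that an action becomes available once its required parameters are present are all assumptions for this sketch; the real extension reads the XML definition and will need framework-specific storage.

```ruby
# Hypothetical in-memory model of the proposal workflow.
class Workflow
  TRANSITIONS = {
    'start' => {
      'submit' => { goes_to: 'pending', required: ['proposal'] }
    },
    'pending' => {
      'approve'           => { goes_to: 'approved', required: [] },
      'request-more-info' => { goes_to: 'more-info-needed', required: [] }
    },
    'approved'         => {},
    'more-info-needed' => {}
  }.freeze

  attr_reader :state, :data

  def initialize
    @state = 'start'
    @data = {}
  end

  # Data can be added at any time without triggering a transition.
  def add(key, value)
    @data[key] = value
  end

  # Actions whose required parameters are present. Nothing is followed.
  def available_actions
    TRANSITIONS.fetch(@state)
               .select { |_, t| t[:required].all? { |k| @data.key?(k) } }
               .keys
  end

  # Follow a transition explicitly.
  def take(action)
    raise ArgumentError, "#{action} is not available" unless available_actions.include?(action)
    @state = TRANSITIONS.fetch(@state).fetch(action)[:goes_to]
  end
end

wf = Workflow.new
before = wf.available_actions      # the proposal is still missing
wf.add('proposal', 'example proposal text')
after = wf.available_actions       # 'submit' can now be offered to the user
wf.take('submit')
```

An application could call `available_actions` to decide which buttons to show and only call `take` when the user picks one.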

Fabulator

Gems updated

February 9, 2012 James

I went ahead and pushed out new gems for fabulator, fabulator-grammar, and radiant-fabulator-extension. The last requires a database migration run to add the library table.

This update brings to Radiant the ability to define libraries that can be referenced from applications through the namespace declarations.

I’m starting to put together some project management stuff using the fabulator setup in Radiant. I’ll use that to flesh out the next set of updates to the gems. Once I have something workable, I’ll see where I can post the XML for the libraries and some of the pages as examples of how to use the Fabulator+Radiant system.

We’re still bootstrapping, but we’re far enough along now that we can start doing meatier projects.

Fabulator

Libraries, some more

February 9, 2012 James

I think we’re getting very close to having libraries sufficiently developed that we can return to focusing on actual projects. Upcoming gem releases (tomorrow or early next week) will have a good start for building libraries and grammars. A lot of work remains, but there’s enough there that we can get real work done now.

Our current play library is the following:

<l:library
           xmlns:l="http://dh.tamu.edu/ns/fabulator/library/1.0#"
           xmlns:g="http://dh.tamu.edu/ns/fabulator/grammar/1.0#"
           xmlns:f="http://dh.tamu.edu/ns/fabulator/1.0#"
           l:ns="http://example.com/ns/grammar"
>
  <g:grammar>
    <g:context g:mode="normal">
      <g:token g:name="LETTER" g:matches="[:alpha:]" />
    </g:context>
    <g:token g:name="NUMBER" g:matches="[:digit:]" />
    <g:token g:name="LETTER" g:matches="[:upper:]" g:mode="upper"/>
    <g:token g:name="LETTER" g:matches="[:lower:]" g:mode="lower"/>
    <g:rule g:name="something">
      <g:when g:matches="[mode normal] LETTER NUMBER [mode upper] LETTER" />
    </g:rule>
    <g:rule g:name="something2">
      <g:when g:matches="[mode normal] LETTER NUMBER [mode upper] LETTER">
        <g:result g:path="foo" f:select="NUMBER" />
      </g:when>
    </g:rule>
  </g:grammar>
  <l:mapping l:name="double">
    <f:value-of f:select=". * 2" />
  </l:mapping>
  <l:function l:name="fctn">
    <f:value-of f:select="$1 - $2" />
  </l:function>
  <l:action l:name="actn" l:has-actions="true">
    <f:value-of f:select="f:eval($actions) * 3" />
  </l:action>
</l:library>

In an application (assuming the ‘m’ prefix maps to ‘http://example.com/ns/grammar’), we can use the grammar rules ‘something’ and ‘something2’ as filters and constraints. We can also use them as functions for matching and parsing strings.

We also have the functions ‘m:double’ (a mapping) and ‘m:fctn’ (a regular function).

Finally, we have the action ‘m:actn’ which accepts enclosed elements as actions and returns thrice the value returned by running the actions. The ‘f:eval’ function evaluates the referenced code using the current node in the context.

In general, mappings will be called with their argument as the current node. Functions will be called with each argument as a separate variable ($1, $2, $3, …) and with $0 representing all of the arguments as one list. Reductions will be called with a long list in $0.
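For readers more at home in Ruby, the three calling conventions can be loosely compared to the lambdas below. This is an analogy only: `double` and `fctn` echo the library above, while `total` is an invented reduction, and none of this is how the engine evaluates expressions.

```ruby
# A mapping sees one item at a time as the current node ("."):
double = ->(node) { node * 2 }

# A function sees its arguments positionally ($1, $2, ...):
fctn = ->(*args) { args[0] - args[1] }

# A reduction sees the whole list at once ($0); 'total' is hypothetical:
total = ->(list) { list.sum }

doubled = [1, 2, 3].map(&double)  # the mapping is applied once per item
diff    = fctn.call(5, 3)
sum     = total.call([1, 2, 3])
```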

We still have a few more things we need to do to make these XML-based libraries as flexible as the Ruby ones, but that will come as we need that flexibility.

We also need to think about how we want to document libraries… but enough for one day.

Fabulator

Libraries and Grammars

February 9, 2012 James

We’ve released a new set of Fabulator gems. These give us better ways of defining structural elements and implementing classes. We’ll be building on these in the next releases to provide a general library scheme in XML.

The goal is to allow the creation of extension tag libraries in XML without having to write Ruby code. We want to reserve Ruby (or whatever lower-level language is used to build the enabling framework) for those extensions that can not be written given existing extensions and language capabilities.

The library concept is part of the core Fabulator gem. It provides a separate namespace for the library (http://dh.tamu.edu/ns/fabulator/library/1.0#). With a few new class methods that will be in the next release of Fabulator (0.0.8), the grammar extension is able to place itself within the library container:

<l:library 
      xmlns:l="http://dh.tamu.edu/ns/fabulator/library/1.0#"
      xmlns:g="http://dh.tamu.edu/ns/fabulator/grammar/1.0#"
      l:ns="http://example.com/ns/grammar"
>
  <g:grammar>
    <g:context g:mode="normal">
      <g:token g:name="LETTER" g:matches="[:alpha:]" />
    </g:context>
    <g:token g:name="NUMBER" g:matches="[:digit:]" />
    <g:token g:name="LETTER" g:matches="[:upper:]" g:mode="upper"/>
    <g:token g:name="LETTER" g:matches="[:lower:]" g:mode="lower"/>
    <g:rule g:name="something"> 
      <g:when g:matches="^^ [mode normal] LETTER NUMBER [mode upper] LETTER" />   
    </g:rule>
  </g:grammar>
</l:library>

If we map the ‘m’ prefix to the ‘http://example.com/ns/grammar’ namespace (corresponding to the l:ns attribute value), then we can run m:something?('a0A') and get back ‘true’. We can also run m:something('a0A')/NUMBER and get back the string ‘0’ (not the integer 0 though). We won’t get anything, though, when we run it against the string ‘a0a’, because the second ‘a’ doesn’t match the LETTER token in the ‘upper’ mode.
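For comparison, a rough plain-Ruby analogue of the two calls is sketched below. It assumes that ‘^^’ anchors the rule at the start of the string, and the constant and method names are made up for this sketch; the grammar engine does not necessarily compile rules into a single Regexp like this.

```ruby
# Letter in 'normal' mode, a digit, then a letter in 'upper' mode.
SOMETHING = /\A(?<letter>[[:alpha:]])(?<number>[[:digit:]])(?<upper>[[:upper:]])/

# Analogue of m:something?(...): only report success or failure.
def something?(text)
  SOMETHING.match?(text)
end

# Analogue of m:something(...): return the matched structure, or nil.
def something(text)
  m = SOMETHING.match(text)
  m && { 'NUMBER' => m[:number] }
end

ok     = something?('a0A')
number = something('a0A')['NUMBER']  # a string, not an integer
miss   = something('a0a')            # the final 'a' is not upper case
```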

Our next step is to provide a mechanism in the Radiant Fabulator extension for managing libraries. Hopefully we can have that by the end of this week. We’ll release all of the relevant gems at that point.

Having a library capability and the grammar extension will allow us to put in one location all of the text patterns that we use in a project. Regular expressions won’t be magic values embedded in programs.

Fabulator

Grammars and Structures

February 9, 2012 James

We’ve been busy getting a couple of different things done. These will show up in the next releases of the core Fabulator gem and the grammar gem.

Grammars

The following test grammar works.

    <g:grammar xmlns:g="http://dh.tamu.edu/ns/fabulator/grammar/1.0#">
      <g:token g:name="LETTER" g:matches="[:alpha:]" />
      <g:token g:name="NUMBER" g:matches="[:digit:]" />
      <g:rule g:name="something">   
        <g:when g:matches="LETTER NUMBER LETTER" />
      </g:rule>
      <g:rule g:name="other">
        <g:when g:matches="a := LETTER b := NUMBER c := LETTER" />
      </g:rule>
      <g:rule g:name="or">   
        <g:when g:matches="other(s) d := LETTER(s)" />
      </g:rule>
      <g:rule g:name="ooor">
        <g:when g:matches="other(s ',') d := LETTER(s)" />
      </g:rule>
    </g:grammar>

This is a little different than what I mentioned in the previous post. Tokens are the same, but the rules are very different. They borrow some ideas from the Perl Parse::RecDescent module with a little Perl 6 thrown in.

None of the rules are anchored, so they will skip any leading text until they find something that matches.

The ‘something’ rule will match any sequence of letter-number-letter and return a structure representing the matches. For example, matching ‘a1c’ will result in ./LETTER equalling ‘a’ and ‘c’ and ./NUMBER equalling ‘1’.

The ‘other’ rule will match the same sequence as the ‘something’ rule, but instead of naming the resulting values LETTER and NUMBER, it will use ‘a’, ‘b’, ‘c’ as the names (the := construct). In the case of matching against ‘a1c’, the result is ./a = ‘a’, ./b = ‘1’, ./c = ‘c’.

Things start getting interesting with the ‘or’ rule. This will match any sequence of one or more ‘other’ rules followed by one or more LETTER tokens.

Finally, the ‘ooor’ rule will match one or more ‘other’ rules that are separated by commas followed by one or more LETTER tokens.
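As a loose illustration of what these rules match, here are plain-Ruby Regexp analogues of the ‘other’ and ‘ooor’ rules. The constant names are invented for this sketch, and the real engine returns a node structure rather than Regexp match data.

```ruby
# 'other': letter, number, letter, captured under the names a, b and c.
OTHER = /(?<a>[[:alpha:]])(?<b>[[:digit:]])(?<c>[[:alpha:]])/

# 'ooor': one or more 'other' matches separated by commas, followed by one
# or more letters captured as d.
UNIT = /[[:alpha:]][[:digit:]][[:alpha:]]/
OOOR = /(?<others>#{UNIT}(?:,#{UNIT})*)(?<d>[[:alpha:]]+)/

# The rules are unanchored, so leading text is skipped.
parts = OTHER.match('--a1c--').named_captures

m = OOOR.match('a1c,d2e,f3gxyz')
others = m[:others].split(',')
tail   = m[:d]
```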

We’ll start getting modes and additional quantifiers working next.

Structure

We’ve had the Fabulator::Action class for a while for defining new actions, but structural elements have still been hand-coded with a lot of duplicated code. Today’s work changes that.

The first thing is that in the action lib class that defines all of the components of the library (and we’ll probably be changing ‘ActionLib’ to ‘TagLib’), you specify that an element is structural like so (borrowed from the core library):

structural 'application', Fabulator::Core::StateMachine
structural 'view', Fabulator::Core::State
structural 'goes-to', Fabulator::Core::Transition
structural 'params', Fabulator::Core::Group
structural 'group', Fabulator::Core::Group
structural 'param', Fabulator::Core::Parameter
structural 'value', Fabulator::Core::Constraint
structural 'constraint', Fabulator::Core::Constraint
structural 'filter', Fabulator::Core::Filter
structural 'sort', Sort
structural 'when', When
structural 'otherwise', When

Each of these can be used in an action or structural class as a structural element instead of an action.

For example, the view element class is defined as follows (minus some run-time stuff):

class State < Fabulator::Structural
  attr_accessor :name, :transitions

  namespace Fabulator::FAB_NS

  attribute :name, :static => true

  contains 'goes-to', :as => :transitions
end

Parameters (the ‘param’ element):

class Parameter < Fabulator::Structural
  attr_accessor :name

  namespace Fabulator::FAB_NS

  attribute :name, :eval => false, :static => true
  attribute :required, :static => true, :default => 'false'

  contains :constraint
  contains :filter
  contains :value, :as => :constraints
end

By default, the contained structural elements are stored in an instance variable named after the element, but pluralized (e.g., ‘constraint’ elements will be stored in @constraints). By default, this is an array. It can be a hash if the ‘:storage => :hash’ option is set. If no ‘:name => :method’ (for whichever method is appropriate) option is given, it defaults to the ‘name’ method to determine the key for the hash. If two structural elements are assigned to the same variable, they are combined instead of one overwriting the other.
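The storage rules can be sketched with a small class macro. This is a hypothetical re-creation for illustration (the class names, `add_child`, and the naive add-an-s pluralization are my own), not the code in Fabulator::Structural.

```ruby
# Hypothetical re-creation of the storage rules described above.
class StructuralSketch
  # Declare a contained element and where its instances are stored.
  def self.contains(element, as: nil, storage: :array, name: :name)
    ivar = "@#{as || "#{element}s"}"  # naive pluralization
    (@containers ||= {})[element.to_s] = { ivar: ivar, storage: storage, key: name }
  end

  def self.containers
    @containers || {}
  end

  # Store a parsed child under the instance variable chosen by `contains`.
  def add_child(element, child)
    rule = self.class.containers.fetch(element.to_s)
    if rule[:storage] == :hash
      store = instance_variable_get(rule[:ivar]) || instance_variable_set(rule[:ivar], {})
      store[child.public_send(rule[:key])] = child
    else
      store = instance_variable_get(rule[:ivar]) || instance_variable_set(rule[:ivar], [])
      store << child
    end
  end
end

class ParameterSketch < StructuralSketch
  attr_reader :constraints, :filters

  contains :constraint
  contains :filter
  contains :value, as: :constraints  # shares @constraints with 'constraint'
end

param = ParameterSketch.new
param.add_child(:constraint, 'c1')
param.add_child(:value, 'v1')
param.add_child(:filter, 'f1')
```

Because ‘constraint’ and ‘value’ point at the same variable, both children end up in `param.constraints` instead of one overwriting the other.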

This gets us a little closer to being able to have libraries or packages as useful concepts in the Fabulator+Radiant system.

Fabulator

More Regex Work

February 9, 2012 James

One of the overall guiding principles of my work is that anything that can be considered an editorial statement within the context of a particular DH project should be exposed to the project owner instead of being hidden away in Ruby code (or any other language of the day). The Fabulator+Radiant combination allows the researcher to see in the content management system everything that pertains to their project. Any tweaks in how they interpret their data are done within the CMS, not within Ruby code.

With that in mind, I’m slowly making progress on a general grammar engine that will let us write markup parsers within the CMS. This is useful when we are working with markup that was developed before TEI, such as in the Donne project. Right now, the parsing of the transcriptions is done in Ruby, so any modification in the parsing and possible translation to HTML or TEI is done in Ruby, away from the eyes of the people with an editorial interest in that translation.

First, I’ll mention the limitations of the Ruby 1.8 regex engine.

It doesn’t really get Unicode, or if it does, Ruby integers don’t. I can only create characters in the range of 0x00-0xff. Thus, the initial grammar engine will only work with ASCII. I know this is a major limitation for projects using something other than simple Latin characters. I have good reason to believe that this limitation can be removed when Radiant moves to Ruby 1.9. On the other hand, patches are welcome once I push the current development code to GitHub. :-)

Ruby doesn’t support conditional branching within a regular expression. This limits the complexity we can support. As a result, we’ve had to rethink how we handle character class algebra and will require the ‘bitset’ library. However, I think the resulting facility of specifying character sets is worth the extra installation effort.

We are currently working on two different regular expression languages. One is used to specify tokens and the other for rules. The major difference is that the atomic unit in a token is the character while the atomic unit in a rule is the token. Rules can also have actions associated with them while tokens do not.

Both tokens and rules will act like functions in a library. You can call them in an expression just like a function. If you call the token or rule without a trailing question mark, you’ll get any data structure that results from matching the token/rule against the provided string. If you call it with the trailing question mark, you’ll get a boolean response indicating the success/failure of trying to match the string.

Tokens

As a test case, I’m starting to put together a grammar for the Donne markup. For tokens, I have:

<g:token g:name="NL" g:matches="[:nl:]" />

<g:context g:mode="linetext">
  <g:token g:name="LETTER" g:matches="[:alpha:]" />
  <g:token g:name="NUMBER" g:matches="[:digit:]" />
  <g:token g:name="LT"  g:matches="&lt;" />
  <g:token g:name="LTLT"  g:matches="&lt;&lt;" />
  <g:token g:name="GT"  g:matches=">" />
  <g:token g:name="GTGT"  g:matches=">>" />
  <g:token g:name="PUNCTUATION" g:matches="[:punct:]" />
  <g:token g:name="DOUBLELETTER" g:matches="[:alpha:][2]" />
</g:context>

<g:context g:mode="linenumber">
  <g:token g:name="WORK" g:matches="[:alnum:]+" />
  <g:token g:name="MANUSCRIPT" g:matches="[:alnum:]+" />
  <g:token g:name="LINE" g:matches="[:alnum:]+" />
</g:context>

You’ll notice that I’ve specified a mode for almost all of the tokens (using a g:context element as shorthand). This is similar to the mode attribute in XSLT in that the token is only active if the grammar engine is in that mode. If no mode is specified, then the token is active in all modes.
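The mode rule boils down to a simple filter, sketched below with a few of the tokens from the grammar. The `Token` struct and `active_tokens` method are hypothetical, and the Ruby patterns are stand-ins (in particular, I’m assuming `[:nl:]` means a newline character).

```ruby
# Hypothetical sketch of mode-scoped tokens: a token with no mode is active
# in every mode; otherwise it is active only in its own mode.
Token = Struct.new(:name, :pattern, :mode)

TOKENS = [
  Token.new('NL',     /\n/,           nil),
  Token.new('LETTER', /[[:alpha:]]/,  'linetext'),
  Token.new('NUMBER', /[[:digit:]]/,  'linetext'),
  Token.new('WORK',   /[[:alnum:]]+/, 'linenumber')
].freeze

# Names of the tokens the engine would consider while in `mode`.
def active_tokens(mode)
  TOKENS.select { |t| t.mode.nil? || t.mode == mode }.map(&:name)
end

in_linenumber = active_tokens('linenumber')
in_linetext   = active_tokens('linetext')
```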

I’ve also used named character classes instead of specifying explicit ranges. This will help when the engine supports Unicode since the named classes will automatically expand to encompass the standard Unicode equivalents. It also makes the expressions a little more readable.

Character Set Algebra

Character sets can be more than just simple expressions as in the above token definitions. You can add and subtract them to get exactly the set of characters you want.

Some examples of character set algebra (assuming strict 7-bit ASCII):

[:xdigit:] == [:digit: + [a-f] + [A-F]]
[:alnum:] == [:alpha: + :digit:] == [:upper: + :lower: + :digit:]
consonants: [:alpha: - [aeiouAEIOU]]
everything except vowels: [ - [aeiouAEIOU] ]

Ordering is important. Set operations are evaluated left to right, so the following are not equivalent: [:alpha: - [aeiouAEIOU]] and [ -[aeiouAEIOU] + :alpha: ]. The first is the set of consonants (alpha from which we remove the vowels) while the second is everything (all characters from which we remove the vowels and then add all alphabetical characters).

Set operations can be parenthesized if it helps make things clearer. For example, the following two are equivalent: [:alpha: - [aeiou] - [AEIOU]] and [:alpha: - ([aeiou] + [AEIOU])].
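The left-to-right evaluation can be checked with ordinary Ruby sets. This is only a sketch over 7-bit ASCII using the standard `set` library; the constants and the `evaluate` helper are invented here, and the engine itself is built differently (it will use the ‘bitset’ library).

```ruby
require 'set'

# Hypothetical sketch of left-to-right set algebra over 7-bit ASCII.
ALL    = Set.new((0..127).map { |c| c.chr(Encoding::UTF_8) })
ALPHA  = Set.new([*'a'..'z', *'A'..'Z'])
VOWELS = Set.new('aeiouAEIOU'.chars)

# `ops` is a list of [operator, set] pairs applied in order to `start`.
def evaluate(start, ops)
  ops.reduce(start) { |acc, (op, set)| op == :add ? acc | set : acc - set }
end

# [:alpha: - [aeiouAEIOU]]
consonants = evaluate(ALPHA, [[:remove, VOWELS]])

# [ -[aeiouAEIOU] + :alpha: ]: remove the vowels from everything, then add
# all alphabetical characters (vowels included) back in.
everything = evaluate(ALL, [[:remove, VOWELS], [:add, ALPHA]])

# [:alpha: - [aeiou] - [AEIOU]] versus [:alpha: - ([aeiou] + [AEIOU])]
lower = Set.new('aeiou'.chars)
upper = Set.new('AEIOU'.chars)
chained = evaluate(ALPHA, [[:remove, lower], [:remove, upper]])
grouped = evaluate(ALPHA, [[:remove, lower | upper]])
```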

Rules

Rules describe how tokens are related. For example, the top-level rule in the Donne transcription grammar is:

<g:rule g:name="document">
  <g:when g:matches="[line] ([NL]+ [line])* [NL]*">
    <g:result f:select="./line" />
  </g:when>
</g:rule>

This says that a document consists of a line, followed by zero or more further lines that are each preceded by one or more newlines, and ending with any number of trailing newlines. If this successfully matches, then the result of the match should be whatever was produced by matching the individual ‘line’ rules.

The ‘[NL]’ is just the token from before. The ‘line’ rule is defined as:

<g:rule g:name="line">
  <g:when g:matches="{linenumber} [linenumber] {linetext} [opt-ctrl] [text]">
    <g:result>
      <g:value g:path="ctrl" f:select="./opt-ctrl" />
      <g:value g:path="work"  f:select="./linenumber/work" />
      <g:value g:path="manuscript" f:select="./linenumber/manuscript" />
      <g:value g:path="line" f:select="./linenumber/line" />
      <g:value g:path="text" f:select="./text" />
    </g:result>
  </g:when>
</g:rule>

This rule is a bit more complex than the ‘document’ rule. It also shows us how we might use the mode to change the token set we have available.

Here, we expect to begin in the ‘linenumber’ mode so that we don’t have to worry about the linetext tokens matching when we expect only the ‘linenumber’ rule and tokens to match. Once we are past the ‘linenumber’, we switch to expecting tokens in the ‘linetext’ mode. If we find an optional control character and text, then we successfully match a line and run the associated code that sets up the data for that line.
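The effect of a mode on the available token set can be sketched in Python as a token table keyed by mode. The token names and patterns below are hypothetical stand-ins, not the tokens from the Donne grammar, and `next_token` is an invented helper.

```python
import re

# Hypothetical token tables, one per mode. The real grammar defines its own tokens.
TOKENS = {
    "linenumber": [
        ("NUMBER", re.compile(r"[0-9]+")),
        ("DOT", re.compile(r"\.")),
    ],
    "linetext": [
        ("TEXT", re.compile(r"[^\n]+")),
    ],
}


def next_token(text, pos, mode):
    """Return (name, lexeme, new_pos) for the first token of `mode` that
    matches at `pos`, or None if no token in that mode matches there."""
    for name, pattern in TOKENS[mode]:
        match = pattern.match(text, pos)
        if match:
            return name, match.group(), match.end()
    return None
```

With the input `"12.3 rest"`, asking for a token in the `linenumber` mode gives only the leading number, while asking in the `linetext` mode swallows the whole string. Keeping the tables separate is what stops the greedy text token from competing with the line number tokens.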

The linenumber rule is defined as:

<g:context g:mode="linenumber">
  <g:rule g:name="linenumber">
    <g:when g:matches="[WORK] '.' [MANUSCRIPT] '.' [LINE]">
      <g:result>
        <g:value g:path="work" f:select="./WORK" />
        <g:value g:path="manuscript" f:select="./MANUSCRIPT" />
        <g:value g:path="line" f:select="./LINE" />
      </g:result>
    </g:when>
  </g:rule>
</g:context>

You’ll also notice that the actual match is in a g:when element. A rule can match multiple patterns, each with its own associated code. Here, we are using the linenumber tokens defined above and expecting them to be separated by dots (.). You can think of the quoted '.' notation as defining an anonymous token that matches a literal string. Spaces in the rule pattern are ignored, so to match a space in the string you’re parsing, you will need to include it explicitly in the pattern.

If we match a line number, we just translate the names of the items from uppercase to lowercase in the result that is passed on to the ‘line’ rule.
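Here is a small Python sketch of what the linenumber rule does with its result. The token shapes are guesses (runs of letters and digits for the work and manuscript, digits for the line), the sample input is invented, and `parse_linenumber` is a hypothetical name.

```python
import re

# Assumed token shapes; the grammar's real WORK, MANUSCRIPT and LINE tokens may differ.
LINENUMBER = re.compile(
    r"(?P<WORK>[A-Za-z0-9]+)"
    r"\."  # the anonymous '.' token
    r"(?P<MANUSCRIPT>[A-Za-z0-9]+)"
    r"\."
    r"(?P<LINE>[0-9]+)"
)


def parse_linenumber(text):
    """Match WORK '.' MANUSCRIPT '.' LINE and return the parts under
    lowercase names, or None if the text does not match."""
    match = LINENUMBER.fullmatch(text)
    if match is None:
        return None
    # Translate the uppercase token names to lowercase result paths.
    return {name.lower(): value for name, value in match.groupdict().items()}
```

For an invented input such as `"Sat3.B32.001"` this returns a dictionary with the keys `work`, `manuscript`, and `line`, which is the shape the ‘line’ rule then selects from.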

Grammars

Grammars are collections of rules and tokens. The current vision is that grammars will live inside libraries (yet to be coded) and make available as functions those rules and tokens that are not attached to a particular mode.

The resulting system will provide named regular expressions (tokens) and parsers (rules). I’m debating how I want to construct the parser: either top-down or bottom-up. Top-down is easier in some ways because we don’t need to construct a state machine to manage building up more complex rules from less-complex sets of tokens/rules. Bottom-up is nice because it allows the higher-level patterns to emerge from the collection of tokens/rules produced from processing the given string without having to determine beforehand what our target is: we can stop processing as soon as we have a single rule encompassing everything we’ve seen.
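To show the difference between the two approaches, here is a toy Python sketch that recognizes the ‘document’ pattern both ways. It works on a list of already-recognized items, each either `"line"` or `"NL"`, and it only answers yes or no. Neither function reflects how Fabulator’s rule engine will actually be built.

```python
def parse_top_down(tokens):
    """Start from the goal, document := line (NL+ line)* NL*, and walk
    the input looking for exactly what the rule expects next."""
    pos = 0
    if pos >= len(tokens) or tokens[pos] != "line":
        return False
    pos += 1
    while pos < len(tokens):
        start = pos
        while pos < len(tokens) and tokens[pos] == "NL":
            pos += 1
        if pos == start:
            return False  # something other than a separator follows a line
        if pos < len(tokens):
            if tokens[pos] != "line":
                return False
            pos += 1
    return True


def parse_bottom_up(tokens):
    """Shift items onto a stack and reduce whenever the top of the stack
    matches a known pattern, without committing to a target up front."""
    stack = []
    for token in tokens:
        stack.append(token)  # shift
        while True:  # reduce as far as possible
            if stack == ["line"]:
                stack = ["document"]
            elif stack[-2:] == ["NL", "NL"]:
                stack.pop()  # a run of newlines acts as one separator
            elif stack[-3:] == ["document", "NL", "line"]:
                stack[-3:] = ["document"]
            else:
                break
    # Accept once everything seen has collapsed into a single document,
    # optionally followed by trailing newlines.
    return stack in (["document"], ["document", "NL"])
```

The top-down version needs no stack of partial results but has to know it is looking for a document. The bottom-up version lets the document emerge from the items it has seen, at the cost of managing the stack and the reduction patterns.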

Right now, we have a working token expression parser that gives us basic regular expression support. We’re developing the rule engine and the general library support in Fabulator in which we’ll embed the grammar definitions. Once we have libraries and grammars available in the Radiant+Fabulator combination, we’ll remove the ‘matches’ function from the grammar extension. Regular expressions shouldn’t be embedded in a program if they are subject to change. Instead, they can be documented by putting them in a project-specific library/grammar.

James Gottlieb
