<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>JamesGottlieb</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/" />
    <link rel="self" type="application/atom+xml" href="http://www.jamesgottlieb.com/atom.xml" />
    <id>tag:www.jamesgottlieb.com,2009-08-09://1</id>
    <updated>2010-02-13T20:42:09Z</updated>
    <subtitle>Creative writing, digital humanities, and information technology.</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.32-en</generator>

<entry>
    <title>Fabulator Update</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2010/02/fabulator-update.html" />
    <id>tag:www.jamesgottlieb.com,2010://1.75</id>

    <published>2010-02-13T20:17:13Z</published>
    <updated>2010-02-13T20:42:09Z</updated>

    <summary>The Fabulator extension for Radiant is forming up well. I&#8217;m planning on extracting the core engine from the extension and publishing it as a standalone Ruby library that can be plugged into other systems. I&#8217;ll be working it in...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Digital Humanities" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>The Fabulator extension for Radiant is forming up well.  I&#8217;m planning on extracting the core engine from the extension and publishing it as a standalone Ruby library that can be plugged into other systems.  I&#8217;ll be working it into the Writing and Learning Communities software as a way to build assignment modules.</p>
]]>
        <![CDATA[<div class="zemanta-img mt-image-right" style="margin-top: 1em; margin-right: 1em; margin-bottom: 1em; margin-left: 1em; display: block; float: right; width: 310px; "><a href="http://commons.wikipedia.org/wiki/Image:SW_layercake_2006.svg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/5/54/SW_layercake_2006.svg/300px-SW_layercake_2006.svg.png" alt="Semantic Web &quot;Layercake&quot; (2006)" width="300" height="345"></a><p class="zemanta-img-attribution" style="font-size:0.8em">Image via <a href="http://commons.wikipedia.org/wiki/Image:SW_layercake_2006.svg">Wikipedia</a></p></div>

<p>I don&#8217;t have any online documentation of the language or the extension API yet, but I should in a month or two, June at the latest.  I&#8217;m also redesigning and rebuilding the TAMU College of Liberal Arts Digital Humanities Program website, and the new site will go live around June.</p>

<p>The current core engine has three complementary systems: the XML framework for defining the parts of an application, the XML actions that can be run, and an expression language for manipulating the data tree in the application.  The framework sets up the overall application, view, and transition contexts for the actions.  The core actions manage control flow and data transformation.  The expression language can be used to perform calculations, similar to <a class="zem_slink" href="http://en.wikipedia.org/wiki/XPath" title="XPath" rel="wikipedia">XPath</a>.  The system is designed to act on application data similarly to how <a class="zem_slink" href="http://en.wikipedia.org/wiki/XSL_Transformations" title="XSL Transformations" rel="wikipedia">XSLT</a> does when manipulating XML documents.</p>
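<p>To make the comparison with XPath a little more concrete, here is a minimal Python sketch of path evaluation over application data held as nested dictionaries.  The slash-separated paths mirror the <code>?/rights/url</code> style used in the Fabulator RDF examples, but the function name <code>evaluate_path</code>, the data layout, and the support for <code>..</code> are assumptions for illustration, not the engine&#8217;s actual expression language.</p>

```python
def evaluate_path(tree: dict, path: str, context: tuple = ()) -> list:
    """Resolve a slash-separated path against a nested dict tree.

    Absolute paths start with "/"; relative paths start from `context`,
    a tuple of keys naming the current node.  ".." steps up one level.
    Returns a list of matching values (empty if nothing matches), since
    XPath-style expressions evaluate to node sets, not single values.
    """
    # Turn the path into a list of keys, starting from the context node
    # for relative paths and from the root for absolute ones.
    steps = list(context) if not path.startswith("/") else []
    for part in path.strip("/").split("/"):
        if part == "..":
            if steps:
                steps.pop()
        elif part and part != ".":
            steps.append(part)

    # Walk the tree one step at a time, keeping every node that matches.
    nodes: list = [tree]
    for step in steps:
        next_nodes = []
        for node in nodes:
            if isinstance(node, dict) and step in node:
                value = node[step]
                # A list of children behaves like a set of sibling nodes.
                next_nodes.extend(value if isinstance(value, list) else [value])
        nodes = next_nodes
    return nodes
```

<p>Returning a list follows XPath&#8217;s node-set model, so a path that passes through a list of children, such as <code>/tags/name</code>, naturally yields every match.</p>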

<p>In addition to the core engine, I&#8217;m splitting the <a class="zem_slink" href="http://en.wikipedia.org/wiki/Resource_Description_Framework" title="Resource Description Framework" rel="wikipedia">RDF</a> manipulation out into its own extension and starting a workflow management extension.  By summer, we should have the core engine and the two extensions fairly polished.</p>

<p>The beauty of this system is that I can add capabilities that any project can use, and add them in a way that lets me focus on the capability instead of all of the tasks that I would have to worry about in a general, standalone application that provided the same capability.  For example, I&#8217;m planning on developing an extension to manage all of the data calculations needed to support a faceted browser.  If I do it right, all I&#8217;ll have to worry about is providing the calculations.  The core engine will be able to tie everything together easily.  In addition, all the calculations should be generic enough that I can use them for something other than a faceted browser.</p>
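<p>For a sense of the calculations a faceted browser needs, here is a minimal Python sketch: given records and the facet values a user has selected, count the values of each facet among the records that still match.  The record shape and the name <code>facet_counts</code> are assumptions, not part of any existing Fabulator extension.</p>

```python
from collections import Counter


def facet_counts(records, facets, selected):
    """Count facet values among records that satisfy every current selection.

    records:  list of dicts, e.g. {"type": "letter", "year": 1850}
    facets:   names of the fields to count
    selected: dict mapping a facet name to the value the user picked
    """
    # Keep only the records consistent with all selections so far.
    matching = [
        r for r in records
        if all(r.get(name) == value for name, value in selected.items())
    ]
    # For each facet, tally how many matching records have each value.
    return {
        name: Counter(r[name] for r in matching if name in r)
        for name in facets
    }
```

<p>Many faceted browsers instead count each facet while ignoring that facet&#8217;s own selection, so users can still see the alternatives; that is a small variation on the same filter.</p>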

<p>This engine allows you to build an interactive, database-backed application without having to do any of the repetitive basic work that traditionally goes into such an application.  No worrying about filtering and constraining data from the browser (it&#8217;s trivial to specify the filters and constraints and not worry about how it&#8217;s done), no getting the right view in the right place in the code, no worrying about people jumping into the middle of a process, no worrying about what to do with data between requests when you need to collect more data before doing something, no worrying about managing a relational database schema.  Everything just works with descriptions of what you want.</p>
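<p>As an illustration of specifying filters and constraints rather than coding them, here is a minimal Python sketch: a declarative table per form field, applied by one generic function.  The spec format, <code>apply_spec</code>, and the example fields are hypothetical; a Fabulator application would presumably express the same rules in its XML description.</p>

```python
def apply_spec(spec, raw):
    """Filter and constrain raw browser input according to a declarative spec.

    spec maps each accepted field name to a dict with optional keys:
      "filters":     callables applied in order to clean the raw string
      "constraints": (predicate, message) pairs the cleaned value must satisfy
      "required":    whether the field must be present at all
    Fields the spec doesn't mention are silently dropped.
    Returns (clean_data, errors).
    """
    clean, errors = {}, {}
    for name, rules in spec.items():
        if name not in raw:
            if rules.get("required"):
                errors[name] = ["is required"]
            continue
        value = raw[name]
        try:
            for step in rules.get("filters", []):
                value = step(value)
        except ValueError:
            errors[name] = ["could not be parsed"]
            continue
        problems = [msg for check, msg in rules.get("constraints", []) if not check(value)]
        if problems:
            errors[name] = problems
        else:
            clean[name] = value
    return clean, errors


# Hypothetical spec for a two-field form: what to accept, not how to check it.
PROFILE_SPEC = {
    "name": {
        "filters": [str.strip],
        "constraints": [(lambda v: v != "", "must not be empty")],
        "required": True,
    },
    "age": {
        "filters": [str.strip, int],
        "constraints": [(lambda v: v >= 0, "must not be negative")],
    },
}
```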

<p>The next steps will be to look at several projects using this system and see where I&#8217;m repeating myself in those projects.  Then, following the DRY principle, reduce the redundancy in the descriptions.  For example, if I have the same RDF pattern over and over again for a particular object type (a person, for instance), then I should have somewhere to describe that pattern so I can just say that I want all of the people and not worry about what the pattern looks like.  At the same time, I should be able to say that I want all of the people as well as some additional information without jumping through too many hoops or worrying about how the data is stored.</p>
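<p>A minimal Python sketch of that idea, assuming the data is a collection of (subject, predicate, object) triples: the &#8220;person&#8221; pattern is described once in a registry, and a query can ask for all people, optionally adding properties without restating the pattern.  The registry <code>PATTERNS</code>, its format, and <code>find</code> are hypothetical.</p>

```python
RDF_TYPE = "rdf:type"

# Each pattern is described once: the class its subjects belong to and the
# properties (variable name -> predicate) every match must have.
PATTERNS = {
    "person": {"class": "foaf:Person", "properties": {"name": "foaf:name"}},
}


def find(triples, pattern_name, extra=None):
    """Return one binding dict per subject matching the named pattern.

    `extra` adds properties (variable name -> predicate) on top of the stored
    pattern, so callers never restate the pattern itself.
    """
    pattern = PATTERNS[pattern_name]
    properties = {**pattern["properties"], **(extra or {})}
    subjects = sorted(s for s, p, o in triples if p == RDF_TYPE and o == pattern["class"])
    results = []
    for s in subjects:
        binding = {"id": s}
        for var, predicate in properties.items():
            values = [o for subj, p, o in triples if subj == s and p == predicate]
            if not values:
                break  # a required property is missing: not a match
            binding[var] = values[0]  # if several values exist, keep just one
        else:
            results.append(binding)
    return results
```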

]]>
    </content>
</entry>

<entry>
    <title>Fabulator Design: RDF Operations</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2009/11/fabulator-design-rdf-operations.html" />
    <id>tag:www.jamesgottlieb.com,2009://1.65</id>

    <published>2009-11-12T16:42:09Z</published>
    <updated>2009-11-12T17:27:06Z</updated>

    <summary>If you&#8217;re familiar with MVC design, or at least with Ruby on Rails, then you&#8217;ve heard of CRUD: create, read, update, delete. These are the basic operations that are possible for data. There are various ways to do each of...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
    <category term="database" label="Database" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="languages" label="Languages" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="programming" label="Programming" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="webapplication" label="Web application" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>If you&#8217;re familiar with MVC design, or at least with <a class="zem_slink" href="http://rubyonrails.org/" title="Ruby on Rails" rel="homepage">Ruby on Rails</a>, then you&#8217;ve heard of <a class="zem_slink" href="http://en.wikipedia.org/wiki/Create%2C_read%2C_update_and_delete" title="Create, read, update and delete" rel="wikipedia">CRUD</a>: create, read, update, delete.  These are the basic operations that are possible for data.  There are various ways to do each of them, but these four are at the heart of any data-driven web application.</p>

<p>The Fabulator is a data-driven web application engine, so it makes sense that it should support the four CRUD operations.  Since the database is essentially an <a class="zem_slink" href="http://en.wikipedia.org/wiki/Resource_Description_Framework" title="Resource Description Framework" rel="wikipedia">RDF</a> model, we need to map CRUD to RDF.</p>
]]>
        <![CDATA[<p><a href="http://www.jamesgottlieb.com/assets_c/2009/11/Screen%20shot%202009-11-12%20at%2010.51.37%20AM-18.html"><img src="http://www.jamesgottlieb.com/assets_c/2009/11/Screen%20shot%202009-11-12%20at%2010.51.37%20AM-thumb-200x104-18.png" alt="Screen shot 2009-11-12 at 10.51.37 AM.png" class="mt-image-right" style="margin: 0pt 0pt 20px 20px; float: right;" height="104" width="200"></a></p>

<p>Fabulator applications are essentially <a class="zem_slink" href="http://en.wikipedia.org/wiki/XML" title="XML" rel="wikipedia">XML</a> documents describing what the application should do.  Associated with the applications are page parts (Radiant is the host <a class="zem_slink" href="http://en.wikipedia.org/wiki/Content_management_system" title="Content management system" rel="wikipedia">CMS</a>) that are rendered if the application is in the state for that part.  The application is just a document in a CMS.</p>

<p>Since Fabulator applications are XML documents, we want an XML way of performing each of the CRUD operations, taking template languages such as TAL as our model.  For this, we have a series of operations that take an RDF template (we can call this a sketch) with placeholders representing data in the application context.</p>

<h2>Reading</h2>

<p>To read from an RDF model, we simply provide a sketch of what the RDF graph should look like and put our variable names where we want the corresponding RDF data to be accessible in the application views (anything prefixed with <code>?</code> is considered a variable name here).  Some of the details will probably change as we get some examples up and running, but the following snippet gives the flavor of what we intend to do.</p>

<p><textarea name="code" class="xml" cols="60">&lt;rdf-query model="&#8230;"&gt;
  &lt;rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ore="http://www.openarchives.org/ore/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"&gt;
    &lt;rdf:Description rdf:about="?/url"&gt;
        &lt;ore:describes rdf:resource="?/url"/&gt;
        &lt;dcterms:creator rdf:parseType="Resource" xsm:base="?/creator"&gt;
            &lt;foaf:name&gt;?name&lt;/foaf:name&gt;
            &lt;foaf:page rdf:resource="?url" /&gt;
        &lt;/dcterms:creator&gt;
        &lt;dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"&gt;?/created_at&lt;/dcterms:created&gt;
        &lt;dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime"&gt;?/updated_at&lt;/dcterms:modified&gt;
        &lt;dc:rights&gt;?/rights/description&lt;/dc:rights&gt;
        &lt;dcterms:rights rdf:resource="?/rights/url"/&gt;
    &lt;/rdf:Description&gt;
  &lt;/rdf:RDF&gt;
&lt;/rdf-query&gt;
</textarea></p>
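<p>Reading amounts to pattern matching.  Here is a minimal Python sketch, assuming the RDF/XML sketch has already been flattened into (subject, predicate, object) patterns and the model is a list or set of triples.  Terms starting with <code>?</code> are variables, as in the snippet; the helper variable <code>?c</code> stands in for the blank node that <code>rdf:parseType="Resource"</code> creates.  The flattening step and the name <code>match</code> are hypothetical, not necessarily how Fabulator processes its sketches.</p>

```python
def is_var(term):
    """Terms prefixed with "?" are variables; everything else is a constant."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns.

    patterns: list of (subject, predicate, object) where "?..." terms are variables
    triples:  the RDF model as a list or set of concrete (s, p, o) tuples
              (it is scanned once per pattern, so not a one-shot generator)
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break  # constant term doesn't match
        else:
            # This triple fits the first pattern; try to satisfy the rest.
            yield from match(rest, triples, candidate)
```

<p>Each result maps variable names such as <code>?/creator/name</code> to values, which an engine could then copy into the application context at those paths.</p>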

<h2>Creating and Updating</h2>

<p>Since objects are automagically created if they don&#8217;t already exist, creating and updating are synonymous in RDF.  In this case, we use <code>rdf-assertion</code> instead of <code>rdf-query</code> to change from reading to writing.  We are documenting a truth.  Substitutions from the application context are the same as for reading.  To the extent that we can, the semantics of the two operations will be identical except for the direction of data flow.</p>
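<p>Writing runs the same sketch in the other direction: instead of binding variables from the model, fill them from the application context and add the resulting triples.  The minimal Python sketch below assumes, for simplicity, a flat context keyed by the variable names and a Python set as the model; both, and the function names, are illustrative assumptions.</p>

```python
def substitute(patterns, context):
    """Fill "?..." variables in triple patterns from the application context.

    A missing context value raises KeyError instead of writing a
    half-filled pattern into the model.
    """
    return [
        tuple(context[term] if term.startswith("?") else term for term in pattern)
        for pattern in patterns
    ]


def assert_sketch(model, patterns, context):
    """Add the filled-in triples to the model, a set of (s, p, o) tuples.

    Adding a triple that is already present is a no-op, so there is no
    separate "create" step.  Note that this sketch only adds: replacing an
    old value would also mean retracting the previous triple.
    """
    model.update(substitute(patterns, context))
    return model
```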

<h2>Deleting</h2>

<p>Objects don&#8217;t really exist in RDF unless there&#8217;s something to say about them, so objects will disappear when there&#8217;s nothing associated with them.  Deleting all information about an object will remove it from the RDF model.</p>

<p>Again, changing an <code>rdf-assertion</code> element to an <code>rdf-denial</code> element will remove any information that was added by the <code>rdf-assertion</code> (assuming an identical context).</p>
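<p>Denial is the mirror image: fill in the sketch from the context and remove the triples.  This minimal Python sketch makes the same assumptions as before (a set of triples, a context keyed by variable names, hypothetical function names) and also shows the point about deletion: once nothing describes a subject, it no longer appears among the model&#8217;s subjects.</p>

```python
def deny_sketch(model, patterns, context):
    """Remove the filled-in triples from the model, a set of (s, p, o) tuples.

    Triples that aren't present are ignored, so denying twice is harmless.
    """
    for pattern in patterns:
        triple = tuple(context[t] if t.startswith("?") else t for t in pattern)
        model.discard(triple)
    return model


def known_subjects(model):
    """A resource 'exists' only while at least one triple still describes it."""
    return {s for s, _, _ in model}
```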

<h2>Changing Application State</h2>

<p>Besides the CRUD operations, we need a way to change the state of the application based on the information in the RDF model and the application context.  For this, we have <code>rdf-assert</code> and <code>rdf-deny</code> that will take a sketch of what we expect to have (or not have) in the model.  If the match succeeds (or fails), we change to the corresponding state and proceed.</p>

<p>This can be used to start an application in a particular state, or to choose which state comes next, based on information we already have.</p>
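<p>A minimal Python sketch of choosing the next state from a list of assert/deny checks.  To keep it short, it assumes each check&#8217;s sketch has already been filled in from the context, so matching is a plain membership test; real <code>rdf-assert</code> and <code>rdf-deny</code> elements would match patterns with variables.  The check format and <code>next_state</code> are hypothetical.</p>

```python
def holds(model, triples):
    """True if every (already filled-in) triple is present in the model."""
    return all(t in model for t in triples)


def next_state(model, checks, default):
    """Pick the next state from an ordered list of checks.

    Each check is (kind, triples, state):
      "assert" moves to `state` if the triples are all present,
      "deny"   moves to `state` if they are not all present.
    The first check that succeeds wins; otherwise `default` is returned.
    """
    for kind, triples, state in checks:
        present = holds(model, triples)
        if (kind == "assert" and present) or (kind == "deny" and not present):
            return state
    return default
```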
]]>
    </content>
</entry>

<entry>
    <title>Fabulator: the Future of Gestinanna</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2009/11/fabulator-the-future-of-gestinanna.html" />
    <id>tag:www.jamesgottlieb.com,2009://1.64</id>

    <published>2009-11-09T16:17:50Z</published>
    <updated>2009-11-09T16:50:00Z</updated>

    <summary>In a post from quite a while back, I talk about eXtensible State Machines, a way to reduce a web application to an XML document. The original implementation was a stand-alone web application environment/framework written in Perl. The web has...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Digital Humanities" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="languages" label="Languages" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="semanticweb" label="Semantic Web" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="webapplication" label="Web application" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="xml" label="XML" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>In a post from quite a while back, I talk about <a href="/2005/01/extensible-state-machines.html">eXtensible State Machines</a>, a way to reduce a web application to an <a class="zem_slink" href="http://en.wikipedia.org/wiki/XML" title="XML" rel="wikipedia">XML</a> document.  The original implementation was a stand-alone web application environment/framework written in <a class="zem_slink" href="http://www.perl.org/" title="Perl" rel="homepage">Perl</a>.  The web has evolved since that initial work.  We now have Web 2.0, <a class="zem_slink" href="http://rubyonrails.org/" title="Ruby on Rails" rel="homepage">Ruby on Rails</a>, and content management systems that are easy to extend (e.g., <a href="http://www.radiantcms.org/">Radiant</a>).</p>
]]>
        <![CDATA[<div class="zemanta-img mt-image-right" style="margin: 1em; display: block; float: right; width: 254px;"><a href="http://commons.wikipedia.org/wiki/Image:Finite_state_machine_example_with_comments.svg"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Finite_state_machine_example_with_comments.svg/244px-Finite_state_machine_example_with_comments.svg.png" alt="A graph of an extremely basic process in a fin..." height="347" width="244"></a><p class="zemanta-img-attribution" style="font-size: 0.8em;">Image via <a href="http://commons.wikipedia.org/wiki/Image:Finite_state_machine_example_with_comments.svg">Wikipedia</a></p></div>

<p>My current work involves creating a series of extensions to Radiant to implement such things as <a href="http://github.com/jgsmith/radiant-tei-tools/">TEI to HTML transformations</a>.  I&#8217;m also using the state machine concept to define modules in the <a href="http://github.com/jgsmith/wlc/">Writing and Learning Communities</a> software so a course developer can create new assignment modules for students without having to know Ruby or have access to the server.</p>

<p>Fabulator will be the next version of the &#8220;document as application&#8221; system.  This will be an extension to Radiant that defines a new &#8220;Fabulator&#8221; page type.  Radiant gives us a way to manage everything in the application as a single, compact document.</p>

<p>The body of the page remains a simple text document that can be used to provide a description of the application.  A separate page part is used to define the state machine, and each state in the state machine has its view in a page part with the same name as the state.</p>
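<p>A minimal Python sketch of that naming convention, with a plain dictionary standing in for the page&#8217;s parts.  The part name <code>machine</code>, the transition table, and the event names are hypothetical; in Radiant the state machine definition would be XML in its own page part.</p>

```python
# Hypothetical Radiant-style page: "body" describes the application,
# "machine" stands in for the state machine definition (a dict here rather
# than XML), and every other part is the view for the state with that name.
PAGE_PARTS = {
    "body": "A two-step survey about research interests.",
    "machine": {
        "start": "ask_name",
        "transitions": {
            ("ask_name", "next"): "ask_topic",
            ("ask_topic", "back"): "ask_name",
            ("ask_topic", "next"): "thanks",
        },
    },
    "ask_name": "Please enter your name.",
    "ask_topic": "What topic are you working on?",
    "thanks": "Thank you for taking part.",
}


def step(parts, state, event):
    """Follow a transition; an event that isn't valid here leaves the state unchanged."""
    return parts["machine"]["transitions"].get((state, event), state)


def render(parts, state):
    """The view for a state is simply the page part named after it."""
    return parts[state]
```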

<p>The Fabulator page type will define a number of tags that can be used in the views (page parts) to create forms and pull in data.  I&#8217;m still debating what data sources to support out of the box, though I&#8217;m sure I&#8217;ll add the ability to read from <a class="zem_slink" href="http://en.wikipedia.org/wiki/RSS" title="RSS" rel="wikipedia">RSS</a> and <a class="zem_slink" href="http://en.wikipedia.org/wiki/Resource_Description_Framework" title="Resource Description Framework" rel="wikipedia">RDF</a> sources.  I will probably create an admin extension to manage data sources that are part of the Radiant site installation, or at least data repositories that can be written to by the Fabulator pages.</p>
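<p>Reading an RSS source into application data can be quite small.  Here is a minimal sketch using Python&#8217;s standard <code>xml.etree.ElementTree</code>; the choice of fields and the name <code>read_rss</code> are assumptions, and it handles only RSS 2.0 items from a string already in hand.</p>

```python
import xml.etree.ElementTree as ET


def read_rss(xml_text):
    """Turn an RSS 2.0 document into a list of plain dicts, one per item.

    Only title, link, and description are kept; a real data source would
    also need to handle Atom, namespaced elements, and fetching over HTTP.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns None for a missing element, so default to "".
        items.append({
            field: (item.findtext(field) or "").strip()
            for field in ("title", "link", "description")
        })
    return items
```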

<p>I also need a way to upload <a class="zem_slink" href="http://en.wikipedia.org/wiki/XSL_Transformations" title="XSL Transformations" rel="wikipedia">XSLT</a> files that can be used as macros for the Fabulator pages.  I&#8217;m not sure yet how I want to present them to the application author: either as a list of check boxes that they can select (e.g., stating that the application is a wizard that consists of a series of views that culminate in some action with the collected data), or as a help box with the <a class="zem_slink" href="http://en.wikipedia.org/wiki/XML" title="XML" rel="wikipedia">XML</a> snippets to add to pull in the XSLT (less friendly, but probably easier).</p>

<p>I don&#8217;t have code up as of when I wrote this entry, but when I do, it will be available at <a href="http://github.com/jgsmith/radiant-fabulator/">http://github.com/jgsmith/radiant-fabulator/</a>.</p>
]]>
    </content>
</entry>

<entry>
    <title>The Church of Latter Day Scholars</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2008/01/the-church-of-latter-day-scholars.html" />
    <id>tag:www.jamesgottlieb.com,2008://1.13</id>

    <published>2008-01-28T16:21:12Z</published>
    <updated>2009-09-26T19:30:05Z</updated>

    <summary>Michael Godwin, General Counsel, Wikimedia Foundation, is on campus today visiting with various digital humanities groups and giving a talk titled &quot;After the Revolution.&quot; I&apos;ve been thinking about the role of libraries and the Internet and academic responses to Wikipedia....</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Digital Humanities" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>Michael Godwin, General Counsel, Wikimedia Foundation, is on campus today visiting with various digital humanities groups and giving a talk titled "After the Revolution."  I've been thinking about the role of libraries and the Internet and academic responses to Wikipedia.</p>]]>
        <![CDATA[<p>In <a href="http://www.degreetutor.com/library/librarians-online/michael-hart">an interview,</a> Michael Hart, founder of Project Gutenberg, says: "Before The Gutenberg Press the average person could own zero books. Before Project Gutenberg the average person could own zero libraries, speaking only of the words, of course, not the physical entity or the library staff, etc."</p>

<p>Academics seem to have an allergic reaction to projects like Wikipedia and Project Gutenberg, but they are missing the point of these projects and an opportunity as educators.</p>

<p>Before the Gutenberg Press, knowledge of the Bible was mediated by the Church.  Your understanding of what God wanted was based on your faith in the Church and its priests.  No one could afford to own their own illuminated edition of the Bible, and few had any reason to learn to read.</p>

<p>After the Press, the Bible was affordable for a lot of people.  The Press led to the Reformation because people could trust their own reading instead of having to put their faith in the Church.  Printed books weren't as beautiful as the earlier illuminated editions on parchment, but the information content was the same and cost much less.  Printing didn't preserve the form of the earlier work, but transformed it in a way that led to wider and more flexible use.  Today, books can be carried in a back pocket and read anywhere.</p>

<p>Electronic books are going to have the same impact once someone figures out how they should work.  The various gadgets like Amazon's Kindle will eventually go away.  None of the current snake oil does for printed books what printed books did for hand copied editions.  The places you can use an e-book are fewer than those in which you can use a printed book.  Current electronic devices represent a regression in reading.</p>

<p>However, electronic books do what print books can't do: they let you own a library at an affordable price.  But electronic books aren't yet a revolution.  They just put in electronic form what is in print.</p>

<p>The Press also opened up another realm to people with fewer means: authorship.  By making books cheaper to print, new books could be printed with less initial investment.  The Press led to an explosion in writing.</p>

<p>The Internet has produced a similar explosion through blogs and websites.  The barrier to entry today is the lowest it has ever been, regardless of the audience size.  Anyone can be an author, and anything published can be read.</p>

<p>The Internet is today's library, and devices like the iPhone are the means of accessing that library.</p>

<p>The Internet also represents the Press to today's Church of scholarship.  No longer does information require faith in the mediation of scholars.  We can write and read our own encyclopedia directly.</p>

<p>Instead of fighting this revolution, scholars should be teaching critical thinking skills that let us sort through the information we find and decide for ourselves what is good and consistent with everything else we've seen and learned.</p>]]>
    </content>
</entry>

<entry>
    <title>RDF, Inference Engines, and the Web</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2008/01/rdf-inference-engines-and-the-web.html" />
    <id>tag:69.162.68.218,2008:/~jamesgot/blog//1.17</id>

    <published>2008-01-15T23:08:58Z</published>
    <updated>2009-09-26T19:44:17Z</updated>

    <summary>Since starting in the College of Liberal Arts in November, 2007, as the new lead developer for digital humanities, I&apos;ve been putting together some design ideas and initial code towards a Digital Resources Workbench....</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Digital Humanities" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>Since starting in the <a href="http://clla.tamu.edu/">College of Liberal Arts</a> in November, 2007, as the new lead developer for digital humanities, I've been putting together some design ideas and initial code towards a Digital Resources Workbench.</p>
]]>
        <![CDATA[<p>Most digital humanities projects that I've seen are cataloging a lot of information and providing a search interface for exploring that information.  It could be editions of a particular book, images on a topic, or perhaps more granular information about concepts, places, people, events, etc.  A mix of artifact and annotation.</p>

<p>This seems to break down into a fairly simple set of responsibilities: entry, storage, retrieval, interpretation, and presentation.  If I can define simple, open, standard APIs to interface between each stage, then each stage can be fairly independent and reusable by itself.</p>

<p>Artifact and annotation have slightly different needs, so we need to break those apart into an artifact collection and a knowledge base.  There might be some special cases, but I'm trying to push as much "special handling" as possible into the back end.</p>

<p>Artifacts have a central document (image, text, video, etc.) that can be considered a canonical source.  Some meta-data will be associated with it, such as the size of the document, the type, the collection it belongs to, perhaps some ACLs and a URL for public access.  This meta-data is the minimal set needed to manage the artifact itself.  Any information such as who created it, where it was published, etc., would be in an associated knowledge base.</p>
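<p>A minimal Python sketch of that minimal metadata set as a data class.  The field names are hypothetical; the point is what is left out, since anything descriptive lives in the knowledge base.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Minimal management metadata for one artifact (hypothetical field names).

    Descriptive facts such as creator or place of publication are
    deliberately absent: they belong in the associated knowledge base.
    """
    url: str         # public access URL, also usable as its identifier
    media_type: str  # e.g. "image/tiff" or "text/xml"
    size: int        # size of the canonical document in bytes
    collection: str  # the collection the artifact belongs to
    acl: dict = field(default_factory=dict)  # e.g. {"read": ["public"]}
```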

<p>Knowledge bases are collections of facts.  These can be written as RDF triples (subject, predicate, object), with the subject being the resource described, the predicate indicating an aspect of the subject, and the object being the value associated with that aspect.  If we want to add a little power to the knowledge base, we can add an inference engine, such as Prolog, and a set of rules.</p>

<p>An inference engine lets us ask questions such as "who are the descendants of A?" or "which papers lead to P through a citation chain?" or even "Are A and B related?" without having to write a lot of programming code.</p>
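<p>The post mentions Prolog; to keep to one language here, this sketch swaps in plain Python to compute what a recursive descendant rule would give: a child is a descendant, and so is any descendant of a child.  The predicate name <code>ex:parentOf</code> and the function name are hypothetical.</p>

```python
def descendants(triples, ancestor, predicate="ex:parentOf"):
    """All resources reachable from `ancestor` by following `predicate`.

    This is the transitive closure of one relation, the same answer a
    recursive rule would produce.  The `seen` set stops cycles from looping.
    """
    # Index the relation once: subject -> list of objects.
    children = {}
    for s, p, o in triples:
        if p == predicate:
            children.setdefault(s, []).append(o)
    # Depth-first walk from the ancestor.
    seen, stack = set(), [ancestor]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

<p>Passing a different predicate covers the citation question too: with a hypothetical <code>ex:citedBy</code> predicate, <code>descendants(triples, P, "ex:citedBy")</code> returns every paper that leads to P through a citation chain.</p>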

<p>The resources in the knowledge base will be exposed through a REST interface that will allow editing (with the proper permissions) as well as reading.  This allows the greatest flexibility in using applications from web browsers to other non-end-user systems, such as other research projects.</p>

<p>So that knowledge bases and inference engines are available, but aren't open to abuse (inferencing can require a lot of computation), I'm planning on offering the results of running a particular query as an RSS or Atom feed.  This is similar to the interface for listing all of the resources of a particular type in a model, but would be read-only.  The actual query would be defined through some administrative interface that would associate a public URL with the query&mdash;similar to how Yahoo! Pipes exposes its pipes.  This has the added advantage of disassociating the resource identifier from the resource definition for the query, making client applications easier to maintain.</p>
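<p>A minimal Python sketch of serializing stored-query results as an Atom feed with the standard library&#8217;s <code>ElementTree</code>.  The function signature and the shape of the result rows are assumptions; a production feed would also carry links to each resource and real timestamps.</p>

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Qualify a tag name with the Atom namespace, ElementTree-style."""
    return f"{{{ATOM}}}{tag}"


def results_to_atom(feed_url, title, author, updated, results):
    """Serialize query results as a minimal Atom feed.

    feed_url is the public URL tied to the stored query; clients only ever
    see this URL, never the query definition behind it.
    results is a list of dicts with "id", "title", and "updated" keys.
    """
    # Make Atom the default namespace in the output (a global registration).
    ET.register_namespace("", ATOM)
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_url
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    ET.SubElement(ET.SubElement(feed, atom("author")), atom("name")).text = author
    ET.SubElement(feed, atom("link"), rel="self", href=feed_url)
    for row in results:
        entry = ET.SubElement(feed, atom("entry"))
        for tag in ("id", "title", "updated"):
            ET.SubElement(entry, atom(tag)).text = row[tag]
    return ET.tostring(feed, encoding="unicode")
```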

<p>If licensing works out (which we expect it to at the moment), I should have a code release in a month.  It won't have ACLs, inferencing, or other "fancy" features, but it should be a proof-of-concept for the basic ideas.</p>
]]>
    </content>
</entry>

<entry>
    <title>Scholarly Software Editions</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/10/scholarly-software-editions.html" />
    <id>tag:69.162.68.218,2007:/~jamesgot/blog//1.19</id>

    <published>2007-10-05T18:33:07Z</published>
    <updated>2009-09-26T19:48:17Z</updated>

    <summary>The NEH and other U.S. federal government agencies are pushing digital humanities projects to produce something that can be shared. If this is an application that people can use, especially an application that resides on a central server,...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Digital Humanities" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>The <a href="http://www.neh.gov/grants/guidelines/digitalhumanitiesstartup.html">NEH</a> and other U.S. federal government agencies are pushing digital humanities projects to produce something that can be shared.  If this is an application that people can use, especially an application that resides on a central server, then the NEH also wants provisions for long-term maintenance.  Ultimately, digital humanities projects should seek to be a resource that other scholarly work can build on.  In this post, I want to explore what this might mean for web-based applications.</p>
]]>
        <![CDATA[<p>We need to balance the control an author has over their work with the interest of the wider academic world in having access to that work.  We need to balance the pressure for a project to evolve with the need for a stable representation that other work can reference.</p>

<p>I believe the way to do this is to introduce the idea of the project edition.  This mirrors how reference works are produced in print media.  Libraries and journals have already developed a system for working with such editions.</p>

<p>After an edition is printed, the publisher will sometimes issue errata listing the errors that have been found and their corrections.  Some of these corrections will be made in the next printing.  A new edition is usually produced when the changes are so large that the work is substantially different and shouldn't serve as a substitute for the earlier version.</p>

<p>Editions let researchers communicate which version of a work they are building on.  This helps other researchers verify that work and possibly reproduce it.  If the foundation is constantly shifting, then reproducibility can't be reliably verified.</p>

<p>Trying to build an edition of a web application shows that such an application consists of two parts: the data on which the application is built, and the interface that allows people to explore that data.</p>

<p>To preserve reproducibility, we need to keep changes to a minimum and publish what they are.  For the interface, this means that new features should be saved for new editions.  New data can be added to the project without breaking reproducibility if the interface has a way to restrict its data at the user's discretion to what was available at an earlier date.  We also need to publish when data has been added or changed.</p>
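<p>A minimal Python sketch of restricting data to what was available on an edition&#8217;s date, assuming each record keeps its revision history as dated values instead of overwriting them.  The record layout and the name <code>as_of</code> are hypothetical.</p>

```python
from datetime import date


def as_of(records, edition_date):
    """Return the data as it stood on `edition_date`.

    records maps a key to a list of (effective_date, value) revisions,
    oldest first.  Records added after the edition date are hidden, and
    changed records show the value that was current on that date.
    """
    snapshot = {}
    for key, revisions in records.items():
        current = [value for when, value in revisions if when <= edition_date]
        if current:
            snapshot[key] = current[-1]
    return snapshot
```

<p>Because old revisions are kept rather than overwritten, the same history can also produce the published list of what was added or changed between two dates.</p>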

<p>Another property of printed material is that when a new edition comes out or the author moves to another institution, the older editions already in a library's collection remain.  Once a work has been made available for others to build on, it remains available.</p>

<p>For digital works, continued availability requires a permanent home for a particular edition.  This requires commitment from both the institution and the author.  Institutions need to provide the infrastructure on which an edition could run, consisting of the server platforms and the URL namespaces, and authors need to grant the institution a license to run the edition in perpetuity with permission to make sufficient modifications to keep it running (just as <em>Beowulf</em> requires translation before the story is accessible to most English speakers today).</p>

<p>Such a license does not restrict the author's future work.  Future editions of the application and data can run anywhere the author chooses.  By granting such a license, though, the author ensures that they will not be the reason their work can't be found when another project references it.</p>

<p>In summary:</p>

<ul>
<li>Digital humanities projects need to follow a versioning model that produces discrete editions that can be referenced.</li>
<li>These projects need to publish a list of any changes that are made to the published editions of the interface and data.</li>
<li>Institutions need to make the infrastructure commitment that allows for long-term maintenance of these projects.</li>
<li>Authors need to make the licensing commitment that allows for long-term maintenance of these projects.</li>
</ul>
]]>
    </content>
</entry>

<entry>
    <title>XML Transformation Creation</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/10/xml-transformation-creation.html" />
    <id>tag:www.jamesgottlieb.com,2007://1.16</id>

    <published>2007-10-02T21:36:11Z</published>
    <updated>2009-09-26T19:50:31Z</updated>

    <summary>I was looking around the web for references about EAD, an XML vocabulary mentioned in a Digital Humanities Working Group meeting Monday. I could see cases where people would want to have documents marked up with both TEI and EAD....</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>I was looking around the web for references about <a href="http://www.loc.gov/ead/">EAD</a>, an XML vocabulary mentioned in a Digital Humanities Working Group meeting Monday.  I could see cases where people would want to have documents marked up with both <a href="http://www.tei-c.org/">TEI</a> and EAD.</p>
]]>
        <![CDATA[<p>XSLTs basically describe a function that is applied to an XML document resulting in another document (not necessarily XML):  <code>D = <em>f</em>(X)</code> where <code>X</code> is a subset of <code>D</code> (for a particular document, I'd say: <code>d = <em>f</em>(x)</code>).  We usually are given <code>X</code> and <code><em>f</em></code> and asked for <code>D</code>, but I'm wondering if we could be given <code>D</code> and <code>X</code> and find <code><em>f</em></code>.</p>
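
<p>As a toy illustration of finding <code><em>f</em></code> from examples, the sketch below handles only the narrowest case: transformations that rename elements but keep the tree shape and the text unchanged.  Documents are modeled as plain objects (<code>{ tag, children }</code>) rather than parsed XML, and <code>inferRenaming</code> is a hypothetical name.  Real XSLTs reorder, drop, and generate content, so this is far from the general problem.</p>

<p><textarea name="code" class="jscript" cols="60" rows="20">
// Toy document model (hypothetical): an element is { tag, children },
// and a text node is a plain string.

// Infer an element-renaming map from example (input, output) pairs.
// Returns a Map from input tag to output tag, or null when no single
// renaming explains every example.
function inferRenaming(pairs) {
  const mapping = new Map();

  function walk(x, d) {
    // Text must pass through unchanged.
    if (typeof x === "string" || typeof d === "string") return x === d;
    // The tree shape must match.
    if (x.children.length !== d.children.length) return false;
    // A function maps each input tag to exactly one output tag.
    const seen = mapping.get(x.tag);
    if (seen !== undefined && seen !== d.tag) return false;
    mapping.set(x.tag, d.tag);
    return x.children.every((child, i) => walk(child, d.children[i]));
  }

  for (const [x, d] of pairs) {
    if (!walk(x, d)) return null;
  }
  return mapping;
}

const tei = { tag: "p", children: [
  { tag: "hi", children: ["Beowulf"] }, " is old." ] };
const html = { tag: "p", children: [
  { tag: "em", children: ["Beowulf"] }, " is old." ] };

const f = inferRenaming([[tei, html]]);
// f.get("p") === "p", f.get("hi") === "em"
</textarea></p>

<p>If the examples disagree (say, <code>&lt;hi&gt;</code> becomes <code>&lt;em&gt;</code> in one pair and <code>&lt;b&gt;</code> in another), the function returns <code>null</code>, since no single renaming explains both.</p>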

<p>This is definitely a pure computer science problem, but it has digital humanities applications.</p>

<p>A web search turns up some work in this direction, but it usually has people map elements between the two document sets by hand to generate the XSLT.</p>

<p>Another thing that would come from this is a way to rank XML vocabularies by their expressive range.  Suppose we have two sets of documents (<code>A</code> and <code>B</code>) based on two different XML vocabularies.  If an XSLT exists that maps <code>A</code> -&gt; <code>B</code>, but no XSLT exists that maps <code>B</code> -&gt; <code>A</code>, then the vocabulary for <code>A</code> could be seen as having a larger expressive range than the one used for <code>B</code>.  That would give us a more solid foundation for saying that TEI is more expressive than DocBook (which I believe it is, but don't have good data to base that belief on at the moment).</p>
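
<p>Under a strong simplification (a transformation that only renames elements), the test for a reverse mapping is easy to state: a renaming from <code>A</code> to <code>B</code> can be undone by another renaming only if no two elements of <code>A</code> map to the same element of <code>B</code>.  The mapping below is a hypothetical example, and failing to find an inverse renaming doesn't prove that no XSLT at all exists for <code>B</code> -&gt; <code>A</code>.</p>

<p><textarea name="code" class="jscript" cols="60" rows="15">
// Simplifying assumption: the A -> B transformation only renames
// elements, given as a plain object such as { hi: "em" }.
// Returns the B -> A renaming, or null if two A tags collapse onto
// the same B tag (information lost, so no renaming can undo it).
function inverseRenaming(aToB) {
  const bToA = {};
  for (const [aTag, bTag] of Object.entries(aToB)) {
    if (Object.prototype.hasOwnProperty.call(bToA, bTag)) return null;
    bToA[bTag] = aTag;
  }
  return bToA;
}

// Hypothetical TEI -> HTML renaming: titles and foreign words both become em.
const teiToHtml = { p: "p", title: "em", foreign: "em" };

inverseRenaming(teiToHtml);            // -> null
inverseRenaming({ p: "p", hi: "em" }); // -> { p: "p", em: "hi" }
</textarea></p>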

<p>I can manually create XSLTs to go from TEI to DocBook to HTML because I believe there's a loss of information from one format to the next (ignoring the pushing of that information into CSS at the final HTML stage), and because DocBook is a publishing vocabulary and HTML is, with CSS, a de facto typesetting vocabulary.  The information isn't so much lost as transformed from semantic to presentational, with the person reading the resulting document adding back the semantic information based on the presentation.  The semantic information, though, is no longer in a form a computer can readily understand: it's gone from a context-free to a context-dependent form.</p>
]]>
    </content>
</entry>

<entry>
    <title>The Middle of Nowhere</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/08/the-middle-of-nowhere.html" />
    <id>tag:www.jamesgottlieb.com,2007://1.15</id>

    <published>2007-08-23T09:16:05Z</published>
    <updated>2009-09-26T19:54:55Z</updated>

    <summary>I remember my early days of computing, setting the telephone headset on the earmuffs of the modem and dialing CompuServe&apos;s access number. The text would scroll across the screen almost too fast to read at 300 baud. The wealth of...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Interactive Fiction" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>I remember my early days of computing, setting the telephone headset on the earmuffs of the modem and dialing CompuServe's access number.  The text would scroll across the screen almost too fast to read at 300 baud.  The wealth of information available in the libraries and newsgroups was almost overwhelming at the time.</p>
]]>
        <![CDATA[<p>I took an interest in playing several of the games, including an Adventure-style game whose name I've forgotten.  I enjoyed imagining a world in which I could take an active part.  I read up on creating interactive worlds, both on-line (multiplayer) and single player.  I played the Scott Adams adventure games.</p>

<p>At one point, someone posted some information on a new language for describing the game world.  I don't remember too many details now (this was around 1989), but looking back, it could have been the first inklings of LPC and LPMuds.</p>

<p>Through the years, I've kept an avid interest in MUDs.  I've played EverQuest and World of Warcraft, but those don't have quite the richness that can be achieved in a MUD.  Their graphics can be beautiful (WoW sometimes looks like a painting on my monitor), but there's a limit to how immersed I can get in the world.</p>

<p>I've also played <a href="http://darkerrealms.org/">Darker Realms</a>, one of the early LPMuds (founded around 1991).  It sometimes shows its age, but it's an interesting game historically, having been played by quite a few people who are now prominent in the Internet community.</p>

<p>One of the problems with a lot of MUDs, though, is that their worlds aren't as original as some people would like to think.  Many are based on very familiar settings: Star Wars, Wheel of Time, Pern, and so on.  These worlds are based on the creative work of other people who aren't involved in the MUD.  I wanted to avoid repeating the same mistake.</p>

<p>I have enjoyed playing World of Warcraft and will continue to play it as time allows, but I also want to create.  I think MUDs could learn something from WoW.  The game has a lot of hack-n-slash activity (almost everything involves killing something), but there's also a rich story line running through the game that comes out in the quests.</p>

<p>I am working on building a world then that emphasizes role playing and questing over killing things.  There will be some killing, I'm sure, but I'm more interested in solving problems than just brute-forcing my way through.  I'm also more interested in story.</p>

<p>The Middle of Nowhere is going to be a MUD that is built around the world I've been working on for my thesis.  You probably won't find the characters from my thesis in the game, but you'll find the spirit of the world.  The Muses will be there as well as the Cardinalities.  The city center and the wild areas.</p>

<p>I'll be posting more on <a href="http://mofn.net/">the game website</a>.</p>
]]>
    </content>
</entry>

<entry>
    <title>iPhone and User Interfaces</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/06/iphone-and-user-interfaces.html" />
    <id>tag:www.jamesgottlieb.com,2007://1.12</id>

    <published>2007-06-27T18:38:06Z</published>
    <updated>2009-09-26T19:58:57Z</updated>

    <summary>&quot;What publishing can learn from the iPhone&quot; points in a direction I&apos;ve been thinking about for a while: we need to lessen our tie to notebooks and desktops when interacting with data. Part of this equation is Google Gears which...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p><a href="http://printisdeadblog.com/2007/06/27/apples-and-changes-what-publishing-can-learn-from-the-iphone/">"What publishing can learn from the iPhone"</a> points in a direction I've been thinking about for a while: we need to lessen our tie to notebooks and desktops when interacting with data.</p>

<p>Part of this equation is <a href="http://gears.google.com/">Google Gears</a>, which allows offline interaction with web applications.  When I'm in the store and want to know whether I have a particular book in my library, or which wine I enjoyed the previous week but can't quite remember, I want access to the information I need so I can make a well-informed decision.  I want to be able to update my catalog at the point of purchase instead of trying to remember to do so after I get home.  Maybe I'm in another city, or traveling.  The closer together in time I can do all the tasks that go together, the more likely I am to do all of them.  Syncing data and allowing client-side storage make this possible.</p>
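
<p>The syncing half of that can be sketched independently of any particular product.  The class below is a hypothetical illustration, not the Google Gears API: lookups and edits go to a local copy immediately, and edits wait in a queue until a (hypothetical) <code>sendToServer</code> function reports success.</p>

<p><textarea name="code" class="jscript" cols="60" rows="20">
// Hypothetical offline-first catalog; this is not the Google Gears API.
class OfflineCatalog {
  constructor(sendToServer) {
    this.items = new Map();           // local copy, usable offline
    this.pending = [];                // edits not yet on the server
    this.sendToServer = sendToServer; // returns true if the send worked
  }

  lookup(key) {
    return this.items.get(key);
  }

  // Record the edit locally right away and queue it for the server.
  update(key, value) {
    this.items.set(key, value);
    this.pending.push({ key, value });
  }

  // Send queued edits in order, stopping at the first failure so that
  // nothing is lost or reordered. Returns how many are still pending.
  sync() {
    while (this.pending.length > 0) {
      if (!this.sendToServer(this.pending[0])) break;
      this.pending.shift();
    }
    return this.pending.length;
  }
}

// Simulated network for the example.
let online = false;
const sent = [];
const catalog = new OfflineCatalog(change => {
  if (!online) return false; // pretend the request failed
  sent.push(change);
  return true;
});

catalog.update("wine:dinner-party-red", "enjoyed it");
const offlineAnswer = catalog.lookup("wine:dinner-party-red"); // "enjoyed it"
const beforeSync = catalog.sync(); // 1 edit still pending
online = true;
const afterSync = catalog.sync();  // 0 pending; the edit was sent
</textarea></p>

<p>Sending queued edits in order and stopping at the first failure keeps the server from seeing a later edit before an earlier one; a real system would also have to reconcile edits made on the server while the device was offline.</p>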
]]>
        <![CDATA[<p>The other bit of the equation is designing sites that work well in a hand-held format.  Now that we have a device that provides a full CSS2 (and some CSS3) compliant browser that looks like a real browser, we can design those sites.</p>

<p>The only thing we need to convince Apple to do is provide Google Gears in their iPhone installation of Safari.  This would allow the iPhone to be like a small tablet computer that can store limited amounts of information, even if the network is unavailable.</p>

<p>Before then, though, we have an opportunity to play with the format for print publishing, even if we use <a href="http://www.gutenberg.org/wiki/Main_Page">public domain works</a>.  The iPhone is about the right size to feel like a small book and for the text to be a column of print&mdash;comfortable to the eyes and hands.  This might be the opportunity to finally bring text to an electronic medium and not remove all of the comfortable aspects of reading dead-tree print.</p>
]]>
    </content>
</entry>

<entry>
    <title>Viewing Files with Google Gears</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/06/viewing-files-with-google-gears.html" />
    <id>tag:www.jamesgottlieb.com,2007://1.10</id>

    <published>2007-06-19T14:11:56Z</published>
    <updated>2009-09-26T20:24:38Z</updated>

    <summary>Something that web developers have been looking for, but haven&apos;t been able to do, is now possible with Google Gears. Namely, accessing a file from JavaScript. It still isn&apos;t possible to access an arbitrary file, but with Gears, you can...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>Something that web developers have been looking for, but haven't been able to do, is now possible with <a href="http://gears.google.com/">Google Gears</a>. Namely, accessing a file from JavaScript.  It still isn't possible to access an arbitrary file, but with Gears, you can access the contents of a file that the user has selected in a file input element.</p>
]]>
        <![CDATA[<p>The following function assumes the Prototype library (for <code>Ajax.Request</code>) and a browser with Google Gears installed.</p>

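<p><textarea name="code" class="jscript" cols="60" rows="10">
var capture_file = function(file_element, url, callback){
    // Open (or create) a Gears LocalServer resource store named 'uploads'.
    var ls = google.gears.factory.create("beta.localserver", "1.0");
    var rs = ls.openStore('uploads') || ls.createStore('uploads');
    rs.enabled = true;  // the store must be enabled to serve captured URLs
    // Copy the file the user selected into the store under the given URL.
    rs.captureFile(file_element, url);
    // Request that URL; Gears answers from the local store, so the
    // response text is the content of the selected file.
    new Ajax.Request(url, {
        method: 'get',
        onSuccess: function(transport) {
            callback(transport.responseText);
        }
    });
};</textarea></p>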

<p>Given a file input element (a DOM reference), the URL at which to store the file in the local store, and a callback to call with the file's content, the function captures the selected file into the local store and then reads it back through that URL.  The content arrives asynchronously, through the callback.</p>
]]>
    </content>
</entry>

<entry>
    <title>SF.OokOok - A Distributed Collaborative Coding of Science Fiction</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/06/sfookook-a-distributed-collaborative-coding-of-science-fiction.html" />
    <id>tag:www.jamesgottlieb.com,2007://1.9</id>

    <published>2007-06-05T17:25:26Z</published>
    <updated>2009-09-26T20:28:34Z</updated>

    <summary>SF.OokOok, a science fiction and fantasy characterization project, is based on many of the ideas from the Genre Evolution Project (GEP) but with a different purpose. While the GEP is designed to collect data with a particular question in mind,...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Digital Humanities" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p><a href="http://sf.ookook.org/">SF.OokOok</a>, a science fiction and fantasy characterization project, is based on many of the ideas from the <a href="http://www.umich.edu/~genreevo/">Genre Evolution Project</a> (GEP) but with a different purpose. While the GEP is designed to collect data with a particular question in mind, SF.OokOok is designed to collect data that helps researchers decide which stories and other resources to study first when gathering the data their own questions require. SF.OokOok seeks to build a better card catalog of science fiction and fantasy.</p>
]]>
        <![CDATA[<p>We are drawing from three primary sources in designing the data management: the GEP for the initial set of broad questions in the survey, <a href="http://cpr.molsci.ucla.edu/">Calibrated Peer Review</a> (CPR) for managing the influence individual data entries have on the overall results, and the <a href="http://isfdb.org/">Internet Speculative Fiction Database</a> (ISFDB) for the bibliography.</p>

<p>Those familiar with the GEP will recognize the basis of this project. However, this project is not a competitor with the GEP but a complement. The GEP studies the evolution of speculative fiction within the context of a university course. While the GEP does characterize the stories it reads, it does so with particular questions designed with genre evolution in mind. More important than data collection, the GEP provides a good environment for students studying objective research methods in literary criticism. The GEP does not provide a lot of data that can be used outside the project. SF.OokOok does not try to answer any particular question but instead tries to collect as much data as possible that might be helpful in other research projects.</p>

<p>One of the problems with open data collection is the accuracy and trustworthiness of the data. SF.OokOok will try to correct some of this by borrowing some of the calibration ideas from CPR. In general, the more an individual agrees with others, the more weight their answers are given.</p>
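
<p>One simple way to express that rule, offered as a hypothetical sketch rather than CPR's actual procedure: take the unweighted majority answer for each story, weight each participant by how often they matched it, then recompute the answers as a weighted vote.  The data shape and the <code>weightedAnswers</code> name are assumptions.</p>

<p><textarea name="code" class="jscript" cols="60" rows="20">
// answers: { participant: { story: answer } }, e.g. "yes"/"no".
// Hypothetical agreement-based weighting, not CPR's actual procedure.
function weightedAnswers(answers) {
  // Vote on each story, giving each participant the weight weightOf(name).
  function tally(weightOf) {
    const votes = {}; // story -> answer -> total weight
    for (const [name, byStory] of Object.entries(answers)) {
      for (const [story, answer] of Object.entries(byStory)) {
        votes[story] = votes[story] || {};
        votes[story][answer] = (votes[story][answer] || 0) + weightOf(name);
      }
    }
    const winners = {};
    for (const [story, counts] of Object.entries(votes)) {
      // Answer with the most weight; ties go to the answer seen first.
      winners[story] = Object.keys(counts)
        .reduce((a, b) => (counts[b] > counts[a] ? b : a));
    }
    return winners;
  }

  // 1. Unweighted majority for each story.
  const majority = tally(() => 1);

  // 2. Weight = share of a participant's answers that match the majority.
  const weights = {};
  for (const [name, byStory] of Object.entries(answers)) {
    const stories = Object.keys(byStory);
    const agreed = stories.filter(s => byStory[s] === majority[s]).length;
    weights[name] = stories.length > 0 ? agreed / stories.length : 0;
  }

  // 3. Weighted vote.
  return { weights, consensus: tally(name => weights[name]) };
}

const survey = {
  ann:  { "story-1": "yes", "story-2": "no",  "story-3": "yes" },
  bob:  { "story-1": "yes", "story-2": "no",  "story-3": "no" },
  cara: { "story-1": "no",  "story-2": "yes", "story-3": "no" }
};
const surveyResult = weightedAnswers(survey);
// weights: bob 1, ann 2/3, cara 1/3
</textarea></p>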

<p>Finally, the question of what constitutes science fiction and fantasy is passed off to another project that is already trying to answer that question: the ISFDB. Only stories in the ISFDB will be surveyed. Until the new web API is put into production, we will periodically import the ISFDB data into our own database. Once the new web API is in production, we will consider feeding bibliography updates back to the ISFDB so people can survey fiction that might not yet be in the database.</p>
]]>
    </content>
</entry>

<entry>
    <title>Software Scaling with Moore&apos;s Law</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/05/software-scaling-with-moores-law.html" />
    <id>tag:www.jamesgottlieb.com,2007://1.8</id>

    <published>2007-05-29T12:33:14Z</published>
    <updated>2009-09-26T20:31:06Z</updated>

    <summary>Slashdot recently pointed to an article on CNET News, &quot;Intel: Software needs to heed Moore&apos;s Law,&quot; that raises the alarm about hardware advancement outpacing software development advancement....</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Digital Humanities" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p><a href="http://slashdot.org/">Slashdot</a> recently pointed to an article on CNET News, <a href="http://news.com.com/Intel+Software+needs+to+heed++Moores+Law/2100-1012_3-6186765.html?tag=nefd.top">"Intel: Software needs to heed Moore's Law,"</a> that raises the alarm about hardware advancement outpacing software development advancement.</p>
]]>
        <![CDATA[<p>The problem is that hardware has started to hit the physical limits of silicon. We can't go much denser without losing the properties of the silicon that we depend on to make the chips work. Instead, manufacturers have started to add more pipelines, cores, and other forms of parallelism in order to continue following Moore's Law. Unfortunately, application software is still a linear beast.</p>

<p>A few companies (notably, Google) have found ways to make application development take advantage of parallelism without introducing huge complications for the application developer. Google's search engine and Apple's Automator are both examples of applications that can take advantage of parallelism under the covers. The trick is that both are designed around a functional programming paradigm, allowing easy analysis of the application to automatically identify operations that can be run in parallel.</p>
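
<p>A small sketch of why a functional style helps, assuming a pure mapping function and an associative combining function with an identity value: each chunk's partial result depends only on that chunk, so the chunks could be handed to separate cores or machines.  The code below runs them one after another and only illustrates that independence; <code>chunkedMapReduce</code> is a hypothetical name, not how Google's or Apple's software is built.</p>

<p><textarea name="code" class="jscript" cols="60" rows="15">
// Split items into chunks, reduce each chunk on its own, then combine
// the partial results. Assumes mapFn is pure and combine is associative
// with `initial` as its identity value; under those assumptions the
// partial results do not depend on each other, so each chunk could run
// on a different core or machine. Here they simply run in sequence.
function chunkedMapReduce(items, mapFn, combine, initial, chunkCount) {
  const size = Math.ceil(items.length / chunkCount) || 1;
  const partials = [];
  for (let start = 0; start < items.length; start += size) {
    const chunk = items.slice(start, start + size);
    partials.push(chunk.map(item => mapFn(item)).reduce(combine, initial));
  }
  return partials.reduce(combine, initial);
}

const numbers = Array.from({ length: 10 }, (_, i) => i + 1); // 1 to 10
const sumOfSquares = chunkedMapReduce(numbers, x => x * x, (a, b) => a + b, 0, 3);
// sumOfSquares === 385, the same as a plain sequential map and reduce
</textarea></p>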

<p>The article correctly expects new languages to be domain specific. Parallelism is much easier to achieve if you only need to worry about the aspects of the application specific to the problem and ignore all of the necessary cruft that allows someone (or another application) to interact with the application. The I/O might not always be parallelizable, but the core aspects that make the application unique often can be.</p>

<p>We need a new language and platform for web applications. This platform should make it easy for someone to write an application that can be spread across machines and take advantage of parallelism wherever possible. This should be one of the <a href="http://paulgraham.com/hundred.html">hundred-year languages</a> that Paul Graham writes about.</p>
]]>
    </content>
</entry>

<entry>
    <title>Of Fish and Dreams</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/05/of-fish-and-dreams.html" />
    <id>tag:www.jamesgottlieb.com,2007://1.7</id>

    <published>2007-05-20T18:50:04Z</published>
    <updated>2009-09-26T20:32:57Z</updated>

    <summary>I&apos;ve given the novel I&apos;m writing for my thesis the working title, Of Fish and Swimming Swords. I don&apos;t have names for the second or third novel yet, but ideas are beginning to come together. They&apos;ll complete the arc begun...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Writing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p>I've given the novel I'm writing for my thesis the working title, <em>Of Fish and Swimming Swords</em>. I don't have names for the second or third novel yet, but ideas are beginning to come together. They'll complete the arc begun in the thesis.</p>

<p>The last two nights, I've woken with fairly vivid dreams. Dreams aren't useful in their raw state. If you actually transcribe a dream, it won't make much sense because dream logic isn't sufficiently realistic. But dreams can provide interesting settings and plot pointers. That's what these two dreams have done.</p>
]]>
        <![CDATA[<p>Now, I have two different threads that I can explore in the second novel. Two different mind sets. Two different civilizations that happen to live together.</p>

<p>I'm not spending too much time putting the second or third novels together right now. I'm halfway through the first and want to finish it before I begin the second. I want to have enough preparation, though, by the time I finish the first that I can begin the second without too much of a gap. Otherwise, I will get lazy and it will be difficult to pick up the pen again.</p>

<p>Now that I'm halfway, I can consider podcasting a few chapters, knowing that the final book won't change so much that the podcasts will be a completely different creature from what might get published. I'll aim at getting a chapter out every week or so, at least for the first few chapters. If I find a publisher and they are comfortable with a complete podcast (similar to what <a href="http://www.craphound.com/">Cory Doctorow</a> does), then I will put the complete book up on this site.</p>
]]>
    </content>
</entry>

<entry>
    <title>Interfacing Digital Humanities with the Open Source Community</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2007/04/interfacing-digital-humanities-with-the-open-source-community.html" />
    <id>tag:www.jamesgottlieb.com,2007://1.6</id>

    <published>2007-04-09T21:29:54Z</published>
    <updated>2009-09-26T20:58:57Z</updated>

    <summary>This abstract is for a presentation being made at the Digital Humanities 2007 Conference in June at UIUC. Academic progress seems to depend on a person&apos;s ability to contribute to society as measured by attributable work such as articles and...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Digital Humanities" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<p><em>This abstract is for a presentation being made at the <a href="http://www.digitalhumanities.org/dh2007/">Digital Humanities 2007 Conference</a> in June at UIUC.</em></p>

<p>Academic progress seems to depend on a person's ability to contribute to society as measured by attributable work such as articles and monographs in areas that are interesting to society. These attributions depend on copyrights and patents, the keys to establishing and protecting intellectual property ownership. The various print journals and publishing houses act as a peer-reviewed intermediary. The academy establishes social interest in a project by the project's ability to attract financial support.</p>
]]>
        <![CDATA[<p>Digital Humanities (DH) relies heavily on what is now being called Web 2.0: technologies that allow nearly seamless cross-platform client/server applications (e.g., AJAX and Comet). Most of these technologies have roots in the Open Source Community (OSC). Much of the talent is also in the OSC. Until this talent is brought into DH, DH is in danger of being a field looking in from outside, watching technology advance as it tries to catch up. Yet, DH is rooted in the academy and must meet the expectations of the academy.</p>

<p>The Open Source Community has its roots in the academy, but has lived for a while in the wild among amateurs in the classic sense. Many people participate in open source because they enjoy doing so, though some participate because of their employment. The OSC is built on a stereotype of the academy: information is free and everyone is able to work on problems they find interesting. Just as professors enjoy working in a University or College with other smart people in their field, OSC members are attracted to projects with smart people. OSC members also value the openness with which they can develop and discuss projects.</p>

<p>The Open Source Community has not abandoned the notion of attribution and social relevancy. The openness of a project's development creates a trajectory along which the project travels. This trajectory is a measure of the talent behind the project and is apparent to most who are interested in the project. Misappropriating a particular release of a project captures only a single point on that trajectory. Because any particular release of a project does not bring with it any of the talent behind the project, "stealing" from the project is worth much less. Attribution in the OSC is much more than just the name beside the copyright or on the patent.</p>

<p>An Open Source project is socially relevant if it is widely used and has a strong community. There are no financial requirements for a project to be successful. SourceForge.net hosts Open Source projects at no cost to the project not because any particular project is worth the cost, but because the OSC itself is socially relevant. People contribute their time to a project not necessarily because they get paid but because they enjoy the project and, if they make significant contributions, they can become a well known talent in the OSC.</p>

<p>The problem, then, is how to balance the requirements of the academy against the need to create an environment that is inviting to the open source community. By openly involving the open source community, DH can access a wide variety of talent which will be involved in various projects not because they are being paid to help, but because they love the project. At the same time, DH can maintain the attribution required for academic progress.</p>

<p>One academic field that is leading the way with technology is Physics: the World Wide Web was developed at CERN to help physicists share documents, and physicists share papers through the electronic pre-print server (<a href="http://xxx.lanl.gov/">http://xxx.lanl.gov/</a>). The Physics community can do this because it is small enough that its members know each other. Reputation is built much as it is in the Open Source Community: by personal experiences between members of the community. Formal peer review plays a secondary role. By reviewing the output of a physicist, another can see the pattern and tell if something new fits that pattern.</p>

<p>If any company represents the commercial potential of Digital Humanities, it is Google. They have been able to attract some of the top talent in the industry by providing a work environment that resembles the Open Source Community in many aspects. Some of the more interesting projects for DH have come from employees' "play time." Google has brought a lot of smart people together under one roof, much as a University or College might do.</p>

<p>By looking to other academic fields and the Open Source Community, Digital Humanities can create a new environment encouraging rapid evolution of ideas without sacrificing the need for attribution, reputation, or social relevance.</p>
]]>
    </content>
</entry>

<entry>
    <title>A Hundred Year Language</title>
    <link rel="alternate" type="text/html" href="http://www.jamesgottlieb.com/2006/08/a-hundred-year-language.html" />
    <id>tag:www.jamesgottlieb.com,2006://1.5</id>

    <published>2006-08-09T21:27:38Z</published>
    <updated>2009-09-26T21:19:49Z</updated>

    <summary>The right way . . . is to separate the meaning of a program from the implementation details. Saying less about implementation should also make programs more flexible. Specifications change while a program is being written, and this is not only...</summary>
    <author>
        <name>James Gottlieb</name>
        <uri>http://www.jamesgottlieb.com/</uri>
    </author>
    
        <category term="Programming" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en" xml:base="http://www.jamesgottlieb.com/">
        <![CDATA[<blockquote>The right way . . . is to separate the meaning of a program from the implementation details. Saying less about implementation should also make programs more flexible. Specifications change while a program is being written, and this is not only inevitable, but desirable.<div style="text-align: right;">---Paul Graham, <a href="http://www.paulgraham.com/hundred.html">"The Hundred-Year Language"</a></div></blockquote>

<p>Traditional programming languages divide a program into two parts: comments and code. The comments describe the goal of the code while the code describes how those goals are met. Bugs arise when the two don't agree. Most programmers would rather do one or the other, but not both. Doing both seems a waste of time because too much information is being duplicated in a way that can introduce mistakes. One of the primary goals of the Gestinanna project was to collapse the comments and code into a single descriptive document that would both describe the program and be the program.</p>
]]>
        <![CDATA[<p><a href="http://en.wikipedia.org/wiki/Literate_Programming">Literate programming</a> tries to solve some of these problems, but it still results in the programmer having to supply the procedural solution as well as describing the problem instead of simply describing the goal of the solution.</p>

<p>What we need is a method of writing a program (the goal of the solution) that is both human and computer readable. The most widely understood form that meets this requirement is XML. By writing a program using an XML vocabulary, we create a document that can easily be transformed into something readable by people (e.g., HTML) and something that can easily be compiled into a procedure that accomplishes the goal (e.g., a Perl module). The resulting XML vocabulary will need to borrow heavily from <a href="http://en.wikipedia.org/wiki/Functional_programming">functional programming</a> in the way it approaches the problem of describing solutions. Functional programming thinks in terms of data transformation instead of sequential instructions. All computing is simply transforming data. Examples of functional languages include <a href="http://en.wikipedia.org/wiki/LISP">LISP</a>, <a href="http://en.wikipedia.org/wiki/Haskell_programming_language">Haskell</a>, and <a href="http://en.wikipedia.org/wiki/XSLT">XSLT</a>.</p>

<p>All applications basically take input data, apply some transformations, and provide output data. Some are fancier in how they do this, but everything can be broken down into that basic form. Desktop applications, web applications, servers. It doesn't matter what role the application plays, it still takes input, transforms it in some way, and provides output. Some applications even change the transformations based on past input. All applications are state machines.</p>
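
<p>Read this way, an application is a transition function from the current state and an input to a new state and an output.  The states and inputs in this sketch are hypothetical, loosely modeled on a request-approval work flow.</p>

<p><textarea name="code" class="jscript" cols="60" rows="20">
// An application as a pure transition function:
// (state, input) -> { state, output }.
// The states and inputs are hypothetical.
const transitions = {
  PENDING:  { approve: "APPROVED", deny: "DENIED" },
  APPROVED: {},
  DENIED:   { reopen: "PENDING" }
};

function step(state, input) {
  const row = transitions[state] || {};
  if (!Object.prototype.hasOwnProperty.call(row, input)) {
    // Input not accepted here: keep the state and report it.
    return { state, output: `ignored ${input} in ${state}` };
  }
  const next = row[input];
  return { state: next, output: `${state} -> ${next}` };
}

// Feed a sequence of inputs through the machine, collecting outputs.
function run(initialState, inputs) {
  let state = initialState;
  const outputs = [];
  for (const input of inputs) {
    const result = step(state, input);
    state = result.state;
    outputs.push(result.output);
  }
  return { state, outputs };
}

const requestRun = run("PENDING", ["deny", "approve", "reopen", "approve"]);
// requestRun.state === "APPROVED"
// requestRun.outputs[1] === "ignored approve in DENIED"
</textarea></p>

<p>Because <code>step</code> has no side effects, the whole history of the application is determined by its initial state and its inputs, which is what makes the behavior easy to reason about and to describe declaratively.</p>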

<p>Since all applications are state machines, we can define the root XML element for our programs: <code>&lt;state-machine/&gt;</code>. Of course, state machines have states. Our next element is then <code>&lt;state/&gt;</code>. We'll want to give it an identifier so we can refer to it, so we will add an id attribute. We should be able to add an id to a <code>&lt;state-machine/&gt;</code> as well so we can reference it from other <code>&lt;state-machine/&gt;</code>s.</p>

<p>We want to be able to add data transformations to the state machine. Otherwise, the state machine doesn't really do anything. We can do this with a <code>&lt;transform/&gt;</code> element. Sometimes, we need to apply a transformation to an empty value to initialize a program. We need to specify when certain transformations should happen, so we will add the <code>&lt;init/&gt;</code> and <code>&lt;clean-up/&gt;</code> elements to act as special <code>&lt;transform/&gt;</code>s.</p>

<p><textarea name="code" class="xml" cols='60'>
<state-machine  xmlns:wf="urn:for-work-flows"  xmlns:authz="urn:for-authorization">
  <init match="/requests"
        select="wf:find('/example/requests',
                        'PENDING' | 'APPROVED' | 'DENIED',
                        authz:actor())">
    <value name="id" select="./id"/>
    <value name="state" select="./state"/>
    <value name="list" select="wf:context-params(.)/list/name"/>
    <value name="last_update" select="./last_update"/>
  </init>
</state-machine>
</textarea></p>

<p>This initializes the state machine with a list of work flows that are based on the <code>/example/requests</code> work flow definition, are in the <code>PENDING</code>, <code>APPROVED</code>, or <code>DENIED</code> state, and are owned by the person logged in to the website. The <code>&lt;init/&gt;</code> then transforms the list of work flows into a list of associations, one association per work flow. Each association has four key/value pairs: the id of the work flow, its state, the name of the requested list (e.g., an e-mail list), and the date the work flow was last updated. This list is available as <code>/requests</code> in any subsequent processing (e.g., a Template Toolkit view might access the list as <code>requests[]</code>).</p>
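<p>Spelled out in ordinary code, the <code>&lt;init/&gt;</code> above amounts to a filter followed by a map. This Python sketch is only an illustration under stated assumptions: <code>WORKFLOWS</code> is made-up sample data standing in for the work flow store searched by <code>wf:find()</code>, <code>current_actor()</code> stands in for <code>authz:actor()</code>, and the <code>params</code> field stands in for <code>wf:context-params(.)</code>.</p>

```python
# Hypothetical stand-ins: WORKFLOWS plays the role of the work-flow store
# searched by wf:find(), and current_actor() plays the role of authz:actor().
WORKFLOWS = [
    {"id": 1, "definition": "/example/requests", "state": "PENDING",
     "owner": "alice", "params": {"list": {"name": "dh-announce"}},
     "last_update": "2010-02-01"},
    {"id": 2, "definition": "/example/requests", "state": "APPROVED",
     "owner": "bob", "params": {"list": {"name": "staff"}},
     "last_update": "2010-02-03"},
    {"id": 3, "definition": "/example/requests", "state": "CLOSED",
     "owner": "alice", "params": {"list": {"name": "old-list"}},
     "last_update": "2010-01-15"},
    {"id": 4, "definition": "/example/requests", "state": "DENIED",
     "owner": "alice", "params": {"list": {"name": "test-list"}},
     "last_update": "2010-02-10"},
]

def current_actor():
    return "alice"  # hypothetical logged-in user

def find(definition, states, actor):
    """Work flows from one definition, in one of the given states, owned by actor."""
    return [wf for wf in WORKFLOWS
            if wf["definition"] == definition
            and wf["state"] in states
            and wf["owner"] == actor]

def init_requests():
    found = find("/example/requests", {"PENDING", "APPROVED", "DENIED"},
                 current_actor())
    # One association per work flow, with the four keys named by the value elements.
    return [{"id": wf["id"],
             "state": wf["state"],
             "list": wf["params"]["list"]["name"],
             "last_update": wf["last_update"]}
            for wf in found]

# The result is stored under the path that the init element matches.
data = {"/requests": init_requests()}
```

<p>With this sample data, <code>data["/requests"]</code> holds associations for work flows 1 and 4 only: work flow 2 belongs to someone else, and work flow 3 is in a state the <code>&lt;init/&gt;</code> does not select.</p>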

<p>In an implementation, the contents of the <code>&lt;init/&gt;</code> can be evaluated lazily. For example, in Perl, the <code>&lt;init/&gt;</code> can be stored as an anonymous subroutine that is called only when the value of <code>/requests</code> is actually needed. Because of this, we can think of the <code>&lt;init/&gt;</code> as being applied whenever we use a value that matches <code>/requests</code>. This leads to the expectation that the <code>match</code> attribute can hold arbitrary XPath-like expressions that data paths are matched against, not just a single fixed path. The <code>&lt;init/&gt;</code> thus provides a default value for any data element that matches its <code>match</code> attribute.</p>
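<p>The lazy-evaluation idea carries over directly to other languages. In this Python sketch, a function plays the role of the Perl anonymous subroutine: it is registered with a match pattern and called only the first time a matching path is read, after which the result is cached. The <code>LazyData</code> class is hypothetical, and shell-style wildcards from the standard <code>fnmatch</code> module are a deliberately simple substitute for real XPath-like matching.</p>

```python
from fnmatch import fnmatchcase

class LazyData:
    """Hypothetical data store where init rules supply default values lazily.

    fnmatch-style wildcards stand in for XPath-like match expressions.
    """

    def __init__(self):
        self._values = {}
        self._inits = []  # (match pattern, function of the matched path)

    def add_init(self, match, fn):
        self._inits.append((match, fn))

    def set(self, path, value):
        self._values[path] = value

    def get(self, path):
        if path not in self._values:
            for pattern, fn in self._inits:
                if fnmatchcase(path, pattern):
                    # Evaluate the rule only now, when the value is first
                    # needed, and cache the result for later reads.
                    self._values[path] = fn(path)
                    break
            else:
                raise KeyError(path)
        return self._values[path]
```

<p>Registering an init does no work by itself; the function behind <code>add_init("/requests", ...)</code> runs on the first <code>get("/requests")</code>, and a pattern such as <code>"/defaults/*"</code> supplies defaults for a whole family of paths. A value set explicitly with <code>set</code> always takes precedence over the default.</p>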

<p>Continuing this line of thinking, the <code>&lt;init/&gt;</code> and similar container elements don't really contain scripts to be run, but sets of rules that are applied to various pieces of data at the appropriate time. The <code>&lt;init/&gt;</code> and similar elements act as scopes for the data transformation rules. If we are applying data transformation rules instead of executing the steps of a script, then we can make use of the parallelism that functional programming allows: the rules are independent of each other and have no side effects, so they can be evaluated in any order or at the same time. We can also optimize the rules by finding common subexpressions and computing each of them only once, when it is first needed.</p>
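<p>Both optimizations can be sketched in a few lines of Python. In this hypothetical example (the rule names, record fields, and <code>apply_rules</code> helper are invented for illustration), the subexpression shared by every rule, "the requests owned by this person," is computed once, and the independent, side-effect-free rules are then submitted to a thread pool from the standard <code>concurrent.futures</code> module.</p>

```python
from concurrent.futures import ThreadPoolExecutor

def owned_by(requests, actor):
    # The common subexpression that every rule below depends on.
    return [r for r in requests if r["owner"] == actor]

# Hypothetical rules: each is a pure function of the shared value and
# never modifies it, so the rules can run in any order or concurrently.
RULES = {
    "pending": lambda mine: sum(1 for r in mine if r["state"] == "PENDING"),
    "approved": lambda mine: sum(1 for r in mine if r["state"] == "APPROVED"),
    "latest": lambda mine: max((r["last_update"] for r in mine), default=None),
}

def apply_rules(requests, actor):
    mine = owned_by(requests, actor)  # computed once, shared by all rules
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(rule, mine) for name, rule in RULES.items()}
        return {name: f.result() for name, f in futures.items()}
```

<p>Because no rule depends on another, the results are the same whether the rules run one after another or in parallel. Note that under CPython's global interpreter lock, threads mostly illustrate this independence; CPU-bound rules would need separate processes to actually run at the same time.</p>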

<p>Next time, we'll look at how states in the state machine can be specified and how flow through the state machine can be managed.</p>
]]>
    </content>
</entry>

</feed>
