Scholarly Software Editions

The NEH and other U.S. federal agencies are pushing digital humanities projects to produce something that can be shared. If the result is an application that people can use, especially one that resides on a central server, then the NEH also wants provisions for long-term maintenance. Ultimately, a digital humanities project should aim to be a resource that other scholarly work can build on. In this post, I want to explore what this might mean for web-based applications.

We need to balance the control authors have over their work against the interest of the wider academic world in having access to that work. We also need to balance the pressure on a project to evolve against the need for a stable representation that other work can reference.

I believe the way to do this is to introduce the idea of the project edition. This mirrors how reference works are produced in print media. Libraries and journals have already developed a system for working with such editions.

After an edition is printed, the publisher will sometimes produce a list of errata identifying the errors corrected. Some of these corrections will be made in the next printing. New editions are usually produced when the changes are so large that the work is substantially different and shouldn't serve as a substitute for the earlier version.

Editions let researchers state which version of a work their own work builds on. This helps other researchers verify that work and possibly reproduce it. If the foundation is constantly shifting, results can't be reliably reproduced.

Trying to build an edition of a web application shows that such an application consists of two parts: the data on which the application is built, and the interface that allows people to explore that data.

To preserve reproducibility, we need to keep changes to a minimum and publish what they are. For the interface, this means that new features should be saved for new editions. New data can be added to the project without breaking reproducibility if the interface has a way to restrict its data at the user's discretion to what was available at an earlier date. We also need to publish when data has been added or changed.
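As a rough illustration of the date restriction described above, here is a minimal Python sketch. It assumes every addition or correction to the data is stored as a dated version rather than overwriting what came before. The names used here (RecordVersion, EditionData, publish, as_of, changelog) and the sample records are hypothetical, invented for this example rather than taken from any existing project.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RecordVersion:
    """One dated version of a record (hypothetical structure)."""
    record_id: str
    value: str
    published: date  # the day this version became visible
    note: str = ""   # human-readable description of the change


class EditionData:
    """Append-only data store: corrections add versions, nothing is overwritten."""

    def __init__(self):
        self._versions = []

    def publish(self, record_id, value, published, note=""):
        self._versions.append(RecordVersion(record_id, value, published, note))

    def as_of(self, cutoff: Optional[date] = None) -> dict:
        """Return the data as it stood on `cutoff` (or the latest data if None)."""
        snapshot = {}
        # Replay versions in date order so later versions replace earlier ones.
        for v in sorted(self._versions, key=lambda v: v.published):
            if cutoff is None or v.published <= cutoff:
                snapshot[v.record_id] = v.value
        return snapshot

    def changelog(self, since: Optional[date] = None) -> list:
        """List every published change made after `since` (or all changes if None)."""
        return [
            f"{v.published.isoformat()} {v.record_id}: {v.note}"
            for v in sorted(self._versions, key=lambda v: v.published)
            if since is None or v.published > since
        ]


# Invented sample data for illustration only.
data = EditionData()
data.publish("letter-001", "first transcription", date(2012, 1, 15), "initial release")
data.publish("letter-002", "first transcription", date(2012, 6, 1), "record added")
data.publish("letter-001", "corrected transcription", date(2012, 9, 10), "transcription corrected")

earlier_view = data.as_of(date(2012, 3, 1))  # what a researcher saw in March 2012
current_view = data.as_of()                  # everything published so far
changes = data.changelog(since=date(2012, 3, 1))
```

A researcher who cites the project as of a given date can then ask the interface for that same view later, and the change log doubles as the published record of what was added or corrected since.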

Another property of printed material is that when a new edition comes out or the author moves to another institution, the older editions already in a library's collection remain. Once a work has been made available for others to build on, it remains available.

For digital works, continued availability requires a permanent home for a particular edition. This requires commitment from both the institution and the author. Institutions need to provide the infrastructure on which an edition can run, consisting of the server platforms and the URL namespaces. Authors need to grant the institution a license to run the edition in perpetuity, with permission to make whatever modifications are needed to keep it running (just as Beowulf requires translation before the story is accessible to most English speakers today).

Such a license does not restrict the author's future work. Future editions of the application and data can run anywhere the author chooses. By granting such a license, though, the author ensures that they will not be the reason their work can't be found when another project references it.

In summary:

  • Digital humanities projects need to follow a versioning model that produces discrete editions that can be referenced.
  • These projects need to publish a list of any changes that are made to the published editions of the interface and data.
  • Institutions need to make the infrastructure commitment that allows for long-term maintenance of these projects.
  • Authors need to make the licensing commitment that allows for long-term maintenance of these projects.