XML Transformation Creation


I was looking around the web for references about EAD (Encoded Archival Description), an XML vocabulary mentioned in a Digital Humanities Working Group meeting on Monday. I could see cases where people would want to have documents marked up with both TEI (Text Encoding Initiative) and EAD.

An XSLT stylesheet basically describes a function that is applied to an XML document and results in another document (not necessarily XML): D = f(X), where X, the set of XML documents, is a subset of D, the set of documents in general (for a particular document, I'd say: d = f(x)). We are usually given x and f and asked for d, but I'm wondering whether we could be given d and x and find f.
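To make the inverse problem concrete, here is a minimal sketch in Python of the simplest case I can think of: assume f does nothing but rename elements, so x and d have the same tree shape. The function name `infer_renames` is made up for illustration, and the sample tags are simplified, namespace-free stand-ins for TEI-like and DocBook-like markup. A real XSLT can reorder, drop, and merge content, which this does not attempt to handle.

```python
import xml.etree.ElementTree as ET

def infer_renames(source_xml, target_xml):
    """Walk two trees in parallel and record source tag -> target tag.

    Assumption: f only renames elements, so both trees have the same
    shape. Returns None if the shapes differ or if one source tag
    would have to map to two different target tags.
    """
    mapping = {}
    stack = [(ET.fromstring(source_xml), ET.fromstring(target_xml))]
    while stack:
        src, dst = stack.pop()
        if len(src) != len(dst):
            return None  # different number of children: not a pure rename
        if mapping.setdefault(src.tag, dst.tag) != dst.tag:
            return None  # inconsistent evidence for this source tag
        stack.extend(zip(src, dst))  # pair up the children in order
    return mapping

# Hypothetical sample pair (x, d); the tags are illustrative only.
x = "<div><head>Title</head><p>Text with <hi>emphasis</hi>.</p></div>"
d = ("<section><title>Title</title>"
     "<para>Text with <emphasis>emphasis</emphasis>.</para></section>")
mapping = infer_renames(x, d)
```

With more than one (x, d) pair, the mappings from each pair could be merged and checked against each other, which is where the problem starts to look like learning f from examples.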

This is definitely a pure computer science problem, but it has digital humanities applications.

A web search shows some work in this direction, but it usually relies on people manually mapping elements between the two document sets in order to generate the XSLT.
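The manual-mapping approach can be sketched roughly as follows: a person supplies an element-to-element table, and a program turns it into a stylesheet. This is my own illustration in Python, not the method of any particular tool. The `build_stylesheet` name and the sample table are assumptions, and it only handles one-to-one renames of namespace-free elements.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

def build_stylesheet(mapping):
    """Turn a {source tag: target tag} table into an XSLT 1.0 stylesheet.

    Each entry becomes one template that wraps the transformed children
    in the target element. Text nodes are copied by XSLT's built-in
    template rules.
    """
    ET.register_namespace("xsl", XSL_NS)
    root = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})
    for src, dst in mapping.items():
        template = ET.SubElement(root, f"{{{XSL_NS}}}template", {"match": src})
        out = ET.SubElement(template, dst)  # literal result element
        ET.SubElement(out, f"{{{XSL_NS}}}apply-templates")
    return ET.tostring(root, encoding="unicode")

# Hypothetical hand-made mapping table.
sheet = build_stylesheet({"p": "para", "hi": "emphasis"})
print(sheet)
```

The generated text can then be handed to any XSLT 1.0 processor. The interesting part of the problem is replacing the hand-made table with something derived from the documents themselves.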

Another thing that would come from this is a way to rank XML vocabularies by their expressive range. Suppose we have two sets of documents (A and B) based on two different XML vocabularies. If an XSLT exists that maps A -> B, but no XSLT exists that maps B -> A, then the vocabulary for A could be seen as having a larger expressive range than the one used for B. That would give us a more solid foundation for saying that TEI is more expressive than DocBook (which I believe it is, but I don't have good data to base that belief on at the moment).
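For the rename-only case, the asymmetry can be checked mechanically: a mapping A -> B has a reverse B -> A only if no two source elements collapse into the same target element. Here is a small sketch in Python under that assumption. The `invert` function and both sample tables are hypothetical, not a claim about how TEI and DocBook actually line up.

```python
def invert(mapping):
    """Return the reverse mapping, or None if the forward one is lossy."""
    inverse = {}
    for src, dst in mapping.items():
        if dst in inverse:
            return None  # two source tags share a target: cannot undo
        inverse[dst] = src
    return inverse

# Hypothetical tables: the first is reversible, the second is not,
# because two distinct source tags end up as the same target tag.
reversible = invert({"p": "para", "head": "title"})
lossy = invert({"hi": "emphasis", "emph": "emphasis"})
```

Under this toy model, a vocabulary whose mappings can only be inverted in one direction would rank as the more expressive one.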

I can manually create XSLTs to go from TEI to DocBook to HTML because I believe there's a loss of information from one format to the next (ignoring the pushing of that information into CSS at the final HTML stage), and because DocBook is a publishing vocabulary and HTML is, with CSS, a de facto typesetting vocabulary. The information isn't so much lost as transformed from semantic to presentational, with the person reading the resulting document adding back the semantic information based on the presentation. The semantic information, though, is removed from a form that a computer can readily understand: it has gone from a context-free form to a context-dependent one.
