XML Transformation Creation

I was looking around the web for references about EAD (Encoded Archival Description), an XML vocabulary mentioned at Monday's Digital Humanities Working Group meeting. I could see cases where people would want to have documents marked up in both TEI and EAD.

XSLTs basically describe a function that is applied to an XML document and produces another document (not necessarily XML): d = f(x), where x is drawn from the set X of XML documents and d from the set D of all documents, so X is a subset of D. We are usually given x and f and asked for d, but I'm wondering whether we could instead be given x and d and find f.
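The easiest version of that inverse problem to picture is one where f does nothing but rename elements. Here is a minimal Python sketch under that strong, hypothetical assumption: given one example pair (x, d) whose trees have the same shape, it recovers the renaming map. `infer_rename_map` is a made-up name, and the sketch ignores text, attributes, and namespaces; a real XSLT can reorder, drop, or generate content, which this does not attempt.

```python
import xml.etree.ElementTree as ET


def infer_rename_map(source_xml: str, target_xml: str) -> dict[str, str]:
    """Infer an element-renaming map from one (x, d) example pair.

    Hypothetical assumption: f only renames elements, so both trees have
    the same shape when walked in document order.
    """
    src = list(ET.fromstring(source_xml).iter())  # preorder (document order)
    tgt = list(ET.fromstring(target_xml).iter())
    if len(src) != len(tgt):
        raise ValueError("trees differ in size; f is not a pure renaming")
    mapping: dict[str, str] = {}
    for s, t in zip(src, tgt):
        # Matching child counts at every preorder position means matching shapes.
        if len(s) != len(t):
            raise ValueError(f"<{s.tag}> and <{t.tag}> have different child counts")
        # A tag sent to two different targets cannot come from a renaming.
        if mapping.setdefault(s.tag, t.tag) != t.tag:
            raise ValueError(f"<{s.tag}> maps to both <{mapping[s.tag]}> and <{t.tag}>")
    return mapping
```

For example, pairing `<div><head>Intro</head><p>Text</p></div>` with `<section><title>Intro</title><para>Text</para></section>` yields `{'div': 'section', 'head': 'title', 'p': 'para'}`.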

This is definitely a pure computer science problem, but it has digital humanities applications. A web search turns up some work in this direction, but it usually relies on people manually mapping elements between the two document sets to generate the XSLT.
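Once someone has hand-mapped the elements, turning that mapping into an XSLT is mechanical. A minimal sketch in Python, assuming the mapping is a plain dictionary of element names with no namespaces (real TEI documents live in the TEI namespace, which would need prefixed match patterns): `rename_stylesheet` is a hypothetical helper that emits an XSLT 1.0 identity transform plus one renaming template per mapped element.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)  # serialize with the conventional xsl: prefix


def rename_stylesheet(mapping: dict[str, str]) -> str:
    """Build an XSLT 1.0 stylesheet that renames elements per `mapping`."""
    sheet = ET.Element(f"{{{XSL}}}stylesheet", version="1.0")
    # Identity template: copy anything the mapping does not cover unchanged.
    identity = ET.SubElement(sheet, f"{{{XSL}}}template", match="@*|node()")
    copy = ET.SubElement(identity, f"{{{XSL}}}copy")
    ET.SubElement(copy, f"{{{XSL}}}apply-templates", select="@*|node()")
    # One template per hand-mapped element: emit the new name and keep
    # processing its attributes and children. A name test outranks the
    # identity template's node() pattern, so these templates win.
    for source, target in mapping.items():
        template = ET.SubElement(sheet, f"{{{XSL}}}template", match=source)
        renamed = ET.SubElement(template, f"{{{XSL}}}element", name=target)
        ET.SubElement(renamed, f"{{{XSL}}}apply-templates", select="@*|node()")
    return ET.tostring(sheet, encoding="unicode")
```

The output can be run with any XSLT 1.0 processor, such as xsltproc. Anything not in the mapping, including attributes, is copied through unchanged, which is exactly where a hand-written stylesheet starts needing human judgment.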

Another thing that would come from this is a way to rank XML vocabularies by their expressive range. Suppose we have two sets of documents, A and B, based on two different XML vocabularies. If an XSLT exists that maps A -> B, but no XSLT exists that maps B back to A, then the vocabulary used for A could be seen as having a larger expressive range than the one used for B. That would give us a more solid foundation for saying that TEI is more expressive than DocBook (which I believe it is, but I don't have good data to base that belief on at the moment).
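At the crudest level, asking whether a mapping can be undone is asking whether it is injective: if two A elements map to the same B element, a renaming alone cannot tell them apart on the way back. A minimal Python sketch; the function names are hypothetical, and the TEI-to-DocBook mapping in the usage note is illustrative, not a real crosswalk. It only looks at element names, so it can show where a name-level mapping loses distinctions, but it cannot prove that no B -> A XSLT exists, since a real stylesheet could preserve the distinction some other way (for example, in an attribute).

```python
from collections import defaultdict


def collapsed_targets(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Return each target element reached from more than one source element."""
    sources_by_target: dict[str, list[str]] = defaultdict(list)
    for source, target in mapping.items():
        sources_by_target[target].append(source)
    return {t: sorted(s) for t, s in sources_by_target.items() if len(s) > 1}


def has_element_inverse(mapping: dict[str, str]) -> bool:
    # A pure renaming can be undone exactly only if no two sources share a target.
    return not collapsed_targets(mapping)
```

With `{"p": "para", "head": "title", "emph": "emphasis", "hi": "emphasis"}`, `collapsed_targets` returns `{"emphasis": ["emph", "hi"]}` and `has_element_inverse` returns `False`.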

I can manually create XSLTs to go from TEI to DocBook to HTML because I believe there's a loss of information from one format to the next (ignoring the information pushed into CSS at the final HTML stage), and because DocBook is a publishing vocabulary while HTML, with CSS, is a de facto typesetting vocabulary. The information isn't so much lost as transformed from semantic to presentational, with the person reading the resulting document adding the semantic information back based on the presentation. The semantic information, though, is no longer in a form a computer can readily understand: it has gone from a context-free to a context-dependent form.



James is a software developer and self-published author. He received his B.S. in Math and Physics and his M.A. in English from Texas A&M University. After spending almost two decades in academia, he now works in the Washington, DC, startup world.