I’m making good progress on the concordance front. I can now do the following:
c:concordance("some text")/some/count
This converts the string to a concordance object (implicitly compiling the concordance) and gives the frequency/count of the word “some”.
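To make the idea of "compiling" a concordance concrete, here is a minimal Ruby sketch of the counting step. It is not the actual internal representation: the method name `concordance_counts` and the tokenization rule (lowercase, then split on anything that is not a letter, digit, or apostrophe) are assumptions made for illustration.

```ruby
# Hypothetical sketch: build a word => count table from a string.
# The tokenization rule here is an assumption, not the real implementation.
def concordance_counts(text)
  counts = Hash.new(0) # unseen words report a count of 0
  text.downcase.scan(/[[:alnum:]']+/) { |word| counts[word] += 1 }
  counts
end

counts = concordance_counts("some text")
puts counts["some"] # roughly what .../some/count asks for
```

A plain hash keyed by word keeps the lookup for a single word cheap, which is roughly what the `/some/count` step needs.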
My goal now is to get the “with ./*/foo := bar” fragment working with an internal Ruby representation of a concordance. This will allow me to annotate the words in a concordance. The internal object already preserves annotations when combining concordances.
Once I have a good serialization format, I will be able to persist the concordance in some form, perhaps RDF, though not necessarily. Combined with annotations recording where each word appears, that will let me search for words:
c:concordance($doc)/foo/line
to give me the lines on which a word appears.
c:concordance($doc)/*[f:matches(node-name(.), "^f")]/line
to give me the lines on which words beginning with the letter “f” appear.
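The two lookups above could be backed by something like the following Ruby sketch. It assumes the line annotation is simply a list of 1-based line numbers per word; the names `line_index`, `lines_for`, and `lines_matching` are hypothetical and do not come from the actual code.

```ruby
# Hypothetical sketch: map each word to the 1-based line numbers it appears on.
def line_index(doc)
  index = Hash.new { |hash, word| hash[word] = [] }
  doc.each_line.with_index(1) do |line, number|
    # uniq so a word repeated on one line is recorded once for that line
    line.downcase.scan(/[[:alnum:]']+/).uniq.each { |word| index[word] << number }
  end
  index
end

# Lines for a single word, in the spirit of c:concordance($doc)/foo/line
def lines_for(index, word)
  index.fetch(word, []) # fetch avoids creating an entry for unknown words
end

# Lines for every word matching a pattern, in the spirit of the f:matches predicate
def lines_matching(index, pattern)
  index.select { |word, _lines| word.match?(pattern) }
end

index = line_index("foo bar\nfig foo\nbaz\n")
p lines_for(index, "foo")
p lines_matching(index, /^f/)
```

Note that `lines_matching` tests every word in the index against the pattern, so its cost grows with the size of the vocabulary; the single-word lookup does not have that problem.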
At that point, the challenge will be to optimize these idioms so they don’t take forever to run.