Concording bits and pieces

At a basic level, a concordance is an annotated word-frequency table.  Presentation and corpus might differ across projects, but any project with a concordance manages tables of words tied to frequencies and other information.

My current work is to set up an environment where we can say something like the following.

f:sum( for $line in $doc/* return c:concordance($line) with ./*/line = $line )

That might look a bit daunting, but let’s unpack it.

The outer function, f:sum, says that we want to accumulate concordances into one overall concordance.  The sum of two concordances is a single concordance with combined counts and accumulated annotations (e.g., combined references to lines).
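To make the summing step concrete, here is a minimal Python sketch of adding two concordances.  It is not the environment described above: the `Entry` class, the `add_concordances` function, and the choice of a dictionary keyed by word are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One word's row in a concordance (hypothetical layout)."""
    count: int = 0
    lines: set = field(default_factory=set)  # references to lines containing the word


def add_concordances(a, b):
    """Return a new concordance with combined counts and accumulated line references."""
    total = {}
    for concordance in (a, b):
        for word, entry in concordance.items():
            merged = total.setdefault(word, Entry())
            merged.count += entry.count   # combine counts
            merged.lines |= entry.lines   # accumulate annotations
    return total


# Two small concordances, each mapping word -> Entry.
left = {"death": Entry(1, {1}), "proud": Entry(1, {1})}
right = {"death": Entry(2, {14})}
combined = add_concordances(left, right)
```

Because the sum builds fresh entries instead of modifying its inputs, the same operation can be folded over any number of per-line concordances.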

The ‘for $line in $doc/*’ is a way of going through every line in a document (in this case, a transcription in the Donne project).  The exact path will depend on how concorded documents are managed and how fine-grained you want your location tracking to be.  The ‘return …’ part just says that we want to build a list of things based on what we’re looping over (in this case, the ‘$line’).

The ‘c:concordance($line)’ builds a concordance object with word frequencies for the given ‘$line’.  The ‘with ./*/line = $line’ part then runs its expression against that concordance.  The children of a concordance are its words, so ‘./*/line’ selects the line annotation of every word, and each word ends up with a reference to the line.  The references get carried through the f:sum, resulting in one large concordance with word frequencies and, for each word, annotations of every line containing it.
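The whole expression can be approximated in a few lines of Python.  This is only a sketch under stated assumptions: `concordance`, `with_line`, `add`, and `concord_document` are hypothetical stand-ins for c:concordance, the ‘with’ clause, the addition behind f:sum, and the full expression; the tokenizer is a deliberately naive regular expression; and the two sample lines are illustrative, not taken from the project's transcriptions.

```python
import re
from collections import Counter
from functools import reduce


def concordance(line_text):
    """Word frequencies for one line: {word: {"count": n, "lines": set()}}."""
    words = re.findall(r"[a-z']+", line_text.lower())  # naive tokenizer (assumption)
    return {w: {"count": n, "lines": set()} for w, n in Counter(words).items()}


def with_line(conc, line_ref):
    """Attach line_ref to every word (every child) of the concordance."""
    for entry in conc.values():
        entry["lines"].add(line_ref)
    return conc


def add(a, b):
    """Sum two concordances without modifying either input."""
    total = {w: {"count": e["count"], "lines": set(e["lines"])} for w, e in a.items()}
    for w, e in b.items():
        t = total.setdefault(w, {"count": 0, "lines": set()})
        t["count"] += e["count"]
        t["lines"] |= e["lines"]
    return total


def concord_document(lines):
    """Loop over the lines, annotate each per-line concordance, and sum the results."""
    per_line = (with_line(concordance(text), ref)
                for ref, text in enumerate(lines, start=1))
    return reduce(add, per_line, {})


doc = [
    "Death be not proud, though some have called thee",
    "Mighty and dreadful, for thou art not so",
]
result = concord_document(doc)
```

Here `result["not"]` carries a count of 2 and references to lines 1 and 2, while a word such as "death" carries a count of 1 and a reference to line 1 only.  Line numbers stand in for whatever reference the real environment would attach to a line.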