Concordances

Scripts that create indexes from word lists are indiscriminate: they index anything that matches a word's pattern. This often leads to overgeneration of indexed words, for example when you're after a noun that can also be used as a verb. Take claim: a script indexes all occurrences of claim and probably claims, maybe also claimed and claiming. Suppose you're after the nominal uses (claim and claims). You're then faced with the task of weeding out the page references to the verbal uses.

When I was working on an index a while ago, I ran into issues like this and decided that time spent writing a concordance script was a better investment than hours spent poring over printouts and documents. The concordancer proved very useful: it shows a word in its context so that, using our claim example, I could quickly see for each occurrence whether it was used as a noun or as a verb. Here is some sample output created by the script:

33  ways. In particular, we claim that the main import
34  discuss and refute Smith’s claim, and will present new
38  be exhaustive; however, he claims that its exhaustivity is
44  their exhaustive reading. He claims that the ‘exactly n’
49  has argued for the claim that the structural focus
50  for which precisely this claim has been made (cf.
53  property of topics. This claim is based on the
56  CP]. Evidence for this claim comes from the fact
68  focused phrase itself. Some claim that both patterns are
84  this is a controversial claim. We do, however, agree
87  not the strongest defensible claims consistent with the speaker’s
87  make stronger or weaker claims. More central to the
88  are not the strongest claim consistent with the speakers

The number at the beginning of a line is the page on which the concordanced word occurs. I wanted references to the noun, singular or plural, so I needed to discard the references to pages 33, 38, 44, and 68, where claim is used as a verb. The concordancer, then, made editing the index a lot easier.

Use

The script shows the following dialog:

[Screenshot: the concordance dialog]

In the Search: field, type the word for which you want to create the concordance (details below). If you select a word or phrase in the text before running the script, that selection is entered into the field automatically.

Words before and Words after set how many words are shown before and after the concordanced word. Check the Case sensitive box to make the search strictly case-sensitive. If several documents are open, you can choose to process just the active document or all documents (this option is greyed out when only one document is open). If you've used the script before, it loads the settings from the previous session.

To include footnotes in the search, check the Include footnotes box; leave it unchecked to ignore them.

The script is very quick: it breezes through hundreds of pages in seconds.
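To picture what the Words before, Words after, and Case sensitive settings do, here is a minimal sketch in plain JavaScript. It is not the script's own code: the function name concordance and its parameters are invented for illustration, and it works on a plain string rather than on an InDesign document, so it produces no page numbers.

```javascript
// Hypothetical helper, not taken from the script: builds keyword-in-context
// lines from a plain string.
function concordance(text, pattern, wordsBefore, wordsAfter, caseSensitive) {
  // Split on whitespace so that punctuation stays attached to its word.
  var words = text.split(/\s+/).filter(function (w) { return w.length > 0; });
  // The pattern must cover a whole word; surrounding punctuation is tolerated.
  var re = new RegExp('^\\W*(?:' + pattern + ')\\W*$', caseSensitive ? '' : 'i');
  var lines = [];
  for (var i = 0; i < words.length; i++) {
    if (re.test(words[i])) {
      var start = Math.max(0, i - wordsBefore);
      var end = Math.min(words.length, i + wordsAfter + 1);
      lines.push(words.slice(start, end).join(' '));
    }
  }
  return lines;
}

var sample = 'In particular, we claim that the main import is clear. ' +
             'This Claim is based on the data.';
var lines = concordance(sample, 'claim', 3, 2, false);
// lines[0] is 'In particular, we claim that the'
// lines[1] is 'is clear. This Claim is based'
```

With the last argument set to true, only the first line is returned, because the capitalized Claim no longer matches.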

The search expression

The script searches strictly for whole words. Searching for claim therefore finds just that and nothing else: variants such as claims, claimed, claiming, and claimant are all ignored. To find other forms as well you could run separate concordances, but you can also use wildcards to make the search expression more flexible. Here are some examples:

claims?

The question mark indicates that the preceding character is optional, so this expression finds claim and claims. The scope of the question mark is just one character.

claim(s|ed|ing)?

Options are separated by pipes (|) and can be grouped by wrapping them in parentheses. Grouping creates a scope island, so to speak, therefore the question mark has scope over all options in the group. The expression finds claim, claims, claimed, and claiming.

claim\w*

\w stands for 'any word character', which covers letters, digits, and the underscore character. The asterisk stands for 'zero or more'. claim\w* therefore means 'find claim followed by any number of word characters', so in addition to the forms found by the expression above it also finds words like claimant, claimer, claimable, and claimants.

\w*claim\w*

Apart from all the forms of 'claim' we found earlier, this expression finds forms of 'claim' preceded by any letters, such as disclaimer.
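The four expressions above can be tried out with the following sketch. It uses JavaScript's own regular expressions as a stand-in for InDesign's GREP, and the \b word boundaries wrapped around each expression are an assumption about how a whole-word search could be enforced; the function name findWholeWords is made up for the example.

```javascript
// Illustration only: JavaScript RegExp approximates InDesign's GREP here.
var text = 'claim claims claimed claiming claimant disclaimer';

// Hypothetical helper: returns every whole word in `text` that the
// expression matches from start to end.
function findWholeWords(expression) {
  var re = new RegExp('\\b(?:' + expression + ')\\b', 'g');
  return text.match(re) || [];
}

findWholeWords('claim');            // claim
findWholeWords('claims?');          // claim, claims
findWholeWords('claim(s|ed|ing)?'); // claim, claims, claimed, claiming
findWholeWords('claim\\w*');        // the four above plus claimant
findWholeWords('\\w*claim\\w*');    // all six words, including disclaimer
```

The doubled backslashes are needed only because the expressions are written as JavaScript strings; in the script's Search: field you type a single backslash.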

The wildcards illustrated here are GREP expressions. The script uses InDesign's GREP search, so you can use many valid GREP expressions in it (I'd like to say 'any valid GREP expression', but I've not tried that many).

Restricting the search area

Concordances can get very long. You can try to make a concordance easier to use by constraining the search area to certain paragraph styles. At the top of the script you see two lines:

// ok_stylenames = '|def|sec|not';
ok_stylenames = "";

In this state (which is the default), the script includes all matches: it ignores the line headed by the two slashes so it doesn't check which paragraph style has been applied to the paragraph in which the word is found.

To narrow the script's search area down to paragraphs formatted with certain paragraph styles, list the names of the styles in the form shown in the example: the first three letters of each style's name, preceded by a vertical bar. In this example, the script includes words only if they occur in paragraphs whose style names begin with def, sec, or not; all other styles are ignored. You can list as many styles as you like. To enable the first line, remove its slashes and add them to the second line:

ok_stylenames = '|def|sec|not';
// ok_stylenames = "";
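How the script evaluates ok_stylenames internally is not shown here, so the following is only a guess at the kind of test such a setting implies. The function name styleAllowed is hypothetical, and the comparison is assumed to be case-sensitive.

```javascript
// Assumption: a paragraph is accepted when '|' plus the first three letters
// of its style name occurs in ok_stylenames, and an empty string accepts
// every style. This is not the script's own code.
function styleAllowed(styleName, ok_stylenames) {
  if (ok_stylenames === '') {
    return true; // default state: no filtering
  }
  // Close the list with a bar so that only complete three-letter entries match.
  var list = ok_stylenames + '|';
  var key = '|' + styleName.slice(0, 3) + '|';
  return list.indexOf(key) !== -1;
}

styleAllowed('definition', '|def|sec|not');   // true
styleAllowed('section head', '|def|sec|not'); // true
styleAllowed('body text', '|def|sec|not');    // false
styleAllowed('body text', '');                // true
```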

Version history

12 December 2014: fixed problem with roman page numbers.

19 June 2011: (1) Fixed problem with hyphenated words and possessives; (2) if some text is selected when the script is started, that text is entered into the Search: field; (3) it's now possible to ignore footnotes; (4) when the script finishes, it restores the settings of the Find/Change dialog.

27 August 2010: fixed problem with finding page numbers.

1 July 2010: updated script to CS5; fixed bug that prevented proper word-boundary detection.


Useful script? Saved you lots of time?

Consider making a donation by pressing the button below. It uses PayPal's payment system; you don't need a PayPal account to use it, as it accepts several types and brands of credit and debit card.

Peter Kahrel's paypal account

Show script (right-click, then Save Target/Link As to download)
