Index from word list

Bypassing InDesign's index feature, the script creates an index from a word list by adding page references to the entries in that list. It runs on all open documents but doesn't change them in any way; only the word list is modified. In its approach (avoiding InDesign's index feature) it is comparable to Marc Autret's IndexBrutal.

Use

The script has no interface. Open all documents that should be included, then open the document with the word list and run the script. Page numbers are added to the entries in the word list.

The word list

The word list allows some flexibility in how the script searches your documents. By default, the script searches case-sensitively, whole words only. An item like this:

claim

finds just claim, not Claim, claims, disclaimer, claimed, etc. To search case-insensitively, open the script and set case_sensitive to false, so that the line reads:

case_sensitive = false;
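
To make the difference between the two settings concrete, here is a minimal sketch in plain JavaScript. It is not taken from the script: the helper names makeWholeWordPattern and findAll are hypothetical, and JavaScript regular expressions merely stand in for InDesign's GREP search.

```javascript
// Hypothetical helper, not part of the script: wraps a word-list item in
// word boundaries so that only whole words match.
function makeWholeWordPattern(item, caseSensitive) {
    // 'g' collects every occurrence; 'i' is added only for case-insensitive searches.
    return new RegExp('\\b' + item + '\\b', caseSensitive ? 'g' : 'gi');
}

// Returns all matches of the item in the text as an array (empty if none).
function findAll(item, text, caseSensitive) {
    var found = text.match(makeWholeWordPattern(item, caseSensitive));
    return found === null ? [] : found;
}

var sample = 'A claim is not a disclaimer. Claim it; he claims he claimed it.';
var strict = findAll('claim', sample, true);   // ['claim']
var relaxed = findAll('claim', sample, false); // ['claim', 'Claim']
```

In both cases claims, claimed, and disclaimer are skipped, because the search is still restricted to whole words.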

By design the script considers an entry only up to the first comma or parenthesis. In this way the script can be used to create author indexes from a list that includes first names or initials. For instance, if the word list contains this line:

Leech, G.

only Leech is used for the search: after all, a text is more likely to contain just an author's surname than the surname followed by initials or a first name. Similarly, when the word list has a line like this:

abomination (see also terribleness)

you needn't worry about the cross reference: the script considers just abomination.
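
The truncation can be sketched as follows. This illustrates the behaviour described above and is not the script's own code: searchKey is a hypothetical name, and the sketch assumes that a parenthesis ends the search key only when white space precedes it, so that an entry such as claim(s|ed|ing)? keeps its group. How the script itself tells the two cases apart is not documented here.

```javascript
// Hypothetical helper, not part of the script: returns the part of a
// word-list entry that is used for searching.
function searchKey(entry) {
    // Assumption: cut at the first comma, or at white space followed by '('.
    var cut = entry.search(/,|\s+\(/);
    var key = cut === -1 ? entry : entry.slice(0, cut);
    // Trim surrounding white space (ES3-safe, no String.prototype.trim).
    return key.replace(/^\s+|\s+$/g, '');
}

var keys = [
    searchKey('Leech, G.'),                            // 'Leech'
    searchKey('abomination (see also terribleness)'),  // 'abomination'
    searchKey('claim(s|ed|ing)?')                      // unchanged
];
```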

More flexible searching

As mentioned earlier, the script searches for whole words. To find different forms of the same word, include all forms in the list:

claim
claims
claimed
claiming

This does mean that the resulting index needs to be post-edited, but any generated index should be edited anyway. The script's strict search can also be relaxed a bit by using certain wildcards in the list. So instead of an item like claim, you could write the entry in several other ways. Some examples:

claims?

The question mark indicates that the preceding character is optional, so this expression finds claim and claims. The scope of the question mark is just one character.

claim(s|ed|ing)?

Alternatives are separated by pipes (|) and can be grouped by wrapping them in parentheses. A group acts as a single unit, so the question mark applies to the whole group rather than to just one character. The expression finds claim, claims, claimed, and claiming.

claim\w*

\w stands for 'any word character', which covers letters, digits, and the underscore character. The star stands for 'zero or more'. In addition to the forms found by the expression above, claim\w* therefore also finds words like claimant, claimer, claimable, and claimants.

\w*claim\w*

Apart from all the forms of 'claim' we found earlier, this expression finds forms of 'claim' preceded by any word characters, such as disclaimer.

The wildcards illustrated here are in fact GREP expressions. The script uses InDesign's GREP search, so you can use many valid GREP expressions in the word list. (I'd like to say 'any valid GREP expression', but I've not tried that many.) For more information on InDesign's GREP, see here.
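
If you want to check what a pattern will find before adding it to the word list, a quick test outside InDesign can help. The sketch below uses JavaScript regular expressions, which behave the same way for the constructs shown here (?, |, parentheses, \w, *), though not necessarily for every GREP expression. The function wholeWordHits and the candidate list are made up for the illustration.

```javascript
// Candidate words to test the patterns against (illustrative only).
var words = ['claim', 'claims', 'claimed', 'claiming', 'claimant', 'disclaimer', 'Claim'];

// Hypothetical helper: returns the candidates that a pattern matches as a
// whole word, case-sensitively (the script's default behaviour).
function wholeWordHits(pattern, candidates) {
    // Anchoring the pattern to both ends of the candidate mimics a whole-word search.
    var re = new RegExp('^(?:' + pattern + ')$');
    var hits = [];
    for (var i = 0; i < candidates.length; i++) {
        if (re.test(candidates[i])) {
            hits.push(candidates[i]);
        }
    }
    return hits;
}

// In JavaScript string literals the backslash must be doubled.
var optionalS = wholeWordHits('claims?', words);          // claim, claims
var grouped = wholeWordHits('claim(s|ed|ing)?', words);   // claim, claims, claimed, claiming
var trailing = wholeWordHits('claim\\w*', words);         // the above plus claimant
var embedded = wholeWordHits('\\w*claim\\w*', words);     // the above plus disclaimer
```

Claim is never found, because none of the patterns changes the case-sensitivity of the search.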

Page numbers

The script defaults to ranging consecutive page numbers. The page numbers 1, 2, 3, 4, 6, 7, 8 are represented as 1-4, 6-8. To disable ranging, look for this line:

page_span = 1;

and change 1 to 0. You could also relax page ranging by using a page-span value of 2, which bridges gaps of one page: 1, 2, 3, 4, 6, 7, 8 will then be represented as 1-8.
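
One way to read the page-span value is as the largest distance between two page numbers that may still be joined into a range. The sketch below follows that reading and reproduces the three results mentioned above; it is an illustration, not the script's own code. The function name rangePages is hypothetical, and the input is assumed to be sorted and free of duplicates.

```javascript
// Hypothetical helper: collapses a sorted list of page numbers into ranges.
// Two neighbouring numbers are joined when their distance is at most pageSpan.
function rangePages(pages, pageSpan) {
    if (pages.length === 0) {
        return '';
    }
    var parts = [];
    var start = pages[0]; // first page of the current range
    var prev = pages[0];  // last page added to the current range
    for (var i = 1; i <= pages.length; i++) {
        if (i < pages.length && pages[i] - prev <= pageSpan) {
            prev = pages[i]; // still within reach: extend the current range
            continue;
        }
        // Close the current range (a single page is written without a hyphen).
        parts.push(start === prev ? String(start) : start + '-' + prev);
        start = pages[i];
        prev = pages[i];
    }
    return parts.join(', ');
}

var found = [1, 2, 3, 4, 6, 7, 8];
var ranged = rangePages(found, 1);   // '1-4, 6-8'
var unranged = rangePages(found, 0); // '1, 2, 3, 4, 6, 7, 8'
var relaxed = rangePages(found, 2);  // '1-8'
```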

Restricting the script's scope

When you're creating an index, there are probably a number of areas that you want to exclude from the index. Typically, indexes shouldn't refer to such items as bibliographies, quotations, and chapter titles. Some publishers want to exclude tables as well. To allow for this, the script considers only words found in certain paragraphs. This line:

include_paragraph_styles = '|def|tab|not|sec';

tells the script to look only in paragraphs whose style name begins with def, tab, not, or sec. This could include, say, default, default-ES, default-FR, secA, secB, etc. To disable this selectivity and let the script look in all paragraph styles, change this line as follows:

include_paragraph_styles = "";
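
The check itself could look something like the sketch below. How the script actually evaluates the setting isn't shown on this page, so take this as one plausible reading: the function name styleIsIncluded is hypothetical, and the sketch simply treats every non-empty item between the pipes as a prefix of a paragraph style name.

```javascript
// Hypothetical helper: decides whether a paragraph style should be searched.
function styleIsIncluded(styleName, includeStyles) {
    if (includeStyles === '') {
        return true; // empty setting: search every paragraph style
    }
    var prefixes = includeStyles.split('|');
    for (var i = 0; i < prefixes.length; i++) {
        // The leading pipe produces an empty first item, which is skipped.
        if (prefixes[i] !== '' && styleName.indexOf(prefixes[i]) === 0) {
            return true;
        }
    }
    return false;
}

var include_paragraph_styles = '|def|tab|not|sec';
var inDefault = styleIsIncluded('default-ES', include_paragraph_styles);    // true
var inSection = styleIsIncluded('secA', include_paragraph_styles);          // true
var inBiblio = styleIsIncluded('bibliography', include_paragraph_styles);   // false
var unrestricted = styleIsIncluded('bibliography', '');                     // true
```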

Topic separator

You can choose which character or characters should be used between a topic and the first page number. The script defaults to an en space. To change that, look for this line:

topic_page_separator = '\u2002';

To use, say, a comma and a space, change the line to this:

topic_page_separator = ', ';
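
As a small illustration of the effect, the sketch below assembles an index line from an entry, the separator, and the page references. The function indexLine is a hypothetical stand-in for whatever the script does internally, and the page references are made up.

```javascript
// Hypothetical helper: joins an entry and its page references.
function indexLine(entry, pageReferences, separator) {
    return entry + separator + pageReferences;
}

var withEnSpace = indexLine('claim', '1-4, 6-8', '\u2002'); // entry, en space, page references
var withComma = indexLine('claim', '1-4, 6-8', ', ');       // 'claim, 1-4, 6-8'
```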
