Collect hyphenated words

The script on this page was originally written by the late Teus de Jong as a Delphi program that ran as a separate application alongside InDesign. Years later, in 2009, when he was finally convinced that JavaScript had become fast enough, Teus and I ported his Delphi program to JavaScript. That version is still available at Teus's homepage. After we finished the JavaScript version, we discovered that Martin Fischer had independently written a comparable script.

The script on this page is a modified version of the one Teus and I had written. There were some inconsistencies in the code, I wanted to add some features, and the interface could be improved. The core of the program, however, remains largely unchanged.

The script collects all words in a document that InDesign has hyphenated and, optionally, all compound hyphenated words, that is, words that always contain a hyphen, such as top-bottom and East-West. It then either displays the words in a window, where you can process them interactively, or outputs them to a new document. In each word, the break is marked by a character of your choice, such as a swung dash or a dash.
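
To make the idea of a marked break concrete, here is a minimal sketch of how broken words could be detected and marked. It is not the script's actual code: it assumes that the text of one paragraph is already available as an array of line strings (a hypothetical stand-in for the lines InDesign composes), and the names collectBreaks and includeCompound are invented for illustration. A word is treated as broken when a line ends in a letter (or, for compound words, a hard hyphen) and the next line starts with a letter.

```javascript
// Minimal sketch (not the script's actual code): find words broken across
// lines and mark each break with a separator character.
// Assumption: `lines` holds the lines of one paragraph as strings. An
// automatic break is assumed to leave the line ending in a letter, because
// the hyphen InDesign displays is not stored in the text.
var LETTER = /[A-Za-z\u00C0-\u024F]/; // roughly: Latin letters, incl. accented

function collectBreaks(lines, separator, includeCompound) {
  var found = [];
  for (var i = 0; i < lines.length - 1; i++) {
    var line = lines[i];
    var next = lines[i + 1];
    var last = line.charAt(line.length - 1);
    var isCompound = last === "-";
    // The next line must start with a letter for the word to continue there.
    if (!LETTER.test(next.charAt(0))) continue;
    // This line must end in a letter (automatic break) or, if requested,
    // in a hard hyphen (compound word broken at its hyphen).
    if (!(LETTER.test(last) || (includeCompound && isCompound))) continue;
    // Head: letters/hyphens at the end of this line; tail: letters at the
    // start of the next line.
    var head = line.match(/[A-Za-z\u00C0-\u024F-]+$/)[0];
    var tail = next.match(/^[A-Za-z\u00C0-\u024F]+/)[0];
    // A compound keeps its hard hyphen, so "left-" + "~" + "right" gives
    // "left-~right"; an automatic break gives e.g. "hy~phenation".
    found.push(head + separator + tail);
  }
  return found;
}
```

For example, the lines "This is a demon", "stration of hy", "phenation in a left-", "right layout. " yield demon~stration, hy~phenation and left-~right; with includeCompound set to false, the left-right break is skipped.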

Optionally, words that can be found in the user dictionary can be filtered out: after all, the hyphenation of these words is correct by default. Filtering out as many words as possible makes the list shorter and therefore easier to process.
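
The filtering step can be sketched as a simple lookup. This is an illustration under stated assumptions, not the script's method: it assumes the user dictionary's words have already been read into a plain array (how the script actually reads the dictionary is not covered here), that matching is case-insensitive, and filterByDictionary is a hypothetical name.

```javascript
// Minimal sketch: drop collected words whose unbroken form occurs in the
// user dictionary. Assumptions: `dictWords` is a plain array of dictionary
// words (however they were obtained), matching ignores case, and
// `separator` is the break marker used in `words`.
function filterByDictionary(words, dictWords, separator) {
  // Build a lookup table of lower-cased dictionary words.
  var known = {};
  for (var i = 0; i < dictWords.length; i++) {
    known[dictWords[i].toLowerCase()] = true;
  }
  var kept = [];
  for (var j = 0; j < words.length; j++) {
    // Remove the separator to recover the word as it appears in the text,
    // e.g. "left-~right" becomes "left-right".
    var plain = words[j].split(separator).join("").toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(known, plain)) {
      kept.push(words[j]);
    }
  }
  return kept;
}
```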

Use

Open a document and start the script. It displays the dialog shown in the screenshot. At Output, select how the list of hyphenated words should be presented: on screen or in a new document. At Separator, choose the character that marks the break (the choices are swung dash, en dash, em dash, hyphen, and underline).

[Screenshot: the Collect hyphenated words dialog]

To include compound hyphenated words such as left-right and well-defined, check Include hyphenated words. To distinguish these from automatically hyphenated words, the displayed list shows their hyphen followed by the separator, as in left-~right.

Check Use list from text file (disabled in the screenshot) to load a text file with the document's hyphenated words that you saved in an earlier session (see below). Check Sort list to, well, sort the output list. Check Filter out the exception list to ignore hyphenated words that can be found in the user dictionary: these are hyphenated correctly by default. (These two checkboxes are mutually exclusive: checking one disables the other because the two options can't be combined, though both can be left unchecked. We won't go into the details here.)
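
Sorting the list comes down to choosing a sort key. A reasonable choice, assumed here rather than taken from the script, is to compare words with the separator removed and case ignored, so that the break position does not affect the order. sortWords is a hypothetical name.

```javascript
// Minimal sketch: sort collected words alphabetically, ignoring case and
// the break marker, so only the letters of each word decide the order.
function sortWords(words, separator) {
  function key(w) {
    return w.split(separator).join("").toLowerCase();
  }
  // slice() copies the array so the original list order is left untouched.
  return words.slice().sort(function (a, b) {
    var ka = key(a);
    var kb = key(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}
```

The plain `<` comparison orders by character code, which puts accented letters after z; String.prototype.localeCompare could be swapped in for a locale-aware order.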

The last three options apply to document output only. Set the number of columns for the output. Include details prints page and line numbers and, where relevant, the environment (table, footnote, inline); an example is given below. When you output to a document you're usually not interested in duplicates, so you can choose to remove them.
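
Removing duplicates while keeping the list in document order can be done with a single pass and a lookup table. This is a sketch, not the script's code; removeDuplicates is a hypothetical name, and entries count as duplicates only when they are identical strings.

```javascript
// Minimal sketch: remove duplicate entries, keeping the first occurrence of
// each so the list stays in document order.
function removeDuplicates(words) {
  var seen = {};
  var unique = [];
  for (var i = 0; i < words.length; i++) {
    // Prefix the key so a word can never collide with a built-in property.
    var k = "#" + words[i];
    if (!seen[k]) {
      seen[k] = true;
      unique.push(words[i]);
    }
  }
  return unique;
}
```

Because comparison is on the marked string, the same word broken at different points (hyphen~ation and hy~phenation) is kept twice, which preserves every distinct break for checking.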

After the script has collected all hyphenated words, it shows the screen list as in the screenshot below. The list is interactive: double-click a word to select and display its occurrence in the document. You can then correct any mistakes, but note that the list is not refreshed automatically.

[Screenshot: the interactive list of hyphenated words]

If the list is very long and you can't process it in one session, press Save to store it in a text file. In a later session, run the script and check Use list from text file to load that file instead of creating a new list. Loading a saved list is quicker than creating a new one, and the item you last selected is reselected after the list has loaded. Press Close to close the window and end the script.
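
One way to make a saved list remember the last selected item is to store the selection index alongside the words. The format below is hypothetical (the script's own file layout may differ): the first line holds the index, followed by one word per line. The functions work on strings only, leaving file reading and writing aside; serializeList and parseList are invented names.

```javascript
// Minimal sketch of a hypothetical save/restore format: the first line is
// the index of the last selected item, followed by one word per line.
function serializeList(words, selectedIndex) {
  return [String(selectedIndex)].concat(words).join("\n");
}

function parseList(text) {
  // Normalise Windows (\r\n) and old Mac (\r) line endings first.
  var lines = text.replace(/\r\n?/g, "\n").split("\n");
  var index = parseInt(lines[0], 10);
  var words = [];
  for (var i = 1; i < lines.length; i++) {
    // Skip blank lines, e.g. a trailing newline added by an editor.
    if (lines[i] !== "") words.push(lines[i]);
  }
  // Fall back to the first item if the stored index is missing or invalid.
  if (isNaN(index) || index < 0 || index >= words.length) index = 0;
  return { words: words, selectedIndex: index };
}
```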

If you chose not to include details, document output looks the same as screen output: a list of words with each break marked by the separator character.

If you chose to include details, the script prints each word's page and line numbers and, if the word is not in the main text, its environment (table, footnote, or inline). Here is an example:
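
A detail line can be assembled from a small record per occurrence. The record shape and the layout below are assumptions for illustration (the script's actual layout is the one in the screenshot), and formatDetail is a hypothetical name. Page names are kept as strings because InDesign page names need not be plain numbers.

```javascript
// Minimal sketch of one possible detail-line layout. Assumed record shape:
// { word, page, line, environment }, where environment is "" for the main
// text or one of "table", "footnote", "inline".
function formatDetail(rec) {
  var s = rec.word + "\tp. " + rec.page + ", l. " + rec.line;
  // Only words outside the main text get an environment label.
  if (rec.environment) s += " (" + rec.environment + ")";
  return s;
}
```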

[Screenshot: example of document output with details]

Words that occur outside the main story are grouped together at the end of the list (to be fixed).

Batch-processing corrections

I use the script in conjunction with another one to apply corrected hyphenations; see hyphens_apply.jsx for details.

Version history

26 February 2017: Sometimes the first page of a document wasn't reported correctly in the output document. Fixed.

8 November 2016: Applied the Windows 10 fix to the dialog's list display. Also fixed a problem with how the script recognised the InDesign version it runs in.

19 June 2016: Added the option to include compound hyphenated words such as left-right and north-south.

4 December 2012: Returned to the script that Teus de Jong and I wrote in 2009. The text on this page was entirely rewritten.

30 November 2012: Added options for filtering and sorting the list.

12 April 2011: Added the option to process either just the active document or all open documents.

23 July 2010: The script now recognises user dictionaries in languages other than English.


Useful script? Saved you lots of time?

Consider making a donation. To donate, please press the button below. It uses PayPal's payment system, but you don't need a PayPal account: you can pay with several types of credit and debit card.

Peter Kahrel's paypal account
