Language-aware paragraph sorter
Languages differ in how they treat accented letters when sorting lists. The script takes these differences into account by using sort orders defined in a separate, editable file (see below for details). The script sorts paragraphs (not tables); all formatting is preserved. It can also create retrograde lists, in which words are sorted from the end of the word rather than from the beginning.
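A retrograde sort can be illustrated with a few lines of Python. This is only a sketch of the idea, not the script's own code (which is a compiled jsxbin file); the function name retrograde_key is made up for the example, and accents are not treated specially here.

```python
def retrograde_key(word):
    # Compare words from their last letter backwards by reversing them.
    # Lower-casing keeps capitals from affecting the order.
    return word.lower()[::-1]

words = ["sorting", "table", "binding", "cable", "index"]

# Words ending in the same letters end up next to each other.
print(sorted(words, key=retrograde_key))
# ['cable', 'table', 'binding', 'sorting', 'index']
```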
Use
- Place the script (sort.jsxbin) in your script directory. Place the file with sort orders (sortorders.txt) in the same directory.
- Select the text to be sorted. To sort a selection of paragraphs, select those paragraphs. To sort a whole story, select an insertion point in that story or select a text frame containing all or part of the story.
- Run the script. It shows this dialog:

The script tries to determine the currently selected language. If it can do that, the language shows in the dialog; if it can't, it displays [No Language]. To select a different language, pick it from the dropdown. (For changing and adding sort orders, see below.)
- Check "Save sort order" to save any changes you've made to a sort order or when you've added a sort order (see below for details).
- To remove duplicate items after the list has been sorted, keep "Delete duplicates" checked.
- Press OK to do the sort.
Download script --- Download sortorders.txt
Note: download the file directly; do not open it in the browser and copy and paste it, as you can with most other scripts. This one is a jsxbin file, which does not survive a pass through a text editor.
Background
Languages fall into different types according to how they treat accented letters when sorting lists (if you see garbled characters in this text, enable Unicode/UTF-8, probably under View > Character Encoding or something similar; also select a Unicode font such as Lucida Sans Unicode or Microsoft's Times or Arial in the options section of your browser):
- Accented characters are grouped at the end of the alphabet. They are in effect considered as separate letters. This is the case in the Scandinavian languages. In Danish the sort order is ABC . . . XYZÆØÅ.
- Accented letters follow the unaccented ones, and these letters, too, are considered separate letters. Polish uses the sort order AĄBCĆ . . . XYZŹŻ.
- Accents are ignored. In German and Italian, for example, words are sorted as if the accents weren't there.
- Some letter combinations are treated as one character. In Czech, the combination "ch" counts as a single letter, and in Spanish both "ll" and "ch" do. Such a combination is sorted after its first letter: words starting with "ch" come after all other words starting with "c".
- Some languages mix two or more of these types; in Czech, some accented letters follow the unaccented ones, some accents are ignored, and, as mentioned earlier, "ch" is treated as one character. In Icelandic, Þ (thorn) is sorted at the end of the alphabet; Ð (eth), which you may or may not consider an accented letter, follows D; other accents are ignored.
- A special, tricky case is French, where words that differ only in their accents are ordered by the last accent difference in the word rather than the first. For example, a plain sort orders the words cote, coté, côte, and côté as shown in the first column, but they should be ordered as in the second column:
Result    Should be
------    ---------
cote      cote
coté      côte
côte      coté
côté      côté
(Source: SortingAndCollating.pdf.) The script doesn't handle these cases, so French lists may need some manual reordering. I have no idea how frequent such cases are; it might not be a real problem. Follow the link below for detailed documentation.
- Finally, a completely neutral (or "diacritic-insensitive") sort order can be used to ignore each and every accent. This is useful, for instance, for sorting a name list (an address list, an index of authors) for an English-language publication in which several different types of accented letter are used. In such cases, all accents in Czech, Polish, Danish, etc. names are ignored.
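The French case in the list above can be made concrete with a small Python sketch. It is not part of the script (which, as noted, does not handle these cases); it only shows why the two columns differ. The helper names split_accents, forward_key and french_key are invented for the example, and the sketch assumes that every accent can be separated from its letter by Unicode decomposition.

```python
import unicodedata

def split_accents(word):
    """Split a word into its base letters and the accents on each letter."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and bases:
            accents[-1] += ch          # attach the mark to the previous letter
        else:
            bases.append(ch.upper())
            accents.append("")
    return "".join(bases), tuple(accents)

def forward_key(word):
    # Base letters first; ties are broken by accents read left to right.
    base, accents = split_accents(word)
    return base, accents

def french_key(word):
    # Base letters first; ties are broken by accents read right to left.
    base, accents = split_accents(word)
    return base, accents[::-1]

words = ["côté", "coté", "cote", "côte"]
print(sorted(words, key=forward_key))  # ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=french_key))   # ['cote', 'côte', 'coté', 'côté']
```

The first line of output matches the "Result" column, the second the "Should be" column.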
The sorter for CS3 handles all these possibilities (except, as mentioned, some French cases). The script looks for a text file "sortorders.txt" which should be located in the script folder. An attempt is made to determine the currently selected language and to show its sort order (if the file can't be found the script defaults to [No Language] and diacritic-insensitive sort order):

The different types of sort order are accounted for by using a different format for each type of letter. All types are displayed in the screen shot, which shows the sort order for Czech (a different language can be picked from the dropdown). The formats are as follows:
- To have an accented letter sort after the unaccented one, it is positioned after it: RŘ specifies that R and Ř are different letters and that R immediately precedes Ř. Naturally, this also covers the Scandinavian languages: the accented letters are simply placed after all the other letters.
- To ignore an accent, the accented letter is placed in square brackets. A[Á] specifies that A and Á should be treated as the same letter. Any number of similar letters can be added to the list: E[ÉĚ] stipulates that E, É, and Ě should be treated as the same letter.
- To treat a letter combination as a single letter, the combination is placed in curly brackets. {CH} indicates that "ch" should be treated as "c". The combination must immediately follow the letter with which it is equated, as in C{CH}. Only the letter following the opening bracket is considered, so the cluster can be any length. For example, if the combination "scz" should be treated as an "s", enter it as S{SCZ}.
- To neutralise all accents, pick [No Language] from the dropdown.
- Enter just capitals: the script handles lower-case letters automatically.
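To make the three formats concrete, here is a minimal Python sketch of how such a sort-order string could be turned into a sort key. It is an illustration under stated assumptions, not the script's actual implementation: the function names parse_sort_order and sort_key are invented, a bracketed cluster such as {CH} is assumed to get its own position directly after the letter it follows, and characters missing from the sort order are assumed to sort last.

```python
import unicodedata

def parse_sort_order(spec):
    """Map every letter or cluster in a sort-order string to a rank."""
    ranks = {}
    rank = -1
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "[":                      # A[Á]: same rank as the letter before
            end = spec.index("]", i)
            for equivalent in spec[i + 1:end]:
                ranks[equivalent] = rank
            i = end + 1
        elif ch == "{":                    # C{CH}: cluster ranked as one letter
            end = spec.index("}", i)
            rank += 1
            ranks[spec[i + 1:end]] = rank
            i = end + 1
        else:                              # RŘ: each letter gets its own rank
            rank += 1
            ranks[ch] = rank
            i += 1
    return ranks

def sort_key(word, ranks):
    """Turn a word into a list of ranks; capitals and lower case are equal."""
    text = unicodedata.normalize("NFC", word).upper()
    clusters = sorted((k for k in ranks if len(k) > 1), key=len, reverse=True)
    key = []
    i = 0
    while i < len(text):
        for cluster in clusters:
            if text.startswith(cluster, i):
                key.append(ranks[cluster])
                i += len(cluster)
                break
        else:
            # Assumption: letters missing from the sort order go last.
            key.append(ranks.get(text[i], len(ranks)))
            i += 1
    return key

czech = "A[Á]BC{CH}ČD[Ď]E[ÉĚ]FGHI[Í]JKLMN[Ň]O[Ó]PQRŘSŠT[Ť]U[ÚŮ]VWXY[Ý]ZŽ"
ranks = parse_sort_order(czech)
words = ["chata", "cukr", "čaj", "cena", "řeka", "ráno", "Adam", "ábel"]
print(sorted(words, key=lambda w: sort_key(w, ranks)))
# ['ábel', 'Adam', 'cena', 'cukr', 'chata', 'čaj', 'ráno', 'řeka']
```

With this sort order, "ábel" precedes "Adam" because Á is treated as A, "chata" follows every other word in "c", and "řeka" follows "ráno" because Ř is a separate letter after R.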
The example here shows some lines from the file "sortorders.txt", showing how sort orders are encoded.
<This file uses UTF-8 encoding>
Polish 0123456789 AĄBCĆDEĘFGHIJKLŁMNŃOÓPQRSŚTUÚVWXYZŹŻ
Czech 0123456789 A[Á]BC{CH}ČD[Ď]E[ÉĚ]FGHI[Í]JKLMN[Ň]O[Ó]PQRŘSŠT[Ť]U[ÚŮ]VWXY[Ý]ZŽ
Icelandic 0123456789 A[Á]BCDÐE[É]FGHI[Í]JKLMNO[Ó]PQRSTU[Ú]VWXY[Ý]ZÞÆÖ
[No Language] 0123456789 A[ÁÀÂÄÅĀĄĂÆ]BC[ÇĆČĊ]D[ĎĐ]E[ÉÈÊËĘĒĔĖĚ]FG
[ĢĜĞĠ]H[ĤĦ]I[ÍÌÎÏĪĨĬĮİ]J[ĵ]K[ķ]L[ŁĹĻĽ]MN[ÑŃŇŅŊ]O[ÓÒÔÖŌŎŐØŒ]
PQR[ŔŘŖ]S[ŚŠŜŞȘß]T[ŢȚŤŦ]ÞU[ÚÙÛÜŮŪŲŨŬŰŲ]VW[Ŵ]XY[ŸÝŶ]Z[ŹŻŽ]
Each line consists of two parts: the name of the language in InDesign's format, followed by a tab, followed by the sort order. The file is in UTF-8 format and must stay in that format.
(Note: the characters at [No Language] are on three lines for display purposes here only: they should remain on one line in the file.)
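If you want to inspect or check the file outside InDesign, the layout just described (language name, tab, sort order, one entry per line, UTF-8) is easy to read. The following Python sketch shows one way to do it; the function names are invented for the example, and it assumes that any line without a tab, such as the header line, can be skipped.

```python
def parse_sort_orders(text):
    """Return a dict mapping language names to sort-order strings."""
    orders = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue                       # header or blank line: no tab
        language, order = line.split("\t", 1)
        orders[language.strip()] = order.strip()
    return orders

def load_sort_orders(path):
    # "utf-8-sig" reads plain UTF-8 and also tolerates a byte-order mark.
    with open(path, encoding="utf-8-sig") as handle:
        return parse_sort_orders(handle.read())

sample = (
    "<This file uses UTF-8 encoding>\n"
    "Polish\t0123456789 AĄBCĆ\n"
    "[No Language]\t0123456789 A[ÁÀ]B\n"
)
orders = parse_sort_orders(sample)
print(sorted(orders))        # ['Polish', '[No Language]']
print(orders["Polish"])      # 0123456789 AĄBCĆ
```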
Adding and changing sort orders
It is easy to add a new sort order or to change an existing one. To change a sort order, pick the language and make any changes in the displayed string. Make sure that the "Save sort order" box is checked to save the changes. To add a new language, pick it from the dropdown and enter the sort order after "Sort order:". The new data are stored in the sort-order file. You can edit the file in a text editor, but remember to save it in UTF-8 format.
When changing or adding sort orders, bear two things in mind:
- Add letters as capitals only. The script will take care of all corresponding lower-case letters.
- If you omit a character, don't worry: it will just be sorted incorrectly; it will not disappear from your documents.
Further information
There's a lot of information on sorting. SortingAndCollating.pdf is a good general overview.
Last updated 15 June 2008: enabled retrograde sorts; added Brazilian Portuguese to sortorders.txt (thanks to Igor Freiberger).