Index from word list

Sidelining InDesign's index feature, the script creates an index on the basis of a word list, adding page references to that word list. It runs on all open documents but doesn't change them in any way. In its approach (bypassing InDesign's index) it is comparable to Marc Autret's IndexMatic.

Use

Open all documents that should be included, then open the document with the word list and make sure that the word list is the active document. Start the script; it displays the window shown in the screenshot.

[Screenshot: indesign index]

Selecting paragraph styles

Items in some types of secondary text (bibliographies, quotations, etc.) are usually not included in an index. Because all this secondary text material will typically be formatted with certain paragraph styles, it's not difficult to filter out that material. You could also take a different perspective, and say that you want to include only text formatted with one or more styles for the main text.

Such choices are specified in the top left box. With the settings in the screenshot, any paragraphs set in the paragraph styles bibliography or quote are ignored.

To include only text formatted with one or more styles, first move any style names from the box on the left to the one on the right (press Remove all), then move the styles you want to include from the right-hand box to the left-hand one.

To ignore this feature altogether and have the script consider all text, just leave the left-hand box empty.

Moving styles from one box to the other is straightforward. The document's paragraph styles are shown in the list on the right. If there aren't any, which will often be the case with lists that form the basis of an index, press Load styles to load all paragraph styles from the document in InDesign's next window. You can sort them if necessary by pressing Sort styles (this applies only to the right-hand list; the one on the left is always kept sorted by the script).

Copying paragraph styles into the word list

If the word list doesn't contain the paragraph styles present in your document files, click Load styles to copy all paragraph styles from a document file.

The four remaining buttons between the list boxes speak for themselves: Add selected and Add all move the selected styles or all styles from the right-hand box to the left-hand one; Remove selected and Remove all move styles in the opposite direction. You can select style names in the usual way using the mouse and the Shift and Cmd/Ctrl keys.

Index options

At Topic–page separator you pick the character or characters to be used between a topic and its first page reference. Options are en-space, normal space, and comma followed by a space.

Match topics case-sensitively speaks for itself.

At Range pages you indicate whether series of consecutive page numbers should be ranged (i.e. 1, 2, 3, 4, 5 rendered as 1-5). At Use you pick a range symbol (Hyphen or en-dash). Finally, at Tolerance you can indicate a degree of relaxation for the spanner. With 0, the spanner hyphenates only consecutive numbers. With a tolerance of 1, a single number can be skipped (1, 2, 3, 5 > 1-5). Some examples:

Page numbers                Tolerance   Result
1, 2, 3, 4, 6, 8, 11, 14    0           1-4, 6, 8, 11, 14
1, 2, 3, 4, 6, 8, 11, 14    1           1-8, 11, 14
1, 2, 3, 4, 6, 8, 11, 14    2           1-14
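The ranging behaviour in these examples can be reproduced with a short Python sketch. This is not the script's own code: the function name range_pages and its parameters are made up for illustration, page numbers are assumed to be plain integers, and a run of just two numbers is assumed to be written as a range (the examples above don't show that case).

```python
def range_pages(pages, tolerance=0, symbol="-"):
    """Collapse page numbers into ranges.

    Two neighbouring numbers end up in the same range when at most
    `tolerance` numbers are missing between them.
    """
    if not pages:
        return ""
    pages = sorted(set(pages))
    runs = [[pages[0], pages[0]]]          # each run is [first, last]
    for page in pages[1:]:
        if page - runs[-1][1] <= tolerance + 1:
            runs[-1][1] = page             # extend the current run
        else:
            runs.append([page, page])      # start a new run
    # Assumption: a run of two numbers is also written as a range.
    return ", ".join(
        str(first) if first == last else f"{first}{symbol}{last}"
        for first, last in runs
    )


pages = [1, 2, 3, 4, 6, 8, 11, 14]
for tolerance in (0, 1, 2):
    print(tolerance, range_pages(pages, tolerance))
```

The loop prints the three results listed in the table for tolerances 0, 1, and 2.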

To include section headings, check Add section headings. Letters are inserted at the beginning of each letter range.
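As a rough illustration of what the option does, the following Python sketch inserts a letter before each group of entries that share an initial. It is only a sketch under assumptions: the function name is hypothetical, entries are assumed to be non-empty strings, and sorting is done case-insensitively. How the script itself treats accented letters, digits, or symbols is not covered here.

```python
from itertools import groupby


def add_section_headings(entries):
    """Return index lines with a letter heading before each letter range."""
    ordered = sorted(entries, key=str.casefold)
    lines = []
    # Group the sorted entries by their upper-cased first character.
    for letter, group in groupby(ordered, key=lambda entry: entry[0].upper()):
        lines.append(letter)
        lines.extend(group)
    return lines


index = ["claim 3-5", "abomination 12", "Leech, G. 7, 9", "claimant 4"]
print("\n".join(add_section_headings(index)))
```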

To mark topics that could not be found in any of the open documents, check Mark topics without page references. The script uses strikethrough to mark unfound topics so you can easily identify them.

The script searches whole words only; see the section on the word list, below, for how this can be relaxed.

Search areas

The four check boxes on the right offer the familiar restrictions on where to search; these correspond to the icon buttons in the Find/Change dialog. ("Include master spreads" is not an option in the script.)

The word list

The word list allows some flexibility in how the script searches your documents. The script searches whole words only, so that an item like this:

claim

finds just claim, not claims, disclaimer, claimed, etc. By design the script considers an entry only up to the first comma or, as explained below, the first parenthesis preceded by an underscore. In this way the script can be used to create author indexes from a list that includes first names or initials. For instance, if the word list contains this line:

Leech, G.

only Leech is used for the search: after all, a text is more likely to contain just an author's surname than their surname followed by their initials or first name. Similarly, when the word list has a line like this:

abomination_(see also terribleness)

you needn't worry about the cross reference: the script considers just abomination. Note though that in order to ignore the opening parenthesis and everything that follows it, the parenthesis should be preceded by an underscore. (The reason is that parentheses have a special meaning in the script.)
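The trimming rule described here can be sketched in a few lines of Python. The function name search_term is hypothetical and the code is an approximation of the behaviour described above, not the script's actual implementation: it cuts the entry at the first comma or at the first underscore followed by an opening parenthesis, whichever comes first.

```python
def search_term(entry):
    """Return the part of a word-list entry that is used for the search."""
    cut = len(entry)
    for marker in (",", "_("):
        pos = entry.find(marker)
        if pos != -1:
            cut = min(cut, pos)            # keep the earliest cut point
    return entry[:cut].strip()


for entry in ("Leech, G.", "abomination_(see also terribleness)", "claim"):
    print(repr(entry), "->", repr(search_term(entry)))
```

A parenthesis without a preceding underscore is left alone in this sketch, which is consistent with parentheses having a meaning of their own in the word list.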

More flexible searching

The script's strict whole-word-only approach can be relaxed in some ways. A few examples will make clear how this works – it's not particularly complicated.

The simplest way to find different forms of the same word is to include all forms in the list:

claim
claims
claimed
claiming

But there are some other possibilities: the script can use all the wildcards that can be used in InDesign's Find/Change dialog (the GREP tab). So instead of an item like claim, you could write entries in several other ways. Some examples:

claims?

The question mark indicates that the preceding character is optional, so this expression finds claim and claims. The scope of the question mark is just one character. But that restricted scope can be expanded as follows:

claim(s|ed|ing)?

Options are separated by pipes (|) and can be grouped by parentheses. Grouping creates a scope island, so to speak, so the question mark has scope over all options in the group. claim(s|ed|ing)? finds claim, claims, claimed, and claiming.

Another way to capture more word forms is the following:

claim\w*

\w stands for 'any word character', which covers letters, digits, and the underscore character. * stands for 'zero or more'. In addition to the forms found by the expression above, claim\w* therefore also finds claimant, claimants, claimer, claimable, etc. Add \w* at the beginning of the word, as follows:

\w*claim\w*

and you'll find all words that contain claim, such as reclaim and disclaimer.

The wildcards illustrated here are indeed GREP classes. The script in fact uses InDesign's GREP search, so you can use many valid GREP expressions in the word list.
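To try out an expression before putting it in the word list, you can approximate the search outside InDesign. The Python sketch below uses Python's re module, which is not InDesign's GREP engine, though the expressions shown on this page behave the same way in both. Wrapping the pattern in word boundaries (\b) is an assumption about how whole-word matching could be enforced, and the function name find_forms and its case_sensitive parameter are made up for this example.

```python
import re


def find_forms(pattern, text, case_sensitive=True):
    """Return the whole words in `text` matched by a word-list pattern."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Assumption: whole-word matching is emulated with \b on both sides.
    whole_word = re.compile(r"\b(?:" + pattern + r")\b", flags)
    return [match.group(0) for match in whole_word.finditer(text)]


text = "claim claims claimed claiming claimant reclaim disclaimer"
for pattern in (r"claim", r"claims?", r"claim(s|ed|ing)?",
                r"claim\w*", r"\w*claim\w*"):
    print(pattern, "->", find_forms(pattern, text))
```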


Version history

30 July 2013: fixed an error in the representation of the paragraph styles in the document to be indexed.

29 June 2012: added the option to insert section headings in the index, so that letters are inserted before each letter range.

14 November 2010: fixed problem with the interpretation of parentheses.

24 October 2010: added interface and optimised the code here and there.

22 October 2010: fixed problem with finding topics in overset text in CS4.

27 August 2010: fixed problem with finding page numbers in CS5.

24 June 2010: fixed problem connected to text found on the pasteboard.


