Search for Foreign Language Characters in Text

InDesign offers fine-grained control over Find/Change, including which kinds of characters you want it to find.

Franck wrote us, asking if there was any good way to find just the Japanese characters in an InDesign story. How can you find just those characters? Or just Russian (Cyrillic), or just ornaments, or, for that matter, only Latin characters? The answer, of course, is the GREP tab of the Find/Change dialog box (CS3 and later).

Finding only Latin Characters

To find just the Latin characters, ignoring any special characters, punctuation, numbers, and so on, you could type

[a-zA-Z]+

into the Find What field of the GREP tab of the Find/Change dialog box. The square brackets act as an “or” command, so this means “any character between a and z or between A and Z” (and the plus symbol means a string of one or more of them). Don’t put a vertical pipe between the two ranges: a pipe means “or” outside square brackets, but inside them it is just a literal character, so [a-z|A-Z] would also match any pipe in your text.

It will not find accented characters because those characters don’t fall between a and z in the Unicode character list. It’s all based on Unicode numbers, as we’ll see later on.
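To see this behavior outside InDesign, here is a minimal sketch using Python's re module. It assumes InDesign's GREP engine follows the same character-class conventions, which is standard for regular expressions but is not tested here against InDesign itself. The sample string is made up for illustration.

```python
import re

# Hypothetical sample text containing an accented letter and a pipe.
sample = "Café au lait | 42 cups"

# Ranges in a character class are compared by code point, so é (U+00E9)
# falls outside a-z (U+0061 to U+007A) and ends the run "Caf".
latin_matches = re.findall(r"[a-zA-Z]+", sample)
print(latin_matches)  # ['Caf', 'au', 'lait', 'cups']

# A pipe inside the brackets is an ordinary character, so it is matched too.
pipe_matches = re.findall(r"[a-z|A-Z]+", sample)
print(pipe_matches)  # ['Caf', 'au', 'lait', '|', 'cups']
```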

If you want to find longer strings, including spaces, most punctuation, numbers, and so on, you might use this longer list, which puts even more characters inside the “or” square brackets:

[.,;:?!\d a-zA-Z]+

You can also turn this code around and say “any character that is not in this list” by adding a ^ (caret) symbol at the beginning:

[^.,;:?!\d a-zA-Z]+

which would find everything else: accented characters, ornaments, Cyrillic, and so on (along with any punctuation you left out of the list, such as hyphens and quotation marks).
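As a rough illustration of the negated list, the sketch below runs it in Python's re module rather than in InDesign. The class is written without a pipe, since a pipe inside square brackets is treated as a literal character. The sample text is invented.

```python
import re

# Negated class: one or more characters that are NOT basic punctuation,
# digits, spaces, or unaccented Latin letters.
not_basic = re.compile(r"[^.,;:?!\d a-zA-Z]+")

# Hypothetical sample mixing English, Cyrillic, an accent, and an ornament.
sample = "Hello, мир! Café 2024 ❦"

found = not_basic.findall(sample)
print(found)  # ['мир', 'é', '❦']
```

Anything missing from the list counts as a match, so a hyphen or a curly quote in the text would be found as well.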

Searching for Unicode Ranges

As I mentioned earlier, Find/Change is all based on Unicode values. For example, capital A is 0041, capital Z is 005A, and so on. (Most Unicode values are written as four hexadecimal digits, which is a fancy way of saying that each digit can be 0-9 or A-F.) So if you know the range of the Unicode values, you can really dial the GREP in to exactly what you’re looking for.
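If you are not sure of a character's Unicode value, one way to look it up is a couple of lines of Python. The helper name code_point below is made up for this sketch.

```python
def code_point(ch: str) -> str:
    """Return a character's Unicode value as at least four hex digits."""
    return f"{ord(ch):04X}"

print(code_point("A"))  # 0041
print(code_point("Z"))  # 005A
print(code_point("é"))  # 00E9

# Going the other way: the character stored at a given value.
first_kanji = chr(0x4E00)
print(first_kanji)  # 一
```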

For example, here’s some GREP to find all Japanese characters:

([\x{4E00}-\x{9FBF}]|[\x{3040}-\x{309F}]|[\x{30A0}-\x{30FF}])+

It seems complex at first, but after a moment you’ll notice that this is simply a list of three ranges, separated by vertical pipes. It means all the characters that fall into the kanji section of Unicode (4E00 to 9FBF) or all the hiragana (3040 to 309F) or all the katakana (30A0 to 30FF). (I found those on the Japanese writing system page at Wikipedia.)

You could use a similar method to find characters in Bengali (0980-09FF) or just math operators (2200-22FF) or whatever you’d like. I found these and many other Unicode ranges at unicode.org/charts.
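The same ranges can be tried out in Python before you paste them into InDesign. This is only a sketch: Python's re module writes a code point as \u4E00 where InDesign's GREP uses \x{4E00}, and the three ranges are combined into a single character class here, which should match the same characters as the piped version. The sample text is invented.

```python
import re

# Kanji, hiragana, and katakana ranges in one character class.
japanese = re.compile(r"[\u4E00-\u9FBF\u3040-\u309F\u30A0-\u30FF]+")
bengali = re.compile(r"[\u0980-\u09FF]+")
math_ops = re.compile(r"[\u2200-\u22FF]+")

# Hypothetical mixed-script sample.
sample = "Tokyo 東京 is とても big; カタカナ too. ∑ ≤ বাংলা"

japanese_runs = japanese.findall(sample)
bengali_runs = bengali.findall(sample)
math_runs = math_ops.findall(sample)

print(japanese_runs)  # ['東京', 'とても', 'カタカナ']
print(bengali_runs)   # ['বাংলা']
print(math_runs)      # ['∑', '≤']
```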

Applying a Different Font to These Characters

Of course, once you have found the characters, you probably want to do something with them — such as change their font or apply a character style. You can also do that in the Find/Change dialog box:

[Screenshot: Applying a character style to Japanese text with GREP]

Note that the Change To field is empty. This indicates that you want to leave the text alone — whatever InDesign finds — and only apply the formatting to it.


This article was last modified on December 20, 2021
