GREP Find/Change on Formatted Text (solution to a big problem)

You know I love GREP in InDesign. It’s an incredibly powerful way to format or find or change a lot of text very quickly. However, there is one significant problem with GREP that people rarely talk about… and if you aren’t aware of it, the problem can really bite you badly. I want to explain the problem and then share a couple of possible solutions.

The problem involves using the GREP tab of the Find/Change dialog box to alter text that already has formatting applied to it. For example, let’s look at a simple list, where the family names are tagged with a character style:

list of names

Let’s say that we want to swap the names, so the first name comes at the beginning of each paragraph. So we open Find/Change and use a relatively simple GREP expression to find the first word followed by a comma, followed by another word:

reverse name order

I should point out that this is just a simple approach… see this article for more on swapping names.
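To make the idea concrete outside InDesign, here is a minimal sketch of the same kind of swap using Python's `re` module. The pattern is an assumption (the exact expression from the screenshot is not reproduced here), and note that Python writes the found groups as `\2` and `\1` where InDesign's Change To field uses `$2` and `$1`.

```python
import re

# Assumed pattern: one word, a comma and a space, then another word.
SWAP = re.compile(r"^(\w+), (\w+)$", re.MULTILINE)

def swap_names(text):
    """Turn 'Family, Given' into 'Given Family' on every line."""
    return SWAP.sub(r"\2 \1", text)

names = "BLATNER, David\nRANKIN, Mike"
print(swap_names(names))
```

Like the expression in the article, this only handles single-word names; it is meant as an illustration, not a complete name-swapping routine.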

Anyway, when we click Change All, we find a couple of problems:

problem with grep and find/change

First, the comma still separates the names. That’s because I accidentally typed the comma in the Change To field. Oops! That’s easy to fix.

But the bigger problem is the formatting. See how the formatting stayed at the beginning of the paragraph? On the 3rd line, the original had 10 characters set to the bold style… and after the Find/Change, the first 10 characters are still bold — but it’s the wrong 10 characters!

This is the problem with GREP Find/Change on formatted text. And obviously it can cause huge mistakes!
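One way to picture what is going on is a toy model in Python where the bold formatting is stored as a range of character positions rather than being attached to the word itself. This is only a sketch for illustration; it is not a description of how InDesign stores formatting internally, and the sample name is hypothetical.

```python
import re

def change_all_keeping_offsets(text, bold_span, pattern, replacement):
    """Toy model: the text changes, but the bold range keeps its old offsets."""
    new_text = re.sub(pattern, replacement, text)
    start, end = bold_span
    return new_text, new_text[start:end]

line = "CONCEPCION, Anne"  # hypothetical entry; the first 10 characters are bold
new_line, bold_now = change_all_keeping_offsets(
    line, (0, 10), r"(\w+), (\w+)", r"\2 \1"
)
print(new_line)   # the swapped text
print(bold_now)   # whatever now sits in the first 10 positions
```

In this model the bold range still covers positions 0 through 9 after the swap, so it lands on the given name plus part of the family name instead of the family name alone.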

In general, you want to do all your text cleanup (with Find/Change, etc.) before you apply formatting. But what if you need to use GREP Find/Change on formatted text?

Two Solutions

The first solution to this problem is to export the story in the “InDesign Tagged Text” format (using File > Export). When you do that, you get a text file that you can open in any text editor:

exporting indesign tagged text

The formatting is all applied using tags. However, this is different from markup such as XML or HTML, where there is generally a beginning and an ending tag; tagged text follows its own rules.

(Note that the tagged text file above is exported using “Verbose” tags… that’s one of the options you’ll see when you export the story. This setting makes it easier for humans to read.)

You can pretty much understand what’s going on. For example, on line 5 above, there’s a paragraph style called “names” applied to the paragraph, and then a character style called “lastName” applied to one word, and then the character style is turned off (disabled) for the rest of the paragraph.

So if you’re using a text editor that understands GREP (such as BBEdit on the Mac), then you can easily work with this text. Note, however, that different programs use slightly different GREP commands. For example, where we’d write $2 in InDesign (to replace with the second string of text it found), BBEdit uses the code \2.

Here’s the Find dialog box from BBEdit, searching for the whole paragraph and then rearranging it with the proper formatting tags:

using bbedit's grep commands

You may notice in the Replace field there are some other strange GREP tags: \L and \E. These are tags that, sadly, InDesign does not support at all. The code \L means “start converting text to lowercase,” and \E means stop the conversion. So the result is that all the characters in the family name (except the first one) are converted to lowercase. Here’s how it appears after we press Replace All:

results in bbedit
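If your editor of choice is a scripting language rather than BBEdit, the same rearrangement can be sketched in Python. Two assumptions to flag: the tag syntax below (`<ParaStyle:names>`, `<CharStyle:lastName>`, and an empty `<CharStyle:>` to turn the style off) is a guess at what a verbose export of this list looks like, and the search pattern is mine, not the one in the screenshot. Python's `re` has no `\L` and `\E`, so the lowercasing happens in a replacement function instead.

```python
import re

# Assumed shape of one exported paragraph (verbose tags); a real file may differ.
LINE = re.compile(
    r"^(<ParaStyle:names>)<CharStyle:lastName>(\w)(\w*)<CharStyle:>, (.+)$",
    re.MULTILINE,
)

def rewrite(match):
    para_tag, first_letter, rest, given = match.groups()
    # Keep the first letter as-is and lowercase the rest of the family name.
    family = first_letter + rest.lower()
    # Put the given name first, then the family name wrapped in its style tags.
    return f"{para_tag}{given} <CharStyle:lastName>{family}<CharStyle:>"

def swap_tagged(text):
    """Swap the names in every matching paragraph of a tagged text export."""
    return LINE.sub(rewrite, text)

sample = "<ParaStyle:names><CharStyle:lastName>BLATNER<CharStyle:>, David"
print(swap_tagged(sample))
```

Because the style tags travel with the family name inside the replacement, the character style stays on the right word when the file is placed back into InDesign.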

Gotta’ admit that’s pretty awesome, eh? But of course, it’s even better once we save the new tagged text file and import it into InDesign, replacing the original text:

properly formatted text

Tagged text is one method for managing already-formatted text with GREP. I showed this method recently at The InDesign Conference, and Peter Kahrel (who was lurking in the back of the room) later said he prefers a different method that doesn’t require exporting tagged text.

His method is to convert all the formatted text into tags inside the InDesign document itself! For example, italic text could be changed (by Find/Change or with a script) to be unformatted and with tags around it, such as this:

changing formatted text to tags

Once there are tags around it, you can do the GREP Find/Change in InDesign itself. With luck we’ll convince Peter to write up this technique in a future article for InDesign Magazine!
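Here is a rough Python sketch of the idea behind that approach, using a toy model where a story is a list of (text, is_italic) runs. The `<i>` and `</i>` tags and the helper names are hypothetical, chosen only for illustration; they are not necessarily the tags Peter uses.

```python
import re

def runs_to_tags(runs):
    """Flatten (text, is_italic) runs into one string with temporary tags."""
    return "".join(f"<i>{text}</i>" if italic else text for text, italic in runs)

def tags_to_runs(tagged):
    """Rebuild (text, is_italic) runs from the temporary tags."""
    runs = []
    for part in re.split(r"(<i>.*?</i>)", tagged):
        if not part:
            continue
        if part.startswith("<i>"):
            runs.append((part[3:-4], True))   # strip <i> and </i>
        else:
            runs.append((part, False))
    return runs

runs = [("BLATNER", True), (", David", False)]
tagged = runs_to_tags(runs)
# The tags are part of the found text, so they move along with the word they wrap.
changed = re.sub(r"(<i>\w+</i>), (\w+)", r"\2 \1", tagged)
print(tags_to_runs(changed))
```

The point of the round trip is that the formatting is temporarily expressed as ordinary characters, so a find/change cannot separate it from the text it belongs to.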

Now you know what to watch out for, why it’s important, and some ways to possibly work around this limitation.

Mike Rankin says:

June 18, 2018 at 11:16 am

I showed a use for Peter’s technique in this Lynda.com video on preserving formatting in index entries: https://www.lynda.com/InCopy-tutorials/Preserving-formatting-index/179050/363431-4.html

Luis Felipe Corullon says:

June 18, 2018 at 4:10 pm

Great article, David. Thanks!

Harison Silva says:

June 18, 2018 at 9:00 pm

Interesting! Thank you.

Jens says:

June 19, 2018 at 9:42 pm

Am I missing something here?

Why not simply
– search for the character style (or the bold formatting only, if no styles were used) and replace it with none
– use a GREP or nested style in the paragraph style to assign the bold character style for the second word
?

greetings
Jens

Pascal says:

June 20, 2018 at 8:17 am

Yes,
I’m with Jens.
Why so complicated?

David Blatner says:

June 20, 2018 at 11:17 am

Yes, I agree that in this particular very simple example, a nested style would make much more sense. The point of the article is to show the bigger problem. There are many examples of formatting (whether manual formatting or character styles) that will cause problems for GREP.

For example, say you have book titles in an italic style in the middle of many paragraphs. Then you run a Find/Change GREP query… and the italic may end up applied to the wrong words in the paragraph!

Frank Butler says:

June 20, 2018 at 5:10 pm

This article seems to be written with an assumption that the reader is familiar with the mechanisms of exporting and importing “InDesign Tagged Text,” which I (as of yet) am not. I just played around with this for a bit and I was able to figure out that I have to select a block of text for this option to appear in the dropdown of File > Export, and I was able to export and modify the text I selected with my text editor, but I can’t figure out how to import it back into InDesign. Can anybody help fill in my knowledge gap here? Is this something simple I’m just missing, or maybe somebody could point me to something that gives an overview of this feature? (A quick web search did not yield anything helpful.) I see a lot of possibilities with this approach for all sorts of things as I do have some experience with text editing using regexps. Thanks!

Luis Felipe Corullon says:

June 20, 2018 at 5:30 pm

@Frank, just do a “place” in the document. Ctrl+D (Cmd+D) and select the tagged text file you want to place/re-place.

Frank Butler says:

June 21, 2018 at 6:53 am

Thanks Luis! – this worked – except that the first time I tried I got an error message when placing that basically said there was a missing “>” in the text (details below).

I tried again and got it to work by selecting a simpler chunk of text, and a little trial and error implicated a cross-reference in the original text that seems to be the culprit.

Is this Tagged Text thing buggy? I saw some things when I was searching online for more info that indicated it might be (one by David Blatner in fact from a few years ago that said something about Tagged Text not being supported well in later versions of InDesign, if I recall correctly). Does anybody know anything more about this, or can point me to anything online where I can find more information? Be good to know where the landmines are as I delve in further…

This is COOL STUFF!… B^)

Here’s the error message I got above for reference:
:- 8 :- The tagged text import plug-in was expecting a matching closing tag symbol “>” but found none. Either an extra opening tag symbol “<” [...] “>” is missing. :-Hyperlink

David Blatner says:

June 22, 2018 at 9:17 am

Frank: You are correct that tagged text can be a bit buggy. I haven’t tried it recently with things like x-refs or endnotes, so I’m not sure how well-supported it is.

By the way, you don’t have to select text to export it. You can just have the text cursor flashing in a story (which means export the whole story). That works for all the text-export Formats in File > Export.

You’ll notice that when you export the text as tagged text, you have a choice of verbose or concise. I wonder if some of the tag problems might be due to one of those or another?

- Frank Butler says:
  
  June 22, 2018 at 1:32 pm
  
  Thanks David for your comments, very helpful!

Chris says:

June 22, 2018 at 2:09 am

Alternative method for this particular problem:
Convert text to table, using the comma as column delimiter.
Copy-paste the columns into the opposite order.
Convert table to text, using space as column delimiter.

Cheerio,
Chris “everything’s a table” Thompson

MarkinBoone says:

November 26, 2018 at 1:06 pm

Great solution, but you’ll have to deal with the comma left before each first name (when comma is used as column separator initially). A TEXT Find: ^p Change to: ^p will do it except for the first line. Replacing the spaces in the firstname table column before converting to text won’t work if there are double names, like Mary Ann.

Ann says:

June 24, 2018 at 3:27 am

Hooray for Peter Kahrel — brilliant, but brilliant, and very useful scripts. Many thanks Peter.

Bart Van Put says:

July 17, 2018 at 12:24 am

Hoera (Dutch for “hooray”) for Chris’s and Peter’s contributions!

MarkinBoone says:

November 26, 2018 at 2:04 pm

My solution has a few extra steps, but it doesn’t require exporting or pasting replacement text and works for double or hyphenated names, whether first or last, like these examples:

BLATNER, David Michael
GAMET-SMITH, Erica
CONCEPCION, Anne-Marie
RANKIN BASS, Mike
CASE, Justin

1. Create a Character Style for a complex format as in the original example which has a different font with bold applied. (optional)

2. Clear formatting for all the target text; that is, set the formatting to whatever applies to the text for the first names.

3. Replace the spaces and hyphens between double names with strings of letters that can be replaced later (and will not be part of anyone’s name*):

TEXT
Find: -
Change: ZZZ

GREP
Find: ([\l\u])( )(\u)
Change: $1XXX$3

Because the GREP Find above looks for (a letter)(a space)(a letter), the space after the comma is ignored for each name.

4. Use GREP to swap first and last names and delete comma:

Find: (\w+?), (\w+)
Change to: $2 $1

5. Apply the style to the last names with GREP. What’s shown as [space] is an actual single space typed in the field:

Find: [space](\w+)
Change: [space]$1
Change Format: apply the Character Style created in step 1 or specific style settings in the dialog

6. Fix double names with TEXT:

Find: XXX
Change: [space]

Find: ZZZ
Change: -

If you do something like this routinely, save the TEXT and GREP steps. I use numbers in the names so I can easily run them in order, or use a script that saves the sequence, like Mikhail Ivanyushin’s DoQueryList.

* Using strings to temporarily replace a hyphen with ZZZ and a space with XXX is not foolproof: if the name that comes before the hyphen or space ends in the letter Z or X, step 6 would not work: Araz-Matthews would end up Ara-ZMatthews. But a quick Find with the letters you plan to use beforehand can ensure success.
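For anyone who wants to try the logic of these steps outside InDesign, here is a minimal Python sketch of the text side of the workflow. It is an approximation under a few assumptions: `[A-Za-z]` and `[A-Z]` stand in for InDesign's `[\l\u]` and `\u` (so it only covers unaccented letters), the function name is made up, and step 5 (applying the character style) is left out because plain text has no formatting to apply.

```python
import re

def swap_with_placeholders(text):
    # Step 3: protect hyphens and the inner spaces of double names.
    text = text.replace("-", "ZZZ")
    text = re.sub(r"([A-Za-z])( )([A-Z])", r"\1XXX\3", text)
    # Step 4: swap the names and drop the comma.
    text = re.sub(r"(\w+?), (\w+)", r"\2 \1", text)
    # Step 5 (applying the character style) has no plain-text equivalent here.
    # Step 6: restore the spaces and hyphens.
    return text.replace("XXX", " ").replace("ZZZ", "-")

for name in ["BLATNER, David Michael", "GAMET-SMITH, Erica", "RANKIN BASS, Mike"]:
    print(swap_with_placeholders(name))
```

The placeholders matter because they leave exactly one real space per line between steps 4 and 6, which is what lets a single-space search pick out the family name.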

Ruo Pu Koh says:

December 9, 2018 at 9:32 pm

This could also be achieved with paragraph GREP style

Rob B Williams says:

May 28, 2019 at 8:48 pm

I’m hoping someone might be able to help me with this problem I’m having.

I’m trying to write a couple of expressions and am running into a big problem.

(RUNNING THIS GREP EXPRESSION SEEMS TO DESTROY HOW MY GREP WORKS FROM THE POINT OF RUNNING THE EXPRESSION . . . )

Btw, I’m guessing these expressions are way longer than they should be; I’m new and haven’t learned all the ins and outs of GREP yet.

One expression that’s giving me trouble is this:
(\. |\? |\! |\, |\; |\: )(\l|\l\l|\l\l\l|\u|\u\l|\u\l\l|\u’\l)( )
I search for the above and replace it with (found1)(found2) and a non-breaking space.

I’m basically looking to force short line endings onto the next page, endings such as the following:

word, a

(where the “a” is the last character on a line of text before it drops to the next line of text). The same goes for words such as “in,” “and,” “I,” and “I’m”:

word, in
word, and
word, I
word, I’m

Any words of advice?

Thanks!

David Blatner says:

May 29, 2019 at 11:20 am

Rob, it would be much easier to make a character style that applies the “No Break” formatting, and then apply it with a grep style to something like \b\w{1,2}\s
That looks for any 1 or 2 letter words followed by a space.

There are other options, too, such as: https://creativepro.com/3-ways-to-fix-runts-in-your-text.php
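To see what that expression actually catches, here is a quick Python check. This is only a sketch: Python's `re` and InDesign's GREP agree on `\b`, `\w`, `{1,2}`, and `\s` for plain text like this, but the sample sentence and function name are made up.

```python
import re

# Same expression as above: a word boundary, 1 or 2 word characters, then whitespace.
SHORT_WORD = re.compile(r"\b\w{1,2}\s")

def short_words(text):
    """Return each 1- or 2-character word, with its trailing space, that the pattern finds."""
    return SHORT_WORD.findall(text)

print(short_words("word, a cat is in the hat"))
```

Each match includes the trailing space, so applying “No Break” to it ties the short word to whatever follows.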

- Rob Williams says:
  
  June 2, 2019 at 11:03 am
  
  Wow. That made a huge dent in my workload!
  
  Thanks, David.
  
  Rob

Rob B Williams says:

May 29, 2019 at 2:22 pm

Thanks, David. That did the trick for 90% of my issues regarding line breaks.

I’m still having one BIG, but strange, little issue. GREP doesn’t seem to be working properly . . .

I’m running a very simple test: a few blank pages filled with Placeholder Text, with a few numbers and dates sprinkled throughout.

I’m simply searching for \d in Find/Change, under the GREP tab. Instead of finding the first (or any) occurrence of a digit, Find/Change says “Cannot find match.” Originally, it was not finding any of the occurrences on p.1 and instead jumped to a later page . . .

I decided to start from scratch with InDesign (by pressing Shift+Option+Command+Control, while starting up InDesign. Then I click Yes when asked if I want to delete preference files).

Starting fresh didn’t seem to fix the problem.

(This is only happening in InDesign 14.0.2. My 2018 (13.1) and 2017 (12.1.0.56) versions work just fine.)

David Blatner says:

May 29, 2019 at 2:55 pm

@Rob: I wonder if you are running into this bug: https://creativepro.com/grep-bug-fix-for-cc-2019.php

- Rob Williams says:
  
  June 2, 2019 at 11:02 am
  
  Fixed!
  
  Thanks, David. I appreciate your wisdom and willingness to share it!
  
  Rob
