
Cleaning Up White Space Problems in Word Files

Proven methods for removing messy white space characters that can cause trouble in InDesign

This article appears in Issue 84 of InDesign Magazine.

“We’ll email you the Microsoft Word file,” says the voice on the other end of the phone. “Management has approved everything, so we’re good to go with this report.” You reply “Excellent, thank you,” and await the email with the DOCX attachment. With the design proposal for the report approved and the InDesign template ready for production, the turnaround time for the report should be quick, given the imminent arrival of the beautifully formatted Word document. Right?

If only this were true. In real life, even if you supply your authors with a beautiful Word template with well-crafted styles, it’s still likely that you’ll have some serious cleanup work to do. That’s just the nature of the game. The author’s job is to write, to supply important or engaging ideas, and to express them in a way that’s appropriate for the intended reader. In short, the author focuses on the words. As the designer or production artist, it’s your job to fit those words into a professionally designed layout, quickly and efficiently. And for that, you have to eliminate any rogue white space characters.

Fortunately, you’ve got several tools, built in or otherwise easily available, to help you do this efficiently, consistently, and repeatedly (for all that return business you’re going to get because your work looks so good). This article introduces these options, which are likely to become invaluable friends.

Common Cleanups

Creating a complete list of every cleanup you might encounter when placing supplied text into InDesign projects lies outside the scope of this article. Generally, the very first cleanup is applying the correct paragraph and character styles throughout a document. The following issues are commonly seen in imported text:

  • Capital letters: Changing text typed with the Caps Lock key on or the Shift key pressed to title case, sentence case, or lowercase
  • Ellipses: Replacing three consecutive periods with an ellipsis character
  • Hyphens: Replacing double hyphens with em dashes, and hyphens in hyphenated compounds with a nonbreaking hyphen
  • Paragraph returns: Removing blank paragraph returns that appear between paragraphs
  • Space characters: Replacing double spaces at the end of a sentence with a single space; deleting a trailing space at the end of a paragraph; deleting leading spaces at the start of a paragraph; replacing multiple spaces with a single tab
  • Tabs: Replacing multiple tabs with a single tab for tabulated text; removing a leading tab at the start of a paragraph and formatting that paragraph with a paragraph style that includes a first-line indent
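Most of these cleanups boil down to short regular-expression substitutions. As a rough illustration only, the Python sketch below approximates several of them with the standard `re` module, assuming each line of a string stands in for one paragraph (InDesign itself separates paragraphs with a return, and its GREP dialect has its own metacharacters for special characters). The names `CLEANUPS` and `clean_paragraphs` are hypothetical. The order is deliberate: leading and trailing whitespace is stripped before blank paragraphs are collapsed, so that "empty" paragraphs containing only spaces disappear too.

```python
import re

# Each (pattern, replacement) pair approximates one cleanup from the list above.
# Assumption: one line of the string = one paragraph.
CLEANUPS = [
    (r"\.\.\.", "\u2026"),          # three periods -> ellipsis character
    (r"--", "\u2014"),              # double hyphen -> em dash
    (r"(?<=\w)-(?=\w)", "\u2011"),  # hyphen in a compound -> Unicode non-breaking hyphen
    (r"(?m)^[ \t]+", ""),           # leading spaces or tabs at paragraph start
    (r"(?m) +$", ""),               # trailing spaces at paragraph end
    (r"(?<=[.!?]) {2,}", " "),      # two or more spaces after a sentence -> one space
    (r"\t{2,}", "\t"),              # multiple tabs -> single tab
    (r"\n{2,}", "\n"),              # blank paragraphs between paragraphs
]

def clean_paragraphs(text: str) -> str:
    """Apply every cleanup in list order and return the cleaned text."""
    for pattern, replacement in CLEANUPS:
        text = re.sub(pattern, replacement, text)
    return text
```

For example, `clean_paragraphs("Name\t\t\tPrice")` collapses the tabs to one. As the article explains later, hold off on removing leading spaces and tabs until you've used them to apply the correct paragraph styles.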

Starting the Spring Cleaning

Where do you start when that email with the DOCX attachment arrives? Personally, I begin by opening the files in Word. My initial objective is to assess how the text has been formatted and to decide on the best plan of attack from there (Figure 1). Here are the steps I typically follow when reviewing the client’s text files.

Figure 1: Planning the best import approach starts with a document review.

Viewing style names

Adding the Style Area pane to a document view in Word makes it easier to see which paragraph styles are applied to the text. The process is slightly different in the various versions of Word for Mac and Windows. In Word 2016 for Mac, choose Word > Preferences, click View, and in the Show Window Elements settings, enter a value in the Style Area Width field. On Windows, open the document in Word, click the File tab, and then click Options. Click Advanced, and scroll down to the Display section in the Word Options dialog box. In the Style Area Pane Width In Draft And Outline Views field, enter the preferred pane width, and click OK. Click the View tab, and select Draft from the Views section. The style names are now listed to the left of the paragraphs. To see hidden characters like spaces or tabs, click the Home tab, and then click the pilcrow icon (¶).

I do a little happy dance when the text is consistently formatted using styles, because I can then map the Word styles straight to my InDesign styles as part of the text import. Many designers opt to clear all text formatting during import when they encounter badly styled documents. The downside of this approach is that all additional formatting (overrides), such as words in a paragraph set in italics or bold, or characters set as superscript or subscript, is lost. My personal preference is to always map styles and retain style overrides, even when the styles aren’t applied consistently. On the odd occasion when only the Normal style is used in a document, because everything in Word was formatted manually, that means all my text ends up formatted as Body Copy throughout the document (see Sample-No-Styles.docx in the sample files).
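The review described above happens in Word's interface. If you'd also like a quick tally of which paragraph styles a DOCX actually uses, a short script can read the file directly: a DOCX file is a ZIP archive whose main text lives in `word/document.xml`, and each paragraph (`w:p`) that has a style references it with a `w:pStyle` element. The Python sketch below uses only the standard library; the function name `paragraph_style_counts` is hypothetical. Paragraphs without a `w:pStyle` fall back to the document's default paragraph style (usually Normal), and the script reports style IDs such as `Heading1`, which can differ from the display names Word shows.

```python
import zipfile
from collections import Counter
from xml.etree import ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraph_style_counts(docx_path: str) -> Counter:
    """Count the paragraph style IDs referenced in word/document.xml."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    counts = Counter()
    for paragraph in root.iter(W + "p"):
        style = paragraph.find(f"{W}pPr/{W}pStyle")
        # No explicit style: the paragraph uses the default style (usually Normal)
        counts[style.get(W + "val") if style is not None else "(default)"] += 1
    return counts
```

Calling `paragraph_style_counts("Sample-Styles.docx").most_common()` lists styles from most to least used; a result dominated by `(default)` would suggest a document formatted manually rather than with styles.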

Mapping styles

Mapping Word styles to InDesign styles helps you speed up the text styling process. To map Word styles to InDesign styles (Figure 2):

1. Choose File > Place, select Show Import Options in the Place dialog box, and double-click the DOCX file.
2. In the Microsoft Word Import Options dialog box, select Preserve Styles And Formatting From Text And Tables and Customize Style Import, and then click Style Mapping.
3. In the Style Mapping dialog box, select the correct InDesign style to map each Word style to, and click OK.
4. Click OK once more to place the text in the InDesign document.

Figure 2: Where possible, map Word styles to InDesign styles to speed up the formatting process.

Depending on how well the source file is styled, the results will vary (Figure 3). If you select some text in the InDesign document, you can see which styles are applied and whether there are any overrides (marked by the + to the right of the style name) that need fixing. To see what these overrides are, pause the cursor on the style name.

Figure 3: The import result in InDesign after style mapping the Sample-Styles.docx (left) and Sample-No-Styles.docx (right).

Retaining overrides

If you opt to clear all formatting when placing the Word text into InDesign, select the Preserve Local Overrides option in the Microsoft Word Import Options dialog box (Figure 4). This way you’ll retain all of the bold, italic, and other additional text formatting in the paragraphs, but it will mean you’ll have to reapply all the style formatting, such as heading styles, in InDesign. For smaller documents, that’s generally not an issue, but for longer documents, that could create a fair amount of extra work.

Figure 4: Clearing all formatting, even when preserving local overrides, also clears any styles applied to headings.

The order in which you clean up your document is important. Hold off on removing those double spaces, returns, or tab characters just yet… Take a moment to review the InDesign file and make some notes on where you could use these characters to apply the correct paragraph styles. For example, in the sample files for this article, multiple space characters appear at the start of the body copy paragraphs below the Sources heading. Other paragraphs contain a leading tab character. When I noticed that in my Word review, I opted to map the Normal text in the Word document to Body Copy FO because I thought it would be easier to apply the correct styles in InDesign if I was able to search for these characters at the start of their respective paragraphs.

Using the Style Override Highlighter

Did you know that InDesign has a feature that reveals local formatting at both the paragraph and character levels? At any time, just click the plus button above the list of styles in either the Paragraph or Character Styles panels. Local formatting of text in all open documents will be indicated by blue highlighting.

Find/Change

InDesign’s Find/Change command lies at the heart of any manuscript cleanup. It’s a powerhouse that you’ll quickly become best friends with when you’re formatting and cleaning up a document. A simple but fundamentally important Find/Change for cleaning up a manuscript is one that replaces local bold and italic formatting with character styles (Figure 5).

Figure 5: A basic but crucial Find/Change operation to replace local bolding with a character style. Note that both the Find What and Change To fields are empty. The same technique can be used to replace local italics with a character style.

Learning about GREP will allow you to be even more powerful with your cleanups. If you’re new to GREP, consider downloading the What is GREP? PDF I created in 2013 for the Adobe InDesign User Group. Additionally, review all the fantastic GREP resources on InDesignSecrets.com as well as in InDesign Magazine.

Finding leading spaces

Looking at the previous example, let’s see how to use Find/Change to locate leading spaces and apply a different paragraph style. You are searching for the following:

  • The space (or tab) characters at the start of a paragraph (steps 4 and 5)
  • The first letter of the paragraph after the spaces (steps 6 and 7)
  • A paragraph that is formatted with the Body Copy FO paragraph style (step 10)

And you want to:

  • Reinsert the first letter of the paragraph, but not the spaces preceding it (step 9)
  • Apply the Body Copy paragraph style (step 11)

To complete this task with Find/Change:

1. Choose Edit > Find/Change, and then click the GREP tab.
2. Ensure there is no information in the Find What and Change To fields at the top of the dialog box.
3. Click More Options to show the Find Format and Change Format settings, and click the Clear Specified Attributes icon to remove any previously entered settings.
4. Click in the Find What field, and from the Special Characters For Search menu, choose Locations > Beginning of Paragraph. This inserts a caret character (^) in the field, after which you can type a space character. (If you wanted to find leading tabs instead, you’d add \t.)
5. Also from the Special Characters For Search menu, choose Repeat > One or More Times (a + is added). This ensures that all spaces or tabs found at the start of the paragraph are captured.

Next, you need to find the first character after the spaces or tabs as well. Because you want to reinsert this character (but not the spaces or tabs), you have to find it through a subexpression.

6. From the Special Characters For Search menu, choose Match > Marking Subexpression (a pair of parentheses, (), is added).
7. Click inside the parentheses, and then choose Wildcards > Any Character. The complete Find What expression looks like ^ +(.).
8. Click in the Change To field.
9. From the Special Characters For Replace menu, choose Found > Found 1. $1 is added to the field. This ensures that only the character caught with the subexpression is returned; Found Text ($0) would reinsert the entire match, spaces included.
10. Click the Specify Attributes To Find icon, and in the Find Format Settings dialog box, choose Body Copy FO from the Paragraph Style menu (Figure 6). Click OK.

Figure 6: Isolating a Find/Change query to only look for text formatted with a paragraph style.

11. Click the Specify Attributes To Change icon, and in the Change Format Settings dialog box, choose Body Copy from the Paragraph Style menu. Click OK.
12. Click Change All.

As you can see, setting up these Find/Change queries takes a bit of preparation, and you wouldn’t want to reinvent the wheel each time you need to fix up a document. Thankfully, you can save a Find/Change query and use it again for later documents. Additionally, the InDesign development team has already added some queries for you to use, available from the Query menu in the Find/Change dialog box (Figure 7).
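To check the logic of this query before running it on a live document, you can mimic it outside InDesign. The Python sketch below is an approximation under stated assumptions: each paragraph is modeled as a hypothetical (style, text) pair, the function name `restyle_leading_spaces` is illustrative, and Python's `re` module stands in for InDesign's GREP engine. It matches `^ +(.)` only in paragraphs styled Body Copy FO, writes back group 1 (InDesign's `$1`, Python's `\1`), and switches those paragraphs to Body Copy. Replacing with the whole match instead (Found Text, `$0`) would put the spaces straight back.

```python
import re
from typing import List, Tuple

# Hypothetical model of a story: (paragraph style name, paragraph text)
Paragraph = Tuple[str, str]

def restyle_leading_spaces(paragraphs: List[Paragraph],
                           find_style: str = "Body Copy FO",
                           change_style: str = "Body Copy",
                           pattern: str = r"^ +(.)") -> List[Paragraph]:
    """Mimic GREP ^ +(.) -> $1, limited to paragraphs in find_style."""
    result = []
    for style, text in paragraphs:
        if style == find_style:
            # \1 in Python corresponds to $1 (Found 1) in InDesign
            new_text, hits = re.subn(pattern, r"\1", text)
            if hits:
                # A paragraph style applied to found text restyles the whole paragraph
                result.append((change_style, new_text))
                continue
        result.append((style, text))
    return result
```

Passing `pattern=r"^\t+(.)"` gives the leading-tab variant mentioned in step 4; paragraphs in other styles, or without leading spaces, are returned unchanged.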

Figure 7: InDesign’s default and saved Find/Change queries. Ready to use, right out of the “box.”

To save the query (Figure 8):

1. Click the Save Query icon.
2. Enter a name for the query in the Name field, and click OK.

Figure 8: Consider saving Find/Change queries so you can reuse them.

Why use GREP over text Find/Changes?

The advantage of GREP searches is that you have much more control. You could use a text Find/Change to find two spaces and replace them with a single space, but you’d likely need to run this command multiple times until no more double spaces are left. A GREP query with a repeat quantifier (for example, two spaces followed by +, which matches two or more spaces) captures an entire run of spaces at once, so a single Change All does the job. GREP also offers a larger number of metacharacters (special characters) to select from the Special Characters For Search menu. Although InDesign can save Find/Change queries so you can reuse them, each query must be run individually, which can be time consuming. Additionally, the queries to run, and the order in which to run them, will likely vary from job to job.
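The difference between the two approaches is easy to see in a small model. The Python sketch below assumes that each text Change All pass replaces non-overlapping pairs of spaces, which is how Python's `str.replace` behaves; the function names are hypothetical. A run of eight spaces needs three text passes (8 to 4 to 2 to 1), while the regular expression collapses it in one.

```python
import re
from typing import Tuple

def collapse_with_text_search(text: str) -> Tuple[str, int]:
    """Replace two spaces with one, repeating 'Change All' until none remain.

    Returns the cleaned text and the number of passes it took.
    """
    passes = 0
    while "  " in text:
        text = text.replace("  ", " ")  # one literal-text Change All pass
        passes += 1
    return text, passes

def collapse_with_grep(text: str) -> str:
    """One pass: two spaces followed by + matches any run of 2+ spaces."""
    return re.sub(r"  +", " ", text)
```

For `"Word" + " " * 8 + "word"`, the text version returns `("Word word", 3)`, and the GREP version returns `"Word word"` after a single substitution.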

About FindChangeByList

We’ve supplied an updated version of FindChangeList.txt in the download file for this article, and recommend that you replace the default file installed with InDesign with the supplied file. David Blatner explains the issue with the default file in his article Big Problem with FindChangeByList (and an easy fix) on InDesignSecrets. For an in-depth rundown of the FindChangeByList script, download InDesign Magazine Issue 26 and review the article “The Secret Script.” Also see Bart Van de Wiele’s article A Major Job Gets Easier with GREP and FindChangeByList on InDesignSecrets.

FindChangeByList Script

There’s a script that allows you to run multiple Find/Change queries with a single instruction. The FindChangeByList script has been bundled with InDesign for as long as I can remember, and it’s already installed on your system, ready to use. When executed, the script applies all of the queries defined in an associated text file to the active document or a selection. It can apply text, GREP, or glyph changes. By default, the script performs the cleanups listed in Table 1, as defined in the FindChangeList.txt file (see the sidebar “About FindChangeByList” for important information about this file). To practice running the script:

1. Open FindChangeByList_Exercise.indd in InDesign, and choose File > Save As to save the file under a different name.
2. Double-click the FindChangeByList.jsx script in the Scripts panel, or click the script name and choose Run Script from the panel menu (Figure 9). Because the script automatically finds the FindChangeList.txt file, it runs the queries from this file on the fly.

Figure 9: Execute the FindChangeByList script to run multiple Find/Change queries.

Controlling the Find/Change queries

To access the script and review the syntax used by the FindChangeList.txt file, choose Window > Utilities > Scripts to display the Scripts panel.

1. Click the disclosure triangle to the left of the Application folder in the Scripts panel, and continue to expand the Samples and JavaScript folders to see the available scripts.
2. Below the FindChangeByList.jsx file, there is a FindChangeSupport folder. Right-click this folder, and choose Reveal in Finder (Mac) or Reveal in Explorer (Windows) to open the location of the FindChangeList.txt file that contains the instructions the script uses when run.
3. Open this file in a text editor, such as TextEdit (Mac) or Notepad (Windows), to review it.

Table 1: Default FindChangeByList operations

If the FindChangeList.txt file exists in the FindChangeSupport folder, the script runs the queries listed in the file. Each line of text in the file that starts with // is commented out and is purely instructional and informative. Lines that start with grep or text perform a GREP or text Find/Change, and each of these lines is an individual Find/Change query. The query is broken up into several sections separated by tab characters, for example findType<tab>findProperties<tab>changeProperties<tab>findChangeOptions<tab>description.

To run the script using a different TXT file with commands, you must rename the original FindChangeList.txt file and create a new text file with queries:

1. Rename the file to FindChangeList_Org.txt, and create a copy of it. You can save the copy anywhere on your system, for example inside a job folder.
2. Open your new TXT file, and copy and paste some of the query lines so that you can enter your own queries. It can be helpful to look back at the actual Find/Change dialog box settings when you build these queries. For example, the text in the Find What field in the Find/Change dialog box must be typed between the quotes in {findWhat:""}. For the Change To text, do the same in {changeTo:""}.
3. Complete the query by entering the find and change settings. To control the scope of the search, set each of the findChangeOptions (footnotes, include parent pages, include hidden layers, and whole word) to either true (yes) or false (no). Finally, add a description, so you’ll know later on what the query is meant to do.
4. Create a new line of text for each query; when done, save and close the text file.

To do a test run with a custom text file, you can use the FindChangeByList_Exercise.txt file included with this article as an example for the FindChangeByList_Exercise.indd file cleanup (Figure 10):

Figure 10: Navigate to the text file that contains the query list.

1. Ensure you’ve renamed the original FindChangeList.txt file, and open the FindChangeByList_Exercise.indd file.
2. Run the FindChangeByList.jsx script from the Scripts panel.
3. Navigate to the text file, select it, and then click Open.

The result is a pretty clean document, and a good start to continue working with. Although the FindChangeByList script lets you quickly apply a list of simple cleanups, writing more complex queries in the text file quickly becomes overwhelming. Just imagine if you wanted to find all the text that is formatted in Bold and is Body Copy, and then apply a character style to that text. That’s much easier to accomplish with InDesign’s built-in Find/Change, using the Find Format settings (available when you expand the dialog box by clicking More Options), as shown earlier in Figure 5.
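To make the tab-separated query format concrete, here is a small Python model of the steps the text describes: skip // comment lines, split each remaining line on tabs, and run the grep queries from top to bottom. It is a sketch under assumptions, not the actual FindChangeByList.jsx code: the sample lines, the names `parse_query_list` and `run_grep_queries`, and the translation of InDesign's `$1` to Python's `\g<1>` are all illustrative; Python's `re` module only approximates InDesign's GREP; and the findChangeOptions field (and any text or glyph queries) are ignored here.

```python
import re
from typing import List, Tuple

# Hypothetical sample in the layout described above:
# findType<tab>findProperties<tab>changeProperties<tab>findChangeOptions<tab>description
SAMPLE_LIST = "\n".join([
    "// Lines that start with // are comments",
    'grep\t{findWhat:"  +"}\t{changeTo:" "}\t{includeFootnotes:true}\tCollapse runs of spaces',
    'grep\t{findWhat:"\\t\\t+"}\t{changeTo:"\\t"}\t{includeFootnotes:true}\tCollapse runs of tabs',
])

Query = Tuple[str, str, str, str]  # (findType, findWhat, changeTo, description)

def _quoted(props: str, name: str) -> str:
    """Return the quoted value of a property such as findWhat:"..." (or "")."""
    match = re.search(name + r':"((?:[^"\\]|\\.)*)"', props)
    return match.group(1) if match else ""

def parse_query_list(text: str) -> List[Query]:
    """Split each non-comment line on tabs and pull out the key fields."""
    queries = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("//"):
            continue  # blank lines and // comments are skipped
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # incomplete line; skipped in this sketch
        find_type, find_props, change_props, _options, description = fields[:5]
        queries.append((find_type, _quoted(find_props, "findWhat"),
                        _quoted(change_props, "changeTo"), description))
    return queries

def run_grep_queries(queries: List[Query], story: str) -> str:
    """Apply grep queries in file order; other find types are ignored here."""
    for find_type, find_what, change_to, _description in queries:
        if find_type == "grep":
            # InDesign's $1 (Found 1) becomes Python's \g<1>
            python_change = re.sub(r"\$(\d)", r"\\g<\1>", change_to)
            story = re.sub(find_what, python_change, story)
    return story
```

Running `run_grep_queries(parse_query_list(SAMPLE_LIST), "Hello   world\t\t\tend")` applies both sample queries in order. Because each query sees the output of the previous one, the order of lines in the file matters, just as it does for the real script.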

Multi-Find/Change

Multi-Find/Change is a low-cost plug-in for InDesign developed by Martinho da Gloria of Automatication. This plug-in works hand in hand with InDesign’s Find/Change, and allows you to easily load saved Find/Change queries into reusable sets. Sets can be shared between users. This plug-in also lets you fix capitalization errors, where text was typed in capital letters and should be title, sentence, or lowercase. You can download a trial version to review the plug-in.

Try Multi-Find/Change

To see Multi-Find/Change at work, download and install the trial version from the Automatication website, and run through the following exercise:

1. Open the MultiFindChange_Exercise.indd file. This document has a lot wrong with it: leading spaces, leading tabs, text formatted in capital letters, multiple tabs, and much more. Review the styles that have been applied to the text, but don’t make any changes to the document.
2. Choose Window > Multi-Find/Change.
3. In the Multi-Find/Change panel, click the Import Set icon, and navigate to the MultiFindChange_Set.xml file supplied with the download files for this article. Click Open (Figure 11).
4. Choose Document from the Search menu, and click Change All.

Figure 11: Loading a Multi-Find/Change set.

Over 50 changes are made in a second. With the Undo By Query option deselected, you can undo all changes with a single Command+Z (Mac) or Ctrl+Z (Windows). Naturally, setting up the queries took some time; however, consider the time savings when you can work with multiple sets for different jobs or repeating work. You can easily reuse queries previously saved, and share sets with colleagues working on similar projects. Double-clicking one of the queries in the GREP list lets you add a description of what a selected query is meant to do, or how it should be used. With a query selected, and the Find/Change dialog box open in InDesign, clicking the F/C icon loads the query into InDesign’s Find/Change dialog box. 

Creating Multi-Find/Change queries

Once you get the hang of using Multi-Find/Change, you can create all the Find/Change queries you want to use (Figure 12), and then run them via the Multi-Find/Change panel (Window > Multi-Find/Change). 

Figure 12: Creating and naming a new set in Multi-Find/Change.

1. Click the New Set icon to create and name a new set.
2. Double-click the set, and enter a Name and Description for the set in the Set Info dialog box. Click OK.
3. In the right column, click GREP to see the saved GREP queries.
4. Drag any queries you want to include in the set to the set, and sort the queries within the set by dragging them up or down (Figure 13).

Figure 13: Drag queries to the set to include them.

Adding capitalization changes

With a set created, you can make additional changes. In this example, the Fix Head 1 Caps query finds any text formatted with the Head 1 paragraph style and doesn’t make any changes to the text. Once the query is added to a Multi-Find/Change set, you can apply a case change to the text, similar to selecting text and choosing an option from the Type > Change Case submenu, only this time Multi-Find/Change does the work for you on all those headings! To add a case change to a query in a set (Figure 14):

1. Double-click the query in the set.
2. In the Query Options dialog box, choose the desired case option from the Apply to Results menu, and click OK.
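Conceptually, the query now does two separate jobs: the search only selects text (everything styled Head 1), and the case option then transforms what was found. The Python sketch below models that split under assumptions: paragraphs are hypothetical (style, text) pairs, `change_case` and `CASE_CHANGES` are illustrative names, and the four case names mirror the options in InDesign's Type > Change Case submenu. Python's `string.capwords` stands in for title case; Multi-Find/Change's own case rules may differ.

```python
import string
from typing import List, Tuple

Paragraph = Tuple[str, str]  # (paragraph style name, text); a hypothetical model

CASE_CHANGES = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "title": string.capwords,  # capitalizes each space-separated word
    "sentence": lambda s: s[:1].upper() + s[1:].lower(),
}

def change_case(paragraphs: List[Paragraph], style: str, case: str) -> List[Paragraph]:
    """Select every paragraph in `style`, then apply the chosen case change."""
    convert = CASE_CHANGES[case]
    return [(s, convert(t) if s == style else t) for s, t in paragraphs]
```

Note that this naive sentence case also lowercases proper nouns such as InDesign, so it's worth reviewing the headings after any automated case change.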

Figure 14: Fixing text that was set with the Caps Lock key enabled.

Before Multi-Find/Change was developed, the FindChangeByList script was my preferred tool for cleaning up imported text. However, Multi-Find/Change is now my favorite: it’s easy to use, I don’t need to know anything about scripting or edit a text file to use it, and it integrates very closely with InDesign’s own Find/Change command.

(Not Quite) The Finish Line: Clearing Overrides

Whichever cleanup workflow you end up choosing, it’s likely that some remnants of the Word import will be left behind, resulting in overrides, for example font overrides or language overrides. With the rest of your text cleaned up, clearing those final overrides is straightforward:

1. Click in the text, and choose Edit > Select All to select all of the text in the story.
2. At the bottom of the Paragraph Styles panel, click the Clear Overrides icon (Figure 15).

Figure 15: Clearing the final overrides.

Of course, this isn’t the end of your workflow; it’s just the end of the beginning! But with a clean manuscript in place, you can begin the actual task of laying out your document. I hope that this article has given you some food for thought on how to make life a little easier when it comes to cleaning up those imported Word documents, so that you can spend more time on the fun part of working in InDesign: the design!
