Copying/Pasting Text from PDFs to InDesign


The other day I needed to copy a paragraph of text from a client-supplied PDF into an InDesign layout. Of course, I was in a hurry, and of course, the copy came in with a hard return at the end of every line. Don’t you hate it when that happens?

On the left, the selected text in Acrobat Pro 8; on the right, the pasted result in InDesign:

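[Screen shots: text selected in Acrobat Pro 8 (left) and the pasted result in InDesign, with a hard return at the end of every line (right)]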

(To protect my client’s privacy, I’m using a different PDF for these screen shots. It’s from the newsletter of the Chicago Creative Coalition, a wonderful organization. You can download the PDFs from their Online Archives page.)

Obviously it’d be quick work to clean up those six lines in InDesign, but this was only the first of many text selections I’d need to copy and paste from the PDF. Luckily, at some point in the recent past (I don’t remember how or when) I’d picked up a nugget of information that let me quickly fix the problem in Acrobat, so that this selection and all the others from the PDF pasted in properly, like so:

[Screen shot: the same text pasted into InDesign as a single paragraph]

Tag the PDF

The answer is to make sure the PDF is “tagged” (given the logical structure that makes it accessible to people using screen readers) before you copy text from it. How could I tell whether my client’s PDF was tagged?

In Acrobat, a quick look at the PDF’s Document Properties dialog box (File > Properties, or Command/Control-D) told me that the PDF was not tagged. You can see that in the last line of this partial screen shot from the first panel (“Description”) of the dialog box:

[Screen shot: the Description panel of Acrobat’s Document Properties dialog box, showing that the PDF is not tagged]
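Checking Document Properties by hand is fine for one file. If a client hands you a whole stack of PDFs, a script can do a rough first pass. In the PDF specification, a tagged PDF’s document catalog contains a /MarkInfo dictionary with /Marked set to true, plus a /StructTreeRoot entry. The Python sketch below (the function name `looks_tagged` is my own, hypothetical) just scans each file’s raw bytes for those two keys. It’s a heuristic, not a real PDF parser: it can report “not tagged” for a file whose catalog is stored in a compressed object stream (allowed in PDF 1.5 and later), so treat Acrobat’s Document Properties as the final word.

```python
import re
import sys


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic check for a Tagged PDF.

    Looks for a /StructTreeRoot key and a "/Marked true" entry (from the
    catalog's /MarkInfo dictionary) anywhere in the raw file bytes.
    Assumption: the catalog is stored uncompressed; keys hidden inside
    compressed object streams will be missed.
    """
    has_struct_root = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_root and is_marked


if __name__ == "__main__":
    # Usage: python check_tagged.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            status = "tagged" if looks_tagged(f.read()) else "not tagged (or not detectable)"
        print(f"{path}: {status}")
```

Any file this reports as untagged is a candidate for Acrobat’s Add Tags to Document command before you start copying text.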

I thought it was interesting that the PDF was exported from InDesign CS2 (note the info for Application and PDF Producer) and yet it wasn’t tagged, even though all it takes is a click on the Create Tagged PDF checkbox in InDesign’s PDF export options:

[Screen shot: the Create Tagged PDF checkbox in InDesign’s PDF export options]

I double-checked the PDF export presets in InDesign CS3: only the High Quality Print preset has Create Tagged PDF enabled, so for all the other presets you’ll need to turn it on manually. Tagging adds only a tiny amount of overhead to the PDF’s file size, and its benefits are huge (not just for accessibility, or for making it easier to extract text with Acrobat’s Select tool, but also for search engine indexing), so I don’t understand why most of the presets have it disabled.

Luckily, you can add basic tagging to a PDF right in Acrobat Pro (not sure about Standard). In Acrobat Pro 8, choose Advanced > Accessibility > Add Tags to Document:

[Screen shot: Advanced > Accessibility > Add Tags to Document in Acrobat Pro 8]

A little progress bar appears to let you know it’s doing its thing; it doesn’t take long at all. As soon as it’s done, you can select text, copy it, and paste it into InDesign as a single paragraph. (Unfortunately, a side effect is that the copied text loses all its paragraph returns, even the ones that should be there.) That didn’t matter to me, since I was only grabbing small chunks of text, and adding an occasional Return/Enter is easy.

YMMV (Your Mileage May Vary)

In my experience, InDesign’s Create Tagged PDF option and Acrobat’s Add Tags to Document command do a “good enough” job, most of the time, of getting rid of the end-of-line hard returns in text copied from the PDF. But relying on them is like converting a Microsoft Word document to HTML with Word’s own Save As HTML command: it gets you there, but it’s ugly. Creating accurate, 100% screen-reader-friendly tagged PDFs takes a lot more work than these automatic methods.

So, occasionally you’ll have some stubborn text that still breaks weirdly when pasted into InDesign, even though you copied it from a tagged PDF. If that happens and you just can’t stand the thought of hand-tweaking the pasted text, consider spending another five minutes or so in Acrobat creating your own content areas in the PDF. You can do that with the TouchUp Reading Order dialog box, found in the same Advanced > Accessibility fly-out menu:

[Screen shot: the TouchUp Reading Order dialog box in Acrobat Pro 8]

The whole Reading Order thing is interesting and complex enough to merit its own article. But if you’re champing at the bit, the quick way to use it for our specific purpose (copying text without weirdo line breaks) is to click the Clear Page Structure button at the bottom of the dialog box, drag a selection rectangle around a partial or entire column of text, and then click the Text button at the upper-left of the dialog box. Do that for each column of text you need to pull from. Click the Close button, and now you should be able to copy and paste text selections into InDesign without a problem.
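If you’d rather script the cleanup of stubborn text than tweak it by hand, the basic fix for end-of-line returns is easy to sketch. This Python function (`unwrap_lines` is a hypothetical name, not part of Acrobat or InDesign) assumes that paragraphs in the copied text are separated by blank lines, and treats every other line break as an unwanted hard return, joining those lines with a space. If the copied text has no blank lines between paragraphs, everything comes out as one paragraph, much like the Add Tags to Document result. It also doesn’t try to repair words that were hyphenated at the ends of lines.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: paragraphs are separated by one or more blank lines;
    every other line break is an unwanted end-of-line return.
    """
    # Normalize Windows (\r\n) and classic Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split into paragraphs wherever a blank (or whitespace-only) line appears.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Trim each line and join the non-empty ones with single spaces.
        lines = [line.strip() for line in para.split("\n")]
        joined.append(" ".join(line for line in lines if line))
    # Keep one blank line between paragraphs in the output.
    return "\n\n".join(joined)
```

You’d run this on the copied text (saved from the clipboard to a file, say) before placing it in InDesign, then add back any paragraph returns it couldn’t detect.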


This article was last modified on December 18, 2021
