This article was originally published in InDesign Magazine issue 55 (August–September 2013).
The future of publishing may rest upon HTML. Whether or not that’s true, only time will tell. But there’s no denying that a vast amount of content has been structured and formatted in HTML. Typically, our challenge as designers is getting content out of InDesign as HTML, a task which I covered in my article “InDesign to HTML” in the April/May 2013 issue of InDesign Magazine. But there may also be times when we’re called upon to do the opposite—to take HTML content and bring it into the realm of print or PDF through InDesign. Currently, there’s no method for directly importing HTML into InDesign, but there are a few “unofficial” paths for bringing HTML in and preserving much of its structure and formatting.
Whether your preference for tackling this task is down and dirty, methodical and geeky, or somewhere in between, at least one of the approaches in this article should help you get your HTML into InDesign. With one exception, they won’t cost you anything but your own time. None of them is perfect, but they beat starting from scratch.
Bear in mind, too, that my goal here is not to take a full web page layout and recreate it in InDesign. These methods bring in only the text content from the HTML.
As a markup language, HTML identifies different types of content, like headings, paragraphs, and lists. The formatting of that content is described by code called a Cascading Style Sheet (CSS). The content and its formatting instructions are brought together and rendered by the web browser. Your challenge is to bridge the gap between the code and InDesign using one of the following four methods.
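To make the split between content and formatting concrete, here is a minimal Python sketch that reads only the markup and ignores CSS entirely. It uses the standard library's html.parser module; the ContentOutline class, the outline function, the list of tags, and the sample HTML are all illustrative assumptions, not part of any method described in this article.

```python
from html.parser import HTMLParser

# Block-level tags treated as distinct content types (an illustrative subset).
BLOCK_TAGS = {"h1", "h2", "h3", "p", "li"}


class ContentOutline(HTMLParser):
    """Collect (tag, text) pairs for block-level elements, ignoring CSS."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished (tag, text) pairs
        self._tag = None    # block tag currently open, if any
        self._chunks = []   # text pieces collected for that block

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._tag = tag
            self._chunks = []

    def handle_data(self, data):
        # Inline tags such as <em> are skipped, but their text is kept.
        if self._tag is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            text = " ".join("".join(self._chunks).split())
            self.blocks.append((tag, text))
            self._tag = None


def outline(html):
    """Return the block-level content of an HTML string as (tag, text) pairs."""
    parser = ContentOutline()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical sample markup, for illustration only.
SAMPLE = """
<h1>Importing HTML</h1>
<p>Text with <em>inline</em> emphasis.</p>
<ul><li>First item</li><li>Second item</li></ul>
"""

for tag, text in outline(SAMPLE):
    print(f"{tag}: {text}")
```

The tag names (h1, p, li) are what tell you a piece of text is a heading, a paragraph, or a list item. How each one looks is left to the style sheet, which is the part a paragraph style takes over in InDesign.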
Method #1: Copy, Paste, and Hope for the Best
• No prep up front
• Type formatting (somewhat) preserved
• No styles generated for pasted content
• Bolds and italics are lost
This option requires the least amount of up-front effort but comes with a huge “your mileage may vary” disclaimer.
Open an HTML file or web page using your web browser, select all of the desired copy in the browser, and copy it to the clipboard. Browsers differ in how much formatting, if any, they preserve. For example, Safari on the Mac retains much of the formatting you’ve copied, while Chrome preserves nothing. On the Windows side, the current version of the much-maligned Internet Explorer probably does the best job of all.
Before pasting what you’ve copied into InDesign, however, be sure you’ve set your clipboard handling preference (Preferences > Clipboard Handling) for “When pasting text and tables from other applications” to “All information (Index Markers, Swatches, Styles, etc.)” (see Figure 1). Otherwise, the content will come in as unformatted text.
When you paste the copied text into InDesign, you’ll see some measure of the formatting preserved, depending on the browser you copied it from (Figure 2).
In most cases, all text will come in with “No Paragraph Style” as its style, with all other formatting treated as overrides, and no character styles will be created (Figure 3).
However, Internet Explorer for Windows manages to hang on to heading attributes (H1, H2, etc.) and hyperlinks well enough that corresponding paragraph styles for H1, H2, and so on will be created, along with a Hyperlink character style.
From this point on, building the styles you want is up to you, but you’ll have the formatted text as a visual reference. To speed things up, you could download and install Thomas Silkjaer’s Auto Create Paragraph and Character Styles script. The script examines the formatting in the document and then creates and applies paragraph and character styles to the text. It does quite a nice job of keeping the number of styles to a minimum, too. The styles are generically named—AutoStyle1, AutoStyle2, and so on (Figure 4)—but renaming styles is a lot less work than creating them manually.
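The general idea behind turning local formatting into styles can be sketched in a few lines of Python. This is not Thomas Silkjaer’s script, which runs inside InDesign and works on the actual document. It is only an illustration of the grouping principle, and the auto_styles function, the formatting dictionaries, and the font names are hypothetical.

```python
def auto_styles(runs):
    """Assign a generic style name to each distinct combination of formatting.

    `runs` is a list of (text, formatting) pairs, where formatting is a dict
    such as {"font": "Minion Pro", "size": 10}.
    Returns (styles, styled_runs): a mapping from formatting signature to
    style name, and the runs with a style name in place of the formatting.
    """
    styles = {}
    styled_runs = []
    for text, formatting in runs:
        # Sort the attributes so that key order does not create a new style.
        signature = tuple(sorted(formatting.items()))
        if signature not in styles:
            styles[signature] = f"AutoStyle{len(styles) + 1}"
        styled_runs.append((text, styles[signature]))
    return styles, styled_runs


# Hypothetical pasted text: one heading and two identically formatted paragraphs.
pasted = [
    ("Heading", {"font": "Myriad Pro", "size": 24}),
    ("Body one", {"font": "Minion Pro", "size": 10}),
    ("Body two", {"size": 10, "font": "Minion Pro"}),
]

styles, styled = auto_styles(pasted)
for text, style in styled:
    print(f"{style}: {text}")
```

Because identical formatting always maps to the same name, the number of styles stays equal to the number of distinct formatting combinations, which is why the generated names are generic and need renaming afterward.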