
XML as a deliverable from InDesign


  • Author
    Posts
    • #58868
      skilldrick
      Member

      Hi guys

      A big publisher I work for now requires XML as a deliverable, along with the print-ready PDFs we have always produced.

      The first book I'm doing this for has gone to print now, and I've started tagging up the InDesign file with XML tags. It's a real struggle though. I've done a first pass using “Map styles to tags” but as you can guess that doesn't get you very far.

      Part of the problem is things like numbered lists. The automatic tagging gives

      <list-item>Item</list-item>

      <list-item>Item</list-item>

      <list-item>Item</list-item>

      but I need

      <list>

      <list-item><p>Item</p></list-item>

      <list-item><p>Item</p></list-item>

      <list-item><p>Item</p></list-item>

      </list>

      So I select all the tags in the structure pane and add <list> as a parent, then give each list-item a new list-item parent, then retag each inner list-item as <p>. And I have to do this manually.
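The manual steps above can be sketched in code once the flat XML has been exported. This is a minimal sketch, not a tested production tool: it assumes the exported file holds the flat `<list-item>` elements as direct siblings, uses only Python's standard `xml.etree.ElementTree`, and the function name `wrap_list_items` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def wrap_list_items(parent, item_tag="list-item", list_tag="list", para_tag="p"):
    """Wrap each run of consecutive <list-item> children of `parent` in a
    <list>, and move each item's content into an inner <p>."""
    new_children = []
    current_list = None
    for child in list(parent):
        if child.tag == item_tag:
            if current_list is None:
                # First item of a new run: open a fresh <list>.
                current_list = ET.Element(list_tag)
                new_children.append(current_list)
            # Move the item's text and inline children into a new <p>.
            para = ET.Element(para_tag)
            para.text = child.text
            para.extend(list(child))
            item = ET.SubElement(current_list, item_tag)
            item.append(para)
        else:
            # Any other element ends the current run of list items.
            current_list = None
            wrap_list_items(child, item_tag, list_tag, para_tag)
            new_children.append(child)
    parent[:] = new_children
    return parent


if __name__ == "__main__":
    src = ("<body><p>Intro</p><list-item>Item</list-item>"
           "<list-item>Item</list-item><p>Outro</p></body>")
    print(ET.tostring(wrap_list_items(ET.fromstring(src)), encoding="unicode"))
```

Whitespace between the original list items is dropped, so the result may need pretty-printing before it is handed over.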

      I saw David's thoughts on this topic elsewhere on this forum and I'm wondering what the general consensus here is. Is there any way to produce this XML without a hell of a lot of fiddly manual work? I know that there are companies in India that will produce a valid XML file from a PDF – is this the sanest route?

      Thanks

      Nick

    • #58878

      I've been working for a while now on XML deliverables for a big publisher we work for (quite possibly the same one as you). Short answer: no, there is no way to produce this XML without a hell of a lot of fiddly work.

      We tried working with XML tags in InDesign, but very quickly found that they were far too limited for what we needed. A much better place to start is with an IDML file, which is very well-formed XML, but even then there's a heck of a lot of work involved in making sure the files are pristine and impeccably styled.

      Not being programmers, we use a consultant for the conversion to XML compliant with the publisher's DTD. I can put you in touch with him if you like?

    • #58879
      skilldrick
      Member

      IDML – that's an interesting idea. I've never looked at it, but from what I've read it does sound like it could be a sensible starting point. I had been hoping that we could have a master file that we could just output PDF *and* XML from, so any changes could be reflected in both without having to do lots of work again or maintaining two separate versions of the same book. It's looking like that's not going to happen though.

      I'll pursue the IDML route on my own (I *am* a programmer, but I'm also a book project manager, so I have to balance my time!), but could you put me in touch with your consultant please? My email address is skilldrick at gmail dot com.

      Cheers!

    • #58881

      Theoretically, if you keep your InDesign file clean enough, and have a decent enough conversion from IDML to your final XML, then you can export to IDML at any stage and make only a couple of minor tweaks to it before converting it to the publisher's XML. But yes, you do tend to end up with a bit of a versioning mess, as inevitably the strict requirements mean that you spot errors while working on the IDML file that had been overlooked up until now.

      I'll email you separately with our consultant's details.

    • #59006
      peppobon
      Member

      Hi Nick,

      Here is what I have learned, and what I do now, for extracting (and possibly round-tripping) arbitrary XML from/to InDesign.

      1. Lesson learned

      The XML tools provided natively by ID are definitely not adequate for book-like (i.e. narrative) content. If you want a usable environment for structured XML editing and PDF production, use Structured FrameMaker (but in the general publishing world this is not usually an option). The ID XML tools are useful only if you are in catalogue/database publishing.

      2. My workflow

      The solution is definitely to start from an IDML export. Here is what I currently do.

      2.1 Edit the ID source

      I edit the ID source to apply styles consistently, group figures and tables with their captions, and anchor the groups to their relevant positions in the flow; I usually also add xrefs and index entries at this stage. If you do this wisely, you can use this styling for print production too; otherwise you will have to work on a copy of your typeset file. At the end (your mileage may vary…) I export from ID to IDML.

      2.2 Generate a linear xml representation of the ID content

      I have developed a Python library that abstracts the basic IDML semantics. I have classes for all the major components of the IDML spec (styles, stories, figures, tables, etc.), so I can easily work programmatically on the extracted files. At this stage I essentially have a linear representation of the contents along with the styles used, as in the following example:

      <contentRoot>
        <content style="chapterTitle" id="0"/>
        <content style="titleSect1" id="1"/>
        <content style="normal" id="2"/>
        <content style="orderedListItem" id="3"/>
        <content style="normal" id="4"/>
        <content style="titleSect1" id="5"/>
        <content style="normal" id="6"/>
        <content style="figure" id="7"/>
        <content style="titleSect2" id="8"/>
        <content style="normal" id="9"/>
        <content style="titleSect2" id="10"/>
        <content style="normal" id="11"/>
        <content style="titleSect1" id="12"/>
        <content style="normal" id="13"/>
        <content style="table" id="14"/>
      </contentRoot>

      <styles>
        <paraStyle name="chapterTitle"/>
        <paraStyle name="titleSect1"/>
        <paraStyle name="titleSect2"/>
        <paraStyle name="normal"/>
        <paraStyle name="orderedListItem"/>
        <paraStyle name="figure"/>
        <paraStyle name="table"/>
        <charStyle name="charStyle1"/>
        <charStyle name="charStyle2"/>
      </styles>
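A linear style-plus-content listing like the one above can be pulled out of an IDML package with fairly little code. The following is a rough sketch and not peppo's library: it assumes the usual IDML layout (a ZIP archive with one XML file per story under `Stories/`, where each `ParagraphStyleRange` carries an `AppliedParagraphStyle` attribute and the text sits in `Content` elements). The function name `linear_content` is invented for the example.

```python
import zipfile
import xml.etree.ElementTree as ET


def linear_content(idml_file):
    """Return (paragraph_style, text) pairs, one per ParagraphStyleRange,
    for every story file in the IDML package."""
    items = []
    with zipfile.ZipFile(idml_file) as pkg:
        # Assumption: file-name order is good enough for a first look.
        # Real reading order depends on the layout, not on story names.
        story_names = sorted(
            name for name in pkg.namelist()
            if name.startswith("Stories/") and name.endswith(".xml")
        )
        for name in story_names:
            root = ET.fromstring(pkg.read(name))
            for psr in root.iter("ParagraphStyleRange"):
                applied = psr.get("AppliedParagraphStyle", "")
                # "ParagraphStyle/chapterTitle" -> "chapterTitle"
                style = applied.split("/", 1)[-1]
                text = "".join(c.text or "" for c in psr.iter("Content"))
                items.append((style, text))
    return items


if __name__ == "__main__":
    import sys
    for style, text in linear_content(sys.argv[1]):
        print(f"{style}: {text[:60]}")
```

One caveat: a single `ParagraphStyleRange` can hold several paragraphs separated by `<Br/>` elements, so a real converter would need to split on those as well.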

      2.3 Configure the mappings

      Given the target DTD/schema, you can configure the mapping between tags and styles by writing a configuration file like this example (you can easily auto-generate templates to be hand-edited):

      <styles>
        <paraStyle name="chapterTitle" targetXpath="chapter/title"/>
        <paraStyle name="titleSect1" targetXpath="chapter/sect1/title"/>
        <paraStyle name="normal" targetXpath="chapter/sect1/para"/>
        <paraStyle name="orderedListItem" targetXpath="chapter/sect1/orderedlist/listitem"/>
        <paraStyle name="figure" targetXpath="chapter/sect1/figure"/>
        <paraStyle name="table" targetXpath="chapter/sect1/table"/>
        <paraStyle name="titleSect2" targetXpath="chapter/sect1/sect2/title"/>
        <paraStyle name="normal" targetXpath="chapter/sect1/sect2/para"/>
        <paraStyle name="orderedListItem" targetXpath="chapter/sect1/sect2/orderedlist/listitem"/>
        <paraStyle name="figure" targetXpath="chapter/sect1/sect2/figure"/>
        <paraStyle name="table" targetXpath="chapter/sect1/sect2/table"/>
      </styles>

      2.4 Programmatically add nesting and inline styling

      Now you have all you need to automatically nest the original ID content as per your target DTD/schema. You add inlines according to the charStyle definitions.
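To make the nesting step concrete, here is one possible sketch. It is not peppo's implementation. It assumes each paragraph has already been paired with its resolved target path, that a leaf named `title` always opens a new container at its level, and that consecutive items sharing a container (such as `orderedlist`) belong together. The function name `nest` and the wrapper element `book` are made up for the example.

```python
import xml.etree.ElementTree as ET


def nest(items, root_tag="book"):
    """Build a nested tree from (target_path, text) pairs given in reading order."""
    root = ET.Element(root_tag)
    open_elems = []  # currently open containers, outermost first
    for xpath, text in items:
        *containers, leaf = xpath.split("/")
        if leaf == "title" and containers:
            # A title starts a new container, so close the one open at that depth.
            del open_elems[len(containers) - 1:]
        for depth, tag in enumerate(containers):
            if depth < len(open_elems) and open_elems[depth].tag == tag:
                continue  # this container is already open, reuse it
            del open_elems[depth:]
            parent = open_elems[-1] if open_elems else root
            open_elems.append(ET.SubElement(parent, tag))
        # Close anything deeper than this item's own container.
        del open_elems[len(containers):]
        parent = open_elems[-1] if open_elems else root
        ET.SubElement(parent, leaf).text = text
    return root


if __name__ == "__main__":
    tree = nest([
        ("chapter/title", "Chapter one"),
        ("chapter/sect1/title", "First section"),
        ("chapter/sect1/para", "Some text."),
        ("chapter/sect1/orderedlist/listitem", "Item"),
        ("chapter/sect1/orderedlist/listitem", "Item"),
        ("chapter/sect1/para", "More text."),
        ("chapter/sect1/title", "Second section"),
    ])
    print(ET.tostring(tree, encoding="unicode"))
```

Inline character styles are left out here; they would be handled in a similar pass over the character ranges inside each paragraph.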

      I routinely use this approach to transform from ID to DocBook/DITA/custom DTD and it works like a charm.

      You can go in the other direction too (but I have not tested this yet): from the target XML you can generate an IDML/ICML file to be imported/placed in an ID template (ICML is an IDML subset that essentially gives you an InCopy file your editors can directly place into an ID file).

      Note that the key phase is, as usual, 2.1, but you can train editorial/typesetting staff to do that, as there is no need for XML expertise. Usually 2.3 can be done by the publisher's editorial staff as well. 2.4 works flawlessly if the original content has a logical flow; nevertheless, there are some boundary cases, like a chapterTitle/titleSect2/titleSect1 sequence (which, by the way, is logically flawed).

      If you can enforce some across-the-board consistency in the use of ID styles during the early typesetting phase, you can configure an almost automagic conversion workflow.

      3. Conclusions

      I do think this is a doable "XML last" workflow for ID!

      Contact me privately (peppobon at gmail dot com) if you need more complete examples.

      Cheers,

      __peppo

    • #59007
      peppobon
      Member

      Sorry,

      the mappings config under 2.3 is wrong (cut&paste mistake).

      It should be:

      <styles>
        <paraStyle name="chapterTitle" targetXpath="chapter/title">
          <paraStyle name="titleSect1" targetXpath="chapter/sect1/title">
            <paraStyle name="normal" targetXpath="chapter/sect1/para"/>
            <paraStyle name="orderedListItem" targetXpath="chapter/sect1/orderedlist/listitem"/>
            <paraStyle name="figure" targetXpath="chapter/sect1/figure"/>
            <paraStyle name="table" targetXpath="chapter/sect1/table"/>
            <paraStyle name="titleSect2" targetXpath="chapter/sect1/sect2/title">
              <paraStyle name="normal" targetXpath="chapter/sect1/sect2/para"/>
              <paraStyle name="orderedListItem" targetXpath="chapter/sect1/sect2/orderedlist/listitem"/>
              <paraStyle name="figure" targetXpath="chapter/sect1/sect2/figure"/>
              <paraStyle name="table" targetXpath="chapter/sect1/sect2/table"/>
            </paraStyle>
          </paraStyle>
        </paraStyle>
      </styles>
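One way to read the nested config is that a heading style sets the context in which the styles nested inside it are looked up, so that `normal` resolves to a different path under `titleSect1` than under `titleSect2`. The sketch below implements that reading. It is an assumption about how the config is meant to be used, not peppo's actual code: the config is a trimmed copy, the `titleSect2` path is assumed to be `chapter/sect1/sect2/title`, and `resolve_paths` is a made-up name.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<styles>
  <paraStyle name="chapterTitle" targetXpath="chapter/title">
    <paraStyle name="titleSect1" targetXpath="chapter/sect1/title">
      <paraStyle name="normal" targetXpath="chapter/sect1/para"/>
      <paraStyle name="titleSect2" targetXpath="chapter/sect1/sect2/title">
        <paraStyle name="normal" targetXpath="chapter/sect1/sect2/para"/>
      </paraStyle>
    </paraStyle>
  </paraStyle>
</styles>
"""


def resolve_paths(config_xml, style_sequence):
    """Map each style in reading order to a targetXpath, using the nesting
    of the config as the lookup context."""
    root = ET.fromstring(config_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    current = root
    resolved = []
    for style in style_sequence:
        node, match = current, None
        while node is not None:
            match = next((c for c in node if c.get("name") == style), None)
            if match is not None:
                break
            node = parents.get(node)  # not found here: try the enclosing context
        if match is None:
            raise ValueError(f"style {style!r} is not allowed in this position")
        resolved.append(match.get("targetXpath"))
        if len(match):
            current = match  # a style with nested entries becomes the new context
    return resolved


if __name__ == "__main__":
    sequence = ["chapterTitle", "titleSect1", "normal", "titleSect2", "normal"]
    for style, path in zip(sequence, resolve_paths(CONFIG, sequence)):
        print(style, "->", path)
```

With this reading, a sequence that jumps from `chapterTitle` straight to `titleSect2` raises an error instead of silently producing invalid nesting.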

      __peppo

    • #59014
      skilldrick
      Member

      Hi peppo

      Thanks a lot for your detailed post, that's really interesting and useful.

      The InDesign XML features are a red herring as far as I'm concerned – not much point them being there unless they put in some real effort in later ID versions.

      Nick

    • #59047

      Forgive my ignorance, but what's “a deliverable”?

    • #59048
      skilldrick
      Member

      @Jeremy – I mean something that the client needs to be given at the end of the project. Before, a print-ready PDF was the only deliverable – now they need us to provide XML as well.

    • #59049

      IDML is a sort of XML, and it contains everything that was in the original InDesign document. Conceivably, that might be all they are looking for. But if not, I would have thought XSLT could reshape/filter the XML to meet any reasonable requirements. The receiver of the XML file might use an XML editor to apply the XSLT, or even use InDesign to apply it during import; or you might apply it during export.

    • #59050
      skilldrick
      Member

      Thanks Jeremy. They have a very strict DTD, and it'll take a lot of XSLT to make it valid (I've done a bit of XSLT – not sure I want to go back!). It also needs work doing that XSLT can't easily do, so I don't think that's a viable option unfortunately. Maybe combined with other techniques it would be useful though.

  • The forum ‘General InDesign Topics (CLOSED)’ is closed to new topics and replies.