
Having GREP problems


  • Author
    Posts
    • #95133
      Anonymous
      Inactive

      Hi,

      I’m working on proofreading an academic book that has garbled Greek imported into it, possibly because of the variety of fonts, word processors, and upgrades over which the author wrote it. Problem is it’s 720,000 words of content so it’s impossible to find them all manually.

      There are numerous occurrences of Greek and potentially Greek Extended (two separate Unicode blocks) occurring immediately before or after Latin/English Unicode. Basically, it creates a word that is in two different scripts at once. I think GREP is the easiest way to find and mark these so that the editor who actually *knows* Greek can review them.

      I know the following locates Greek Unicode in GREP: ([\x{1F00}-\x{1FFF}]+)|([\x{0370}-\x{03FF}]+)

      However, I can’t figure out if there’s a good way to find when that string occurs *immediately before* or *immediately after* a character from any of the Latin Unicode blocks, of which there are at least seven. Anyone know a good syntax that would find those while excluding spaces or paragraph breaks? This is for an academic print series, so we really can’t get it wrong once ink goes to paper.

      Thanks!

    • #95137

      Hi Joel,

      please provide some real examples as idml-file.

      Thanks
      Kai

      • #95151
        Anonymous
        Inactive

        Thanks Kai, I would normally, but the book is for a publisher and isn’t released yet, so it’s a proprietary file I can’t let out. I’ll try to find a sample I can copy and paste in if that helps, or get permission to drop a page of it in as IDML. I’m getting a page reference for examples from the general editor. As I said, it’s over 1,000 pages long, so it’s not a quick process to find examples. He knows where they are better than I do.

    • #95153

      If you want, you can send me the file directly at forum@ruebiarts.de. I can guarantee that no one else will see the data.

    • #95154
      Anonymous
      Inactive

      Thanks Kai. I need to get approval from my manager to send a small sample and also need to get the IDML files from our typesetter. I’ll let you know for sure shortly.

    • #95262
      Peter Kahrel
      Participant

      Joel,

      Basic Latin, Latin-1 supplement, Latin Extended-A, Latin-Extended-B, and Latin Extended Additional are captured by this expression:

      [\x{0000}-\x{00FF}\x{0100}-\x{024F}\x{1E00}-\x{1EFF}]

      To find Greek immediately following some Latin character, place Latin in a lookbehind:

      And/or to find Greek immediately followed by a Latin character, place the Latin in a leekahead:

      [\x{1F00}-\x{1FFF}\x{0370}-\x{03FF}]+(?=[\x{0000}-\x{00FF}\x{0100}-\x{024F}\x{1E00}-\x{1EFF}])

      You’ll need two passes if you want Greek followed or preceded by Latin. And you may have to add some punctuation to the Latin ranges. Note that I changed your Greek matcher a bit to make it more efficient (you’ll need efficiency when you’re looking in 720,000 words!).

      Peter

    • #95270
      Masood Ahmad
      Participant

      Peter, you’re right. This will definitely find a Greek character followed by a Latin character, including a space or punctuation mark, as you said. Does this mean that it will also find all the instances of a space between two Greek words?

      @Joel, you can’t do Replace All with this code, you have to check one by one…
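      Masood's concern can be checked outside InDesign. The following is a minimal sketch in Python, not InDesign GREP: Python's `re` module writes code points as `\uXXXX` rather than `\x{XXXX}`, and the helper name `in_latin_class` is made up for illustration. It only tests which characters fall inside the Latin class Peter posted.

```python
import re

# Peter's Latin class, rewritten with Python escapes.
# U+0000-U+00FF covers Basic Latin and Latin-1 Supplement in full,
# which includes the space, control characters, and ASCII punctuation.
LATIN_WIDE = re.compile(r"[\u0000-\u00FF\u0100-\u024F\u1E00-\u1EFF]")

def in_latin_class(ch):
    """Return True if the single character ch is matched by the class."""
    return LATIN_WIDE.fullmatch(ch) is not None

if __name__ == "__main__":
    # Print membership for a space, a comma, a Latin letter, and Greek alpha.
    for ch in [" ", ",", "a", "\u03b1"]:
        print(repr(ch), in_latin_class(ch))
```

      Because the space and comma are inside the class, a lookahead built from it also succeeds on a Greek word that is simply followed by a space, which is why every hit needs to be reviewed individually.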

    • #95309
      Peter Kahrel
      Participant

      Something went wrong earlier, a whole line of code disappeared (and what is a leekahead, one wonders). This is what it should have been:

      To find Greek immediately following some Latin character, place Latin in a lookbehind:

      [\x{0000}-\x{00FF}\x{0100}-\x{024F}\x{1E00}-\x{1EFF}]\K[\x{1F00}-\x{1FFF}\x{0370}-\x{03FF}]+

      And/or to find Greek immediately followed by a Latin character, place the Latin in a lookahead:

      [\x{1F00}-\x{1FFF}\x{0370}-\x{03FF}]+(?=[\x{0000}-\x{00FF}\x{0100}-\x{024F}\x{1E00}-\x{1EFF}])

      > This means that it will also find all the instances of space between two Greek words?

      It does and it doesn’t. It would now match spaces between Greek words, but it doesn’t match Greek words separated by spaces (or punctuation, for that matter). So those grep expressions need some tweaking.
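      One possible tweak, sketched here in Python rather than InDesign GREP, is to narrow the Latin class to letters only so that spaces and punctuation no longer count as "Latin". This is an assumption about what the tweak could look like, not Peter's expression: the names `GREEK`, `LATIN_LETTERS`, and `find_mixed` are hypothetical, the escapes are Python's `\uXXXX` form, and a true lookbehind replaces `\K`, which Python's `re` does not support.

```python
import re

# Greek and Coptic (U+0370-U+03FF) plus Greek Extended (U+1F00-U+1FFF).
GREEK = r"\u0370-\u03FF\u1F00-\u1FFF"

# Latin letters only: A-Z, a-z, the Latin-1 letters (skipping the
# multiplication and division signs at U+00D7 and U+00F7), Latin
# Extended-A/B, and Latin Extended Additional. No spaces or punctuation.
LATIN_LETTERS = r"A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F\u1E00-\u1EFF"

# A Greek run directly after a Latin letter, or directly before one.
MIXED = re.compile(
    rf"(?<=[{LATIN_LETTERS}])[{GREEK}]+|[{GREEK}]+(?=[{LATIN_LETTERS}])"
)

def find_mixed(text):
    """Return (offset, greek_run) pairs for Greek runs touching a Latin letter."""
    return [(m.start(), m.group()) for m in MIXED.finditer(text)]

if __name__ == "__main__":
    # "lo" + Greek, a clean Greek word, then Greek + "d".
    sample = "a lo\u03b3\u03bf\u03c2 b \u03bb\u03cc\u03b3\u03bf\u03c2 c \u03b1\u03b2d"
    for offset, run in find_mixed(sample):
        print(offset, run)
```

      With the narrower class, a Greek word surrounded by spaces is skipped and only runs that touch a Latin letter are reported. The same letter-only ranges could in principle be carried back into the two GREP expressions above, but that has not been tested in InDesign here.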

    • #95312

      Joel, it sounds as if Kai and Peter have you covered but, if you still need help in sorting out the file, feel free to send me a sample. I read Greek, ancient and modern. It sounds to me as if you have a word rendered in monotonic and polytonic Greek and as a transliteration in Latin script. How all those got into your file is an interesting question.

    • #95319
      Anonymous
      Inactive

      Lindsey, we investigated this and it gets weirder. Some of the Greek glyphs show up perfectly fine in Word, but the diacritics and accents get turned into x-ed out boxes (as in, the glyph doesn’t exist) when they are converted to INDD/PDF. It appears to be something to do with the author’s use of multiple fonts and word processors over the years. Looks like I’m going to need to fix it manually in the proofread. The digital conversion company that tagged the book located about 95% of them. It’s up to me to proofread the rest: 1,200 pages of it. By hand. Joy. I’ll send you a PDF sample page (it’s the closest I can get since our typesetter’s on vacation at the moment).

      Looks like I can’t figure out how to privately message you. Do you know how to do that?
