
Can you all help with a GREP de-bug


  • Author
    Posts
    • #61497
      Umm_fish
      Member

      I'm trying to write GREP code to find all the initial page numbers in an index. Index snippet:

      Abernathy, Ralph: 5, 13–4, 44, 55, 60, 66, 116
      Abu Ghraib prison, 46
      Afghanistan War 1919–1922, 169, 171
      Afro-American, The (newspaper), 29
      Anderson, Marian, 100, 197n11

      All I need is the first page of each reference, so I would exclude anything after the en dash. I would also exclude the note reference after p. 197 in the Anderson entry. What I am having trouble with is excluding the date range in the Afghanistan War entry (I made up the date range, BTW). Here's the GREP code I'm using:

      (?<=([, |: |,~}| ~}]))d+(?=[,|–|-|n| r|r])

      I'm not asking it to select a number with just a space in front, but I'm getting that anyway (probably contained in the d shortcut, yeah?). How can I exclude numbers that are only preceded by spaces? Everything else seems to work fine.

      Thanks!

      Andy

    • #61501
      Anonymous
      Inactive

      edit: for some reason the forward slash isn't showing up where I type it, so note that the s, d, and D in the patterns below each need a slash in front of them

      assuming you're not using any 1000+ page books you could just search for

      (?<=[,:]s)d{1,3}(?=D)

      that would find any number of three digits or fewer that is preceded by a ,s or :s

      (?<=[,:]s)d+

      or this also finds 4 digit numbers but not if they're not preceded by ,s or :s
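For anyone who wants to try the two patterns above outside InDesign, here is a minimal sketch in Python with the backslashes written out. It assumes Python's `re` module handles `\s`, `\d`, `\D`, and these fixed-width lookbehinds the way InDesign's GREP does, which is not guaranteed for every construct. The `Zed` entry is made up to show the four-digit case.

```python
import re

# Sample index lines; the "Zed" entry is invented to show a 4-digit page.
index_text = (
    "Abernathy, Ralph: 5, 13–4, 44, 55, 60, 66, 116\n"
    "Zed, 1204, 7\n"
)

# One to three digits after comma/colon + whitespace, followed by a non-digit.
up_to_three = re.compile(r"(?<=[,:]\s)\d{1,3}(?=\D)")
# Any run of digits after comma/colon + whitespace.
any_length = re.compile(r"(?<=[,:]\s)\d+")

short_hits = up_to_three.findall(index_text)
all_hits = any_length.findall(index_text)

print(short_hits)  # the 4-digit 1204 is skipped
print(all_hits)    # 1204 is included
```

Note that the trailing `(?=\D)` needs some non-digit character after the number, so in this sketch the text ends with a newline.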

      I'm not at my work computer today so i can't test this right now. but I'll check back on monday to see if your problem is solved.

    • #61502
      Anonymous
      Inactive

      I'm also now noticing that your grep doesn't have any of the forward slashes in it. I'm wondering if there may be a bug with the site right now?

    • #61503

      It seems all it needs is this:

      (?<=, |: )\d+

      This finds every string of digits that is preceded by either comma-space or colon-space. There is no need to check for notes (digits, but preceded by an 'n') or for the second half of a page range (digits, but preceded by an en-dash). It also skips the war date range: 1919 is preceded only by a space, and 1922 by an en-dash.
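As a quick check outside InDesign, the same pattern can be run over the sample index in Python. This is only a sketch: it assumes Python's `re` module treats this fixed-width lookbehind the same way InDesign's GREP does, and the variable names are made up for illustration.

```python
import re

# The index snippet from the first post.
index_lines = [
    "Abernathy, Ralph: 5, 13–4, 44, 55, 60, 66, 116",
    "Abu Ghraib prison, 46",
    "Afghanistan War 1919–1922, 169, 171",
    "Afro-American, The (newspaper), 29",
    "Anderson, Marian, 100, 197n11",
]

# Digits preceded by comma-space or colon-space (both alternatives are 2 characters).
first_pages = re.compile(r"(?<=, |: )\d+")

found = [first_pages.findall(line) for line in index_lines]
for line, pages in zip(index_lines, found):
    print(pages, "<-", line)
```

The date range, the second half of `13–4`, and the note number `11` are all left out, because none of them is directly preceded by comma-space or colon-space.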

      (Is there a reason the first item has a colon and all others have not?)

      … forward slashes …

      You mean backslashes :) This editor probably treats a single backslash as some kind of special code and discards it. To insert a single backslash, enter two of them when posting: \\. You have to watch out when editing a post, because at that point it will contain single backslashes again and you have to change them back to \\ one at a time.

    • #61504
      Umm_fish
      Member

      Thanks, all. This gets close. There's one more possibility that I didn't include in the sample:

      “Agenda,” 180–1

      So, I altered the grep above just a bit:

      (?<=,” |, |: )\d+

      Unfortunately, that works on the Agenda sample, but it suddenly leaves out the first digit of any other number. Now I'm really confused. Not that I wasn't before. BTW, I'm testing these using the GREP styles section of the paragraph style, if that makes any difference.

      (Is there a reason the first item has a colon and all others have not?)

      Yes. I'm trying to account for all of the types of indexes I might see from this press, not just this one project.

      Thanks!

    • #61505
      Umm_fish
      Member

      Okay, I figured it out. Altering it to this works:

      (?<=[,” |, |: ])\d+

      The brackets did the trick. Thanks for the help!

    • #61507

      Umm, that is not a good fix.

      What it does, in essence, is find any one of the individual characters inside the set. The entire part between the square brackets means “any single one of the characters inside” (the OR | does not work as such inside a [character set]; it merely adds the | character to the allowed ones). You can see it indeed doesn't work if you check your example

      Afghanistan War 1919–1922, 169, 171

      All it does is find every digit string that is preceded by a space, or by a comma, quote, colon, or pipe, as you can see if you insert a pipe anywhere inside a string of digits.
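The character-set behaviour described here can be reproduced in Python as a rough check. This sketch assumes Python's `re` module treats a character set the same way InDesign does (a `|` inside square brackets is just a literal pipe); the `12|34` string is made up to show the pipe case.

```python
import re

# Everything inside [...] is a set of single characters: comma, closing quote,
# space, pipe, and colon. The | is not an OR operator here.
bracket_version = re.compile(r"(?<=[,” |, |: ])\d+")

war_hits = bracket_version.findall("Afghanistan War 1919–1922, 169, 171")
pipe_hits = bracket_version.findall("12|34")

print(war_hits)   # 1919 is matched because a lone space precedes it
print(pipe_hits)  # 34 is matched because a pipe precedes it
```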

      By the way, the reason your first try failed:

      (?<=,” |, |: )\d+

      is that a Lookbehind does not work with strings of different lengths; they all have to be the same length. For the same reason you cannot use repeating codes (+, *, and ?) inside a Lookbehind. I guess it would make poor old InDesign work too hard, going over all possible combinations.
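The same-length rule is not unique to InDesign. As an analogy only, Python's `re` module has the same restriction, but it refuses to compile such a pattern instead of silently matching the wrong thing. The helper name `compiles` below is made up for this sketch, and nothing here says anything about how InDesign implements its check.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Both alternatives are 2 characters long: accepted.
same_length = compiles(r"(?<=, |: )\d+")
# One alternative is 3 characters, the others 2: rejected by Python.
mixed_length = compiles(r"(?<=,” |, |: )\d+")

print(same_length, mixed_length)
```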

      But you can work around this by making them two Lookbehinds:

      ((?<=,” )|(?<=, |: ))\d+

      The first one checks for “comma-quote-space”, the second one for “comma-space” or “colon-space”. They are grouped inside parentheses with an OR bar, so if one doesn't work, the other takes over.
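Here is a small Python sketch of the two-Lookbehind workaround, run against the lines that caused trouble earlier in the thread. It assumes Python's `re` module behaves like InDesign's GREP for this pattern. Because the pattern contains a capturing group, the sketch reads the whole match with `group(0)` rather than using `findall`.

```python
import re

# Two separate lookbehinds, each of one fixed length, joined with an OR bar.
two_lookbehinds = re.compile(r"((?<=,” )|(?<=, |: ))\d+")

test_lines = [
    "“Agenda,” 180–1",
    "Afghanistan War 1919–1922, 169, 171",
    "Anderson, Marian, 100, 197n11",
]

results = [
    [m.group(0) for m in two_lookbehinds.finditer(line)]
    for line in test_lines
]
for line, pages in zip(test_lines, results):
    print(pages, "<-", line)
```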

    • #61516
      Umm_fish
      Member

      Thank you very much. This is the first time I've ever really worked with GREP. I was a Quark guy for a bazillion years and did most of my work on files inside of Word, then exported xtg files from there. This is all a whole new world for me. I must say, I'm sure there's a logic behind GREP, but I sure don't have my head around it yet.

      Thanks again!

  • The forum ‘General InDesign Topics (CLOSED)’ is closed to new topics and replies.