CreativePro Forum

Can you all help with a GREP de-bug

Umm_fish · 2012-01-20T14:48:13-08:00

This topic has 7 replies, 3 voices, and was last updated 14 years, 6 months ago by Umm_fish.

Author

Posts
- January 20, 2012 at 2:48 pm #61497
  
  Umm_fish
  Member
  
  I'm trying to write GREP code to find all the initial page numbers in an index. Index snippet:
  
  Abernathy, Ralph: 5, 13–4, 44, 55, 60, 66, 116
  Abu Ghraib prison, 46
  Afghanistan War 1919–1922, 169, 171
  Afro-American, The (newspaper), 29
  Anderson, Marian, 100, 197n11
  
  All I need is the first page of each reference, so I would exclude anything after the en dash. I would also exclude the note reference after p. 197 in the Anderson entry. What I am having trouble with is excluding the date range in the Afghanistan War entry (I made up the date range, BTW). Here's the GREP code I'm using:
  
  (?<=([, |: |,~}| ~}]))d+(?=[,|–|-|n| r|r])
  
  I'm not asking it to select a number with just a space in front, but I'm getting that anyway (probably contained in the d shortcut, yeah?). How can I exclude numbers that are only preceded by spaces? Everything else seems to work fine.
  
  Thanks!
  
  Andy
- January 21, 2012 at 2:33 pm #61501
  
  Anonymous
  Inactive
  
  edit: for some reason the forward slash isn't showing up where I type it, so make sure each s, d, and D shorthand below has one in front of it

  Assuming you're not using any 1000+ page books, you could just search for

  (?<=[,:]\s)\d{1,3}(?=\D)

  That would find any number of three digits or fewer that is preceded by ,\s or :\s (a comma or colon plus a whitespace character).

  (?<=[,:]\s)\d+

  This one also finds 4-digit numbers, but only when they're preceded by ,\s or :\s.
  
  I'm not at my work computer today so i can't test this right now. but I'll check back on monday to see if your problem is solved.
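
Since the expressions above were posted untested, here is a minimal sketch that tries them in Python's `re` module. That is a different engine from InDesign's GREP, so treat the result as indicative only. The paragraph returns joining the entries and the four-digit `LONG_ENTRY` are assumptions added for illustration; they are not from the thread.

```python
import re

# Index snippet from the first post, joined with paragraph returns so the
# text resembles consecutive paragraphs (an assumption about the layout).
INDEX_TEXT = "\r".join([
    "Abernathy, Ralph: 5, 13–4, 44, 55, 60, 66, 116",
    "Abu Ghraib prison, 46",
    "Afghanistan War 1919–1922, 169, 171",
    "Afro-American, The (newspaper), 29",
    "Anderson, Marian, 100, 197n11",
]) + "\r"

# One to three digits, preceded by comma/colon plus whitespace,
# and followed by a non-digit character.
SHORT_PAGES = re.compile(r"(?<=[,:]\s)\d{1,3}(?=\D)")

# Any run of digits preceded by comma/colon plus whitespace.
ANY_PAGES = re.compile(r"(?<=[,:]\s)\d+")

# Hypothetical entry with a four-digit page number (not from the thread).
LONG_ENTRY = "Zoning laws, 1024\r"

if __name__ == "__main__":
    print("short:", SHORT_PAGES.findall(INDEX_TEXT))
    print("any:  ", ANY_PAGES.findall(INDEX_TEXT))
    print("short on 4-digit entry:", SHORT_PAGES.findall(LONG_ENTRY))
    print("any on 4-digit entry:  ", ANY_PAGES.findall(LONG_ENTRY))
```

Note that the `(?=\D)` lookahead needs a character after the number, so in this sketch the last number in the text only matches because a paragraph return follows it.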
- January 21, 2012 at 2:38 pm #61502
  
  Anonymous
  Inactive
  
  I'm also now noticing that your grep doesn't have any of the forward slashes in it. I'm wondering if there may be a bug with the site right now?
- January 21, 2012 at 4:32 pm #61503
  
  Theunis De Jong
  Member
  
  It seems all it needs is this:
  
  (?<=, |: )\d+
  
  — it finds every string of digits that are preceded by either comma-space or colon-space. There is no need to check for notes (digits, but preceded by an 'n') or en-dashes (the same, it has an en-dash). It also skips the war date range, for the same reason.
  
  (Is there a reason the first item has a colon and all others have not?)
  
  … forward slashes …
  
  You mean backslashes :) This editor probably treats a single backslash as some kind of special code and discards it. To insert a single backslash, enter two of them when posting: \\. Watch out when editing a post, because at that point it will contain single backslashes again and you have to change them back to \\ one at a time.
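
To see which numbers the comma-space / colon-space lookbehind picks up, here is a small sketch that runs it over the sample entries in Python's `re` module. Python is only a stand-in for InDesign's GREP engine here, and the helper name `first_pages` is made up for this example.

```python
import re

# Digits preceded by comma-space or colon-space (both alternatives are
# two characters long, so the lookbehind has a fixed width).
FIRST_PAGE = re.compile(r"(?<=, |: )\d+")

ENTRIES = [
    "Abernathy, Ralph: 5, 13–4, 44, 55, 60, 66, 116",
    "Abu Ghraib prison, 46",
    "Afghanistan War 1919–1922, 169, 171",
    "Afro-American, The (newspaper), 29",
    "Anderson, Marian, 100, 197n11",
]


def first_pages(entry):
    """Return every digit run that follows ', ' or ': ' in one index entry."""
    return FIRST_PAGE.findall(entry)


if __name__ == "__main__":
    for entry in ENTRIES:
        print(entry, "->", first_pages(entry))
```

The numbers after an en dash, the note number after "n", and the date range after "War " are skipped simply because none of them follows a comma-space or colon-space.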
- January 21, 2012 at 5:49 pm #61504
  
  Umm_fish
  Member
  
  Thanks, all. This gets close. There's one more possibility that I didn't include in the sample:
  
  “Agenda,” 180–1
  
  So, I altered the grep above just a bit:
  
  (?<=,” |, |: )\d+
  
  Unfortunately, that works on the Agenda sample, but it suddenly leaves out the first digit of any other number. Now I'm really confused. Not that I wasn't before. BTW, I'm testing these using the GREP styles section of the paragraph style, if that makes any difference.
  
  (Is there a reason the first item has a colon and all others have not?)
  
  Yes. I'm trying to account for all of the types of indexes I might see from this press, not just this one project.
  
  Thanks!
- January 21, 2012 at 5:56 pm #61505
  
  Umm_fish
  Member
  
  Okay, I figured it out. Altering it to this works:
  
  (?<=[,” |, |: ])\d+
  
  The brackets did the trick. Thanks for the help!
- January 22, 2012 at 2:30 pm #61507
  
  Theunis De Jong
  Member
  
  Umm, that is not a good fix.
  
  What it does, in essence, is find any of the individual characters inside the set. The entire part between the square brackets means “any single one of the characters inside” (the OR | does not work inside a [character set] as such, it merely adds the | character to the allowed ones). You can see it indeed doesn't work if you check your example
  
  Afghanistan War 1919–1922, 169, 171
  
  — all it does is it finds all digit strings that are preceded by a space — or by a comma, quote, colon, or pipe, as you can see if you insert a pipe anywhere inside a string of digits.
  
  By the way, the reason your first try failed:
  
  (?<=,” |, |: )\d+

  is because Lookbehind does not work with strings of different lengths; they all have to be the same length. For the same reason you cannot use repeating codes (+, *, and ?) inside a Lookbehind. I guess it would make poor old InDesign work too hard, going over all possible combinations.
  
  But you can work around this by making them two Lookbehinds:
  
  ((?<=,” )|(?<=, |: ))\d+
  
  The first one checks for “comma-quote-space”, the second one for “comma-space” or “colon-space”. They are grouped inside parentheses with an OR bar, so if one doesn't work, the other takes over.
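
The three variants discussed above can be compared in a short sketch using Python's `re` module. Python is an assumption here, not the engine InDesign uses: it also requires fixed-width lookbehinds, but it rejects a mixed-length one with an error instead of matching oddly. The helper names `matches` and `compiles` are invented for this example.

```python
import re

WAR = "Afghanistan War 1919–1922, 169, 171"
AGENDA = "“Agenda,” 180–1"

# Bracketed version: a character set, so ONE preceding character is enough
# (comma, closing quote, space, pipe, or colon).
CHAR_CLASS = re.compile(r"(?<=[,” |, |: ])\d+")

# Two separate lookbehinds, each with alternatives of a single length.
SPLIT = re.compile(r"((?<=,” )|(?<=, |: ))\d+")

# Single lookbehind mixing a 3-character and two 2-character alternatives.
MIXED_LENGTH = r"(?<=,” |, |: )\d+"


def matches(pattern, text):
    """Return the full text of every match (ignoring capture groups)."""
    return [m.group(0) for m in pattern.finditer(text)]


def compiles(source):
    """Report whether Python's re module accepts the pattern at all."""
    try:
        re.compile(source)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print("char class on war entry:", matches(CHAR_CLASS, WAR))
    print("char class on '12|34':  ", matches(CHAR_CLASS, "12|34"))
    print("split on war entry:     ", matches(SPLIT, WAR))
    print("split on agenda entry:  ", matches(SPLIT, AGENDA))
    print("mixed-length compiles:  ", compiles(MIXED_LENGTH))
```

In this sketch the bracketed version picks up 1919 because a lone space precedes it, while the two-lookbehind version returns only the page numbers for both entries.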
- January 23, 2012 at 7:18 am #61516
  
  Umm_fish
  Member
  
  Thank you very much. This is the first time I've ever really worked with GREP. I was a Quark guy for a bazillion years and did most of my work on files inside of Word, then exported xtg files from there. This is all a whole new world for me. I must say, I'm sure there's a logic behind GREP, but I sure don't have my head around it yet.
  
  Thanks again!