CreativePro Forum
Shortest GREP Pattern to address URLs and e-mail addresses?
- This topic has 10 replies, 7 voices, and was last updated 16 years, 3 months ago by Eelco.
-
November 20, 2009 at 12:16 am #50785
Casey
Participant
I'm always scanning the web for useful new GREP patterns that might help with InDesign. I've yet to find any that work as well as the ones below, which I built myself. They are simple, short, straightforward, and all-encompassing. I haven't yet run into a URL in my document copy that they failed to tag. I'd like to challenge the forum to find a better pair of GREP patterns. At the end you'll find some sample text you can run them against. I'm trying to think of a scenario where text would be incorrectly tagged by these GREP definitions. Any ideas?
Web Address: [\w:/]+[.][\w%-/]+
Email Address: [\w-.]+@[\w-.]+
In CS4: stacked in Grep Styles section where Email Address appears 2nd in the list.
In CS3: run the Web Address F/R first, followed by the Email Address F/R to clean it up.
If I want to have a single character style applied to both, I use the following:
All Addresses: [\w:/]+[.@]+[\w%-/]+
Here is a scenario where various web addresses are scattered randomly throughout a paragraph. Check out https://www.adobe.com/go/learn_id_grep for more details on the topic. With the new power of GREP searches one can save time manually formatting text within the document. Email me at cdandrea@loop.ca and I'll send you, Mr. Jones, my saved queries. Check out https://en.wikipedia.org/wiki/Grep for it's history in the UNIX world.
This is some text with a bunch of addresses in it.
Some variations on email addresses:
Contact me at john.doe@adobe.com
John's email address: jdoe@adobe.com
Some variations of web addresses, also known as URLs:
Visit us at adobe.com
Make sure to check out https://adobe.com
Download the installer from https://www.adobe.com
Consult https://www.adobe.com for any help
Source for tutorials on CS3: adobe.com/designcenter/video_workshop/
Everything you need can be found at https://adobe.com/designcenter/….._workshop/
https://www.adobe.com/designcen….._workshop/ (g1,2,3) is a great resource to learn
If you need help check out https://www.adobe.com/designcen….._workshop/
Grab all the old downloads from ftp://207.232.11.233
Everything you need can be found at https://adobe.com/designcenter/….._workshop/
https://www.adobe.com/designcen….._workshop/ is a great resource to learn
If you need help check out https://www.adobe.com/designcen….._workshop/
Grab all the old downloads from ftp://207.232.11.233
The installer is buried here: ftp://207.232.11.233/adobefile…../indesign/ -
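For anyone who wants to try the two patterns outside InDesign, here is a minimal sketch in Python. Python's `re` module is not InDesign's GREP engine, so the results are only an approximation. The email class is written as `[-\w.]` (dash first) because Python rejects `[\w-.]` as a bad range. The helper name `tag_addresses` is hypothetical, and it simplifies the "email style second" stacking by dropping any web match that an email match overlaps.

```python
import re

# Casey's patterns from the post. Python's re stands in for InDesign's GREP
# engine here, which is an assumption: the two engines are not identical.
WEB = re.compile(r"[\w:/]+[.][\w%-/]+")
# Python refuses [\w-.] ("bad character range"), so the dash goes first.
EMAIL = re.compile(r"[-\w.]+@[-\w.]+")


def tag_addresses(text):
    """Hypothetical helper: apply the web pattern first, then the email one."""
    tags = [(m.span(), "web", m.group()) for m in WEB.finditer(text)]
    for m in EMAIL.finditer(text):
        # Simplification: drop any earlier tag that the email match overlaps.
        tags = [t for t in tags
                if t[0][1] <= m.start() or t[0][0] >= m.end()]
        tags.append((m.span(), "email", m.group()))
    return [(kind, value) for _, kind, value in sorted(tags)]


samples = [
    "Visit us at adobe.com",
    "Grab all the old downloads from ftp://207.232.11.233",
    "Contact me at john.doe@adobe.com",
]
for line in samples:
    print(tag_addresses(line))
```

On its own, the web pattern tags `john.doe` and `adobe.com` separately in the last sample, which is why the email pattern has to run second.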
November 20, 2009 at 7:30 am #50786
Theunis De Jong
Member
If you end all with (?=\.?), it won't pick up a sentence-ending period after the URL. With your expression such a period is used as part of the URLs. And I'm surprised the dash works in your expressions (or .. does it?). I think that in “[\w%-/]” it'll match anything from '%' to '/', adding, for example, the plus sign and the comma. Well, perhaps you wanted it to.
(My personal variant is way, way longer — but it includes php's characters ?, =, & and then some more!)
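The range point can be checked with a short Python sketch. It assumes Python's `re` treats `%-/` inside a class the same way InDesign does, which is plausible but not verified here. It lists the characters between '%' and '/' and shows the sentence-ending period being swallowed.

```python
import re

# Characters covered by the range %-/ inside [\w%-/], by code point.
covered = "".join(chr(c) for c in range(ord("%"), ord("/") + 1))
print(covered)

# Because the range includes ".", a sentence-ending period is swallowed.
web = re.compile(r"[\w:/]+[.][\w%-/]+")
match = web.search("Download the installer from https://www.adobe.com.")
print(match.group())
```

So the dash is acting as a range operator here, and the period, comma, and plus sign all come along with it.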
-
November 20, 2009 at 8:33 am #50787
Anonymous
Inactive
For www and http, which are what I usually come across, I just do this:
w{3}.?\S+|https://.+\S+
I usually want to add a char style that has No Break included in it so the URL won't be split across two lines. There are exceptions to this, though.
Anyway, mine seems to work for my needs.
I've never had to search for email, funny enough.
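A quick way to see what this pattern grabs is to run it in Python. This is only a sketch: Python's `re` is a stand-in for InDesign's engine, and the `tighter` variant is an illustrative assumption, not something from the post.

```python
import re

# Pattern as posted; Python's re is used as a stand-in for InDesign's engine.
pattern = re.compile(r"w{3}.?\S+|https://.+\S+")

line = "Consult https://www.adobe.com for any help"
print(pattern.search(line).group())

# A tighter variant (an assumption, not from the post): stop at whitespace.
tighter = re.compile(r"w{3}\.\S+|https://\S+")
print(tighter.search(line).group())
```

In Python, the `.+` in the second alternative runs across spaces, so the match extends to the end of the line. Whether that matters depends on where the URLs sit in the copy.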
-
November 20, 2009 at 9:40 am #50788
Anne-Marie Concepcion
Member
Oh my gosh, Jongware … you don't look *anything* like how I pictured. (Something more like this) LOL
-
November 20, 2009 at 9:55 am #50789
Theunis De Jong
Member
Anne-Marie, rest assured: that blue glow comes from my screen … really …
And I let my hair loose just for the picture!
-
November 20, 2009 at 12:32 pm #50790
Casey
Participant
Jongware said:
If you end all with (?=\.?), it won't pick up a sentence-ending period after the URL. With your expression such a period is used as part of the URLs. And I'm surprised the dash works in your expressions (or .. does it?). I think that in “[\w%-/]” it'll match anything from '%' to '/', adding, for example, the plus sign and the comma. Well, perhaps you wanted it to.
(My personal variant is way, way longer — but it includes php's characters ?, =, & and then some more!)
I've had some variations over time, but this one seems to take into account all the scenarios I've run into so far. And yes, they have been much longer as well. You bring up an interesting point regarding the dash, as it would normally denote a range within a character set. I should probably move it to the beginning. And it's also true that I could add that positive lookahead as well, good idea.
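Both changes can be tried in a Python sketch before touching the saved queries. Python's `re` is assumed to be close enough to InDesign's engine for this comparison, which may not hold in every detail. The third pattern is one possible alternative and does not come from the thread.

```python
import re

text = "Download the installer from https://www.adobe.com."

# Dash moved to the front, as discussed: the class no longer contains ".".
dash_first = re.compile(r"[\w:/]+[.][-\w%/]+")
print(dash_first.search(text).group())   # stops at the second dot

# Appending the optional lookahead to the original pattern changes nothing
# in Python: the class has already consumed the period.
with_lookahead = re.compile(r"[\w:/]+[.][\w%-/]+(?=\.?)")
print(with_lookahead.search(text).group())

# One possible alternative (not from the thread): allow "." inside the tail
# but require the final character to be something else.
no_trailing_dot = re.compile(r"[\w:/]+[.][-\w%/.]*[-\w%/]")
print(no_trailing_dot.search(text).group())
```

In this sketch, moving the dash also removes the period from the class, so multi-dot hosts get cut short unless the period is added back explicitly.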
-
November 21, 2009 at 3:39 pm #50791
LAURENT TOURNIER
Member
Hello everybody
This is a very interesting topic. Like Casey D, I would like to find the best and shortest regex for e-mail (and web) addresses.
I tried [\w:/]+[.@]+[\w%-/]+ and it is good, but there are two problems (cf. web_00.png)
I tried this: [\S/]+@?\S+\.\S+ The result is not perfect (it doesn't find the last / at the end of URLs) (cf. web_01.png)
But both have a problem: \w also matches accented characters, as does \S, and I think an e-mail address is a string of ASCII characters.
Laurent
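The accented-character behaviour is easy to reproduce in Python, where `\w` is also Unicode-aware by default. The sketch below uses Python's `re.ASCII` flag to restrict `\w`. InDesign has no such flag as far as this thread shows, so an explicit class such as `[A-Za-z0-9_]` would be the rough equivalent there. The addresses are made up for illustration.

```python
import re

# Made-up addresses; \u00e9 is "e" with an acute accent.
text = "ren\u00e9@exemple.fr or rene@exemple.fr"
email = r"[-\w.]+@[-\w.]+"

unicode_hits = re.findall(email, text)           # Unicode-aware \w
ascii_hits = re.findall(email, text, re.ASCII)   # \w limited to [a-zA-Z0-9_]

print(unicode_hits)
print(ascii_hits)
```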
-
November 21, 2009 at 7:45 pm #50792
Anonymous
Inactive
You could do \S+\.\w\w\S+
or something like that, couldn't you? This will find a string of [non space] characters, then a period, then 2 word characters, then a string of [non space] characters. I can't think of a website or email that wouldn't follow this format.
The only problem is that this query would also find any regular sentence where you forgot to put a space after the period before typing the next sentence. Sure, it'll also include characters that can't be placed in a URL or email address, but no regular sentence should have a format that matches the expression I described. Please correct me if I'm wrong; I'm still learning too, and would love to learn more about GREP.
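Here is a small Python sketch of that pattern, again assuming Python's `re` behaves like InDesign's engine for these metacharacters. It shows the missing-space false positive, and one case from the sample text in the first post that the pattern misses: the trailing `\S+` needs at least one more character after the two word characters.

```python
import re

pattern = re.compile(r"\S+\.\w\w\S+")

hits_url = pattern.findall("Visit us at adobe.com today")
hits_typo = pattern.findall("I forgot the space.Then I kept typing.")
hits_short = pattern.findall("Email me at cdandrea@loop.ca and I'll send you")

print(hits_url)    # the URL is found
print(hits_typo)   # the missing-space sentence is found too
print(hits_short)  # a two-letter ending leaves nothing for the last \S+
```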
-
November 22, 2009 at 5:57 am #50793
Theunis De Jong
Member
The only solution I see is making it longer: replace every \w with [\u\d_] to catch only alphanumerics and underscore, add /? after the final break to include an optional final slash, and add (?=\.?) as the very last part to find, but exclude, an optional period.
I see an additional problem: the designcenter/…/workshop URL contains an ellipsis, which you definitely do not want to include (as it doesn't mean anything in URLs). You should have left this as three periods in the original text. (And it seems you might have had five consecutive periods.)
Why is it so important to find the shortest possible match? A URL just has a very complex syntax. To catch them all with a single expression, well, you just have to keep adding to a working expression up to the point where you are getting more false positives than you want. It is not as if you have to keep entering it manually into InDesign; that's what Save GREP Query is for.
-
November 22, 2009 at 8:41 pm #50794
Casey
Participant@ Tournier —
I forgot all about the negative metacharacters. It's not really well documented but that covers off anything that isn't a white space character (which might be overkill, but meh).
@Jongware —
All those extra long URLs must have been truncated when I pasted the example text in there. That wasn't the intention… hehe
It's not THAT important to find the shortest pattern, just a little experiment. Here are a couple of new variations that I've come up with. Instead of word boundary I used the beginning and end of word metacharacters, which automatically exclude any punctuation, periods, ellipsis (plural?), commas, etc. The one problem I run into with any pattern I've tried is when an acronym appears, like m.d.
————
URLs or all addresses — \<\S+\.\S+\>/?
email addresses — \<\S+@\S+\>
———–
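Python has no `\<` or `\>`, so the sketch below approximates them with `\b(?=\w)` and `\b(?<=\w)`. That substitution is an assumption and may not match InDesign's behaviour exactly. It shows trailing punctuation being excluded, the optional final slash being kept, and the acronym problem.

```python
import re

# \< and \> are start/end-of-word anchors in InDesign; Python has neither.
# \b(?=\w) and \b(?<=\w) are used here as rough stand-ins (an assumption).
START, END = r"\b(?=\w)", r"\b(?<=\w)"
url = re.compile(START + r"\S+\.\S+" + END + r"/?")
email = re.compile(START + r"\S+@\S+" + END)

print(url.findall("Check out https://en.wikipedia.org/wiki/Grep."))
print(url.findall("adobe.com/designcenter/video_workshop/"))
print(url.findall("like m.d. for example"))  # the acronym false positive
print(email.findall("Email me at cdandrea@loop.ca, please"))
```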
-
November 26, 2009 at 2:15 am #50795
Eelco
Participant
hank_scorpio said:
Post edited 8:33 am – November 20, 2009 by hank_scorpio
Usually want to add a char style that has a no break included in it so the url won't be on two different lines. There are exceptions to this though.
The latest InDesignSecrets newsletter has a tip about applying “no language” to the URL, which can be built into a character style as well. Many times I've had the trouble of all the text after the “no break” URL going missing. Will try this tip soon.
-
- The forum ‘General InDesign Topics (CLOSED)’ is closed to new topics and replies.
