CreativePro Forum
Shortest GREP Pattern to address URLs and e-mail addresses?
Tagged: URL GREP
- This topic has 12 replies, 9 voices, and was last updated 8 years, 11 months ago by
Jay Farschman.
-
AuthorPosts
-
-
November 20, 2009 at 7:16 am #53847
Casey
Participant
I'm always scanning the web for useful new GREP patterns that might help things out with InDesign. I've yet to find any that appear to work as well as the ones below, which I've built myself. They are simple, short, straightforward and all-encompassing. I haven't really run into a URL within my document copy that wasn't tagged. I'd like to issue a challenge to the forum to find a better pair of GREP patterns. At the end you'll find some sample text you can run them against. I'm trying to think of a scenario where text would be incorrectly tagged by these GREP definitions. Any ideas?
Web Address: [\w:/]+[.][\w%-/]+
Email Address: [\w-.]+@[\w-.]+
In CS4: stacked in the GREP Styles section, where Email Address appears 2nd in the list.
In CS3: run the Web Address F/R first, followed by the Email Address F/R to clean it up.
If I want to have a single character style applied to both, I use the following:
All Addresses: [\w:/]+[.@]+[\w%-/]+
Here is a scenario where various web addresses are scattered randomly throughout a paragraph. Check out https://www.adobe.com/go/learn_id_grep for more details on the topic. With the new power of GREP searches one can save time manually formatting text within the document. Email me at cdandrea@loop.ca and I'll send you, Mr. Jones, my saved queries. Check out https://en.wikipedia.org/wiki/Grep for it's history in the UNIX world.
This is some text with a bunch of addresses in it.
Some variations on email addresses:
Contact me at john.doe@adobe.com
John's email address: jdoe@adobe.com
Some variations of web addresses, also known as URLs:
Visit us at adobe.com
Make sure to check out https://adobe.com
Download the installer from https://www.adobe.com
Consult https://www.adobe.com for any help
Source for tutorials on CS3: adobe.com/designcenter/video_workshop/
Everything you need can be found at https://adobe.com/designcenter/….._workshop/
https://www.adobe.com/designcen….._workshop/ (g1,2,3) is a great resource to learn
If you need help check out https://www.adobe.com/designcen….._workshop/
Grab all the old downloads from ftp://207.232.11.233
Everything you need can be found at https://adobe.com/designcenter/….._workshop/
https://www.adobe.com/designcen….._workshop/ is a great resource to learn
If you need help check out https://www.adobe.com/designcen….._workshop/
Grab all the old downloads from ftp://207.232.11.233
The installer is buried here: ftp://207.232.11.233/adobefile…../indesign/
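For anyone who wants to try the two-pass idea outside InDesign, here is a rough Python sketch. It assumes the patterns above were written with \w, and Python's re module is only an approximation of InDesign's GREP engine. One known difference: Python rejects [\w-.] as a bad character range, so the hyphen is moved to the end of the class here. The sample sentence is pieced together from the test text above.

```python
import re

# Python renderings of the two patterns (assumption: the originals used \w).
WEB = re.compile(r"[\w:/]+[.][\w%-/]+")
# Hyphen moved to the end of the class, since Python refuses [\w-.].
EMAIL = re.compile(r"[\w.-]+@[\w.-]+")

sample = ("Email me at cdandrea@loop.ca and check "
          "https://www.adobe.com/go/learn_id_grep for more")

# First pass: web addresses. Second pass: email addresses.
web_hits = WEB.findall(sample)
email_hits = EMAIL.findall(sample)

print(web_hits)
print(email_hits)
```

The web pass tags only the loop.ca part of the email address, which is presumably why the email pattern has to be applied second to clean it up.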
November 20, 2009 at 2:30 pm #53852
Theunis De Jong
Member
If you end all with (?=\.?), it won't pick up a sentence-ending period after the URL. With your expression, such a period is matched as part of the URL. And I'm surprised the dash works in your expressions (or ... does it?). I think that in “[\w%-/]” it'll match anything from '%' to '/', which adds, for example, the plus sign and the comma. Well, perhaps you wanted it to.
(My personal variant is way, way longer — but it includes php's characters ?, =, & and then some more!)
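The range behaviour described above is easy to check in Python, whose character-class rules for a hyphen between two literal characters work the same way. This is only a sketch in Python's re module, not InDesign itself; the names RANGE and LITERAL are made up for the illustration.

```python
import re

# '%-/' inside the class is read as a range: % & ' ( ) * + , - . /
RANGE = re.compile(r"[\w%-/]")
# With the hyphen placed last, it is just a literal hyphen.
LITERAL = re.compile(r"[\w%/-]")

for ch in "+,-./%":
    print(repr(ch), bool(RANGE.fullmatch(ch)), bool(LITERAL.fullmatch(ch)))

# Characters that only the range version accepts.
range_extra = [c for c in "&'()*+,."
               if RANGE.fullmatch(c) and not LITERAL.fullmatch(c)]
print(range_extra)
```

Note that the period is one of the characters the range lets in, so moving the hyphen without adding a period to the class would stop the pattern from matching dotted host names.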
-
November 20, 2009 at 3:33 pm #53857
Anonymous
Inactive
For www and http, which are what I usually come across, I just do this:
w{3}.?\S+|https://.+\S+
I usually want to add a character style that has No Break included in it so the URL won't be split across two lines. There are exceptions to this though.
Anyway, mine seems to work for my needs.
I've never had to search for email, funny enough.
-
November 20, 2009 at 4:40 pm #53860
Anne-Marie Concepcion
Member
Oh my gosh, Jongware … you don't look *anything* like how I pictured. (Something more like this) LOL
-
November 20, 2009 at 4:55 pm #53862
Theunis De Jong
Member
Anne-Marie, rest assured: that blue glow comes from my screen … really …
And I let my hair loose just for the picture!
-
November 20, 2009 at 7:32 pm #53864
Casey
Participant
Jongware said:
If you end all with (?=\.?), it won't pick up a sentence ending period after the URL. With your expression such a period is used as part of the URLs. And I'm surprised the dash works in your expressions (or .. does it?). I think that in “[\w%-/]” it'll match anything from '%' to '/', adding, for example, the plus sign and the comma. Well, perhaps you wanted it to.
(My personal variant is way, way longer — but it includes php's characters ?, =, & and then some more!)
I've had some variations over time, but this one seems to take into account all the scenarios I've run into so far. And yes, they have been much longer as well. You bring up an interesting point regarding the dash, as it would normally denote a range within a character set. I should probably move it to the beginning. And it's also true that I could add that positive lookahead as well, good idea.
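To see the sentence-ending-period problem concretely, here is a small Python sketch. It does not use the lookahead suggested above. Instead it tries a different, hypothetical variant (TRIMMED) that forces the match to end on a word character or a slash. Python's re stands in for InDesign's GREP here, so treat it as an approximation.

```python
import re

# The web-address pattern from the first post (assumed to use \w).
ORIGINAL = re.compile(r"[\w:/]+[.][\w%-/]+")
# Hypothetical variant: same idea, but the match must end on a word
# character or a slash, so a trailing period is left out.
TRIMMED = re.compile(r"[\w:/]+[.][\w%/.-]*[\w/]")

text = "Visit us at adobe.com. Then read https://adobe.com/designcenter/."

original_hits = ORIGINAL.findall(text)
trimmed_hits = TRIMMED.findall(text)

print(original_hits)
print(trimmed_hits)
```

The first list keeps the trailing periods; the second drops them while still keeping the final slash of the longer URL.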
-
November 21, 2009 at 10:39 pm #53885
LAURENT TOURNIER
Member
Hello everybody
This is a very interesting topic. Like Casey D, I would like to find the best and shortest regex for e-mail (and web) addresses.
I tried [\w:/]+[.@]+[\w%-/]+ and it is good, but there are two problems (cf. web_00.png)
I tried this: \b[\S/]+@?\S+\.\S+\b The result is not perfect (it doesn't find the last / at the end of URLs) (cf. web_01.png)
But both have a problem: \w also matches accented characters, as does \S, and I think an e-mail address is a string of ASCII characters.
Laurent
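The accented-character point can be reproduced in Python, where \w is also Unicode-aware by default. This is a sketch only: the address is made up, and Python's re.ASCII flag has no direct InDesign equivalent that I know of, so the explicit character class is the closer analogue.

```python
import re

address = "rené@exemple.fr"  # made-up address with an accented letter

# Default \w is Unicode-aware, so the accented letter is accepted.
unicode_hit = re.search(r"[\w.-]+@[\w.-]+", address)
# re.ASCII limits \w to A-Z, a-z, 0-9 and underscore.
ascii_hit = re.search(r"[\w.-]+@[\w.-]+", address, re.ASCII)
# Spelling the class out has the same effect without a flag.
explicit_hit = re.search(r"[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+", address)

print(unicode_hit.group())
print(ascii_hit)
print(explicit_hit)
```

With the restricted classes, the accented letter sits directly before the @ and the address is not matched at all.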
-
November 22, 2009 at 2:45 am #53887
Anonymous
Inactive
You could do \S+\.\w\w\S+
or something like that, couldn't you? This will find a string of [non space] characters, then a period, then 2 word characters, then a string of [non space] characters. I can't think of a website or email that wouldn't follow this format.
The only problem is that this query would also find any regular sentence where you forgot to put a space after the period before typing the next sentence.
Sure, it'll also include characters that can't be placed in a URL or email addy, but no regular sentence should have a format that would equal the expression I described. Please correct me if I'm wrong, I'm still learning too, and would love to learn more about GREP.
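Both behaviours described in this post, the match on an address and the false hit on a missing space, can be checked with a short Python sketch. The test sentences are made up, and Python's re is only standing in for InDesign's GREP.

```python
import re

# Non-spaces, a period, two word characters, then more non-spaces.
LOOSE = re.compile(r"\S+\.\w\w\S+")

url_hits = LOOSE.findall("See adobe.com/designcenter/ for tutorials")
typo_hits = LOOSE.findall("The job ended.Then we started over.")

print(url_hits)
print(typo_hits)
```

The second list shows the false positive: two sentences run together without a space look just like a host name to this pattern.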
-
November 22, 2009 at 12:57 pm #53889
Theunis De Jong
Member
The only solution I see is making it longer: replace every \w with [\u\l\d_] to catch only alphanumerics and underscore, add /? after the final break to include an optional final slash, and add (?=\.?) as the very last part to find, but exclude, an optional period.
I see an additional problem: the designcenter/…/workshop URL contains an ellipsis, which you definitely do not want to include (as it doesn't mean anything in URLs). You should have left this as three periods in the original text. (And it seems you might have had five consecutive periods.)
Why is it so important to find the shortest possible match? A URL just has a very complex syntax. To catch them all with a single expression, well, you just have to keep adding to a working expression up to the point where you are getting more false positives than you want. It is not as if you have to keep entering it manually into InDesign; that's what Save GREP Query is for.
-
November 23, 2009 at 3:41 am #53907
Casey
Participant@ Tournier —
I forgot all about the negative metacharacters. It's not really well documented but that covers off anything that isn't a white space character (which might be overkill, but meh).
@Jongware —
All those extra long URLs must have been truncated when I pasted the example text in there. That wasn't the intention… hehe
It's not THAT important to find the shortest pattern, just a little experiment. Here are a couple of new variations that I've come up with. Instead of a word boundary I used the beginning-of-word and end-of-word metacharacters, which automatically exclude any punctuation, periods, ellipses, commas, etc. The one problem I run into with any pattern I've tried is when an acronym appears, like m.d.
————
URLs or all addresses: \<\S+\.\S+\>/?
email addresses: \<\S+@\S+\>
———–
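The acronym problem shows up in a quick Python test as well. Python has no beginning-of-word or end-of-word metacharacters, so this sketch substitutes \b on both sides, which is an approximation and not the exact InDesign behaviour. The test sentence is made up.

```python
import re

# \b stands in for InDesign's beginning-of-word and end-of-word
# metacharacters (an approximation, not an exact equivalent).
ADDRESS = re.compile(r"\b\S+\.\S+\b/?")

text = "Ask Jones, m.d. or see adobe.com."
hits = ADDRESS.findall(text)

print(hits)
```

The trailing sentence period is left out of the URL as intended, but the acronym is picked up along with it.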
-
November 26, 2009 at 9:15 am #54006
Eelco
Participant
hank_scorpio said:
Post edited 8:33 am – November 20, 2009 by hank_scorpio
Usually want to add a char style that has a no break included in it so the url won't be on two different lines. There are exceptions to this though.
The latest InDesignSecrets newsletter has a tip about applying “no language” to the URL, which can be set in a character style as well. I've often had the problem of all the text after a “no break” URL going missing. Will try this tip soon.
-
August 31, 2013 at 4:42 am #65128
Mark Coster
Member
Hi everyone
I’m fairly new to GREP, but I started experimenting with it on the magazine that I produce and sure enough I ran into ‘The URL problem’ that everyone inevitably does! I initially came up with a fairly long code that targeted URLs that did not have ‘https://’ or ‘www.’ at the start because that was the magazine’s house style:
(\w+|\w+-|\w+/)*(\w+\.[\u]+(\.[\u]+)*)(/\w+(/w+|-\w+)*)*
But then I looked at the problem again and tried to simplify each element until I got the following:
([^]+)\.([^]+)
This (as far as I can see with the testing I have done so far) finds any url irrespective of what’s in it. I tested it on Casey’s sample text in his post and it makes the same matches (unless I missed something). I’d love any feedback or suggestions of how to improve it. I also posted this on my own blog (https://www.pixooma.co.uk/blog/2013/08/31/url-grep-simplified/) as I’d like to come back to this topic regularly.
Mark
-
November 23, 2016 at 9:15 am #90033
Jay Farschman
MemberMark,
I tried your super-short GREP on my newspaper this time. One problem is that it underlines my times (shown below) as well as the URLs… close, but too short.
([^]+)\.([^]+)
Saturday, Dec. 10 and Sunday, Dec. 11, 10:00a.m.-5:00p.m.: Celebrate the holiday season with friendly event! Info: jackalopeartfair.com.
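The false hit on times can be reproduced in Python with the listing above. The short pattern as posted does not compile in Python's re module, so this sketch uses \S+\.\S+ as a stand-in, which is an assumption about its intent. STRICTER is a hypothetical alternative, not something proposed in this thread: it requires the last dot-separated label to be at least two ASCII letters.

```python
import re

listing = ("Saturday, Dec. 10 and Sunday, Dec. 11, 10:00a.m.-5:00p.m.: "
           "Celebrate the holiday season with friendly event! "
           "Info: jackalopeartfair.com.")

# Stand-in for a very short pattern: non-spaces, a period, non-spaces.
SHORT = re.compile(r"\S+\.\S+")
# Hypothetical stricter variant: dot-separated labels whose last
# label is at least two ASCII letters.
STRICTER = re.compile(r"\b[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}\b")

short_hits = SHORT.findall(listing)
strict_hits = STRICTER.findall(listing)

print(short_hits)
print(strict_hits)
```

The stricter variant skips the times, but it would also miss paths after the host name and numeric addresses such as the ftp examples earlier in the thread, so it trades one kind of error for another.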
- The forum ‘General InDesign Topics (CLOSED)’ is closed to new topics and replies.
