GREP to find first two words of a sentence


  • Author
    Posts
    • #83967

      I thought I had this. What I want to do is avoid situations where the first word of a new sentence sits by itself on the previous line.

      Example:

      “…this is the end of the first sentence. This”

      What I want is this instead:

      “…this is the end of the first sentence.
      This is the beginning of the next sentence.”

      So I tried this with a “No Break”:

      (\. \w+) (\w+)

      Which makes this happen:

      “…this is the end of the first
      sentence. This is the beginning of the next sentence”

      HELP!

    • #83969
      Peter Kahrel
      Participant

      You want to keep together the first two words in a sentence. So you should exclude the period from the capture:

      (?<=\. )\w+? \w

      The so-called lookbehind (?<=\. ) reads “if preceded by a period and a space”. This means that the period and the space are included in the search, so they’re matched but not captured. Therefore the ‘no break’ will be applied only to the two words after the period+space.
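      The difference between the two expressions can be checked outside InDesign. Below is a minimal sketch using Python's `re` module, which is only assumed to behave like InDesign's GREP engine for these simple patterns; the sample text and the helper name `matched_text` are made up for illustration. The matched text stands in for the range that would receive No Break.

```python
import re

# The original attempt: the period and space are part of the match,
# so No Break would also glue "sentence." to the following words.
ORIGINAL = re.compile(r"(\. \w+) (\w+)")

# The lookbehind version: ". " must precede the match but is not part of it.
REVISED = re.compile(r"(?<=\. )\w+? \w")

def matched_text(pattern, text):
    """Return the text a pattern matches first, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

sample = ("This is the end of the first sentence. "
          "This is the beginning of the next sentence.")

print(repr(matched_text(ORIGINAL, sample)))  # '. This is'
print(repr(matched_text(REVISED, sample)))   # 'This i'
```

      The first result includes the period, which is why the end of the previous sentence was pulled down to the next line. The second covers only the first word, the space, and the first letter of the second word.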

      Note that the way you matched the first two words in a sentence (\w+ \w+) doesn’t allow either of them to be hyphenated. The formulation above (\w+? \w) allows the second word to be hyphenated because No Break is applied only to its first character. The first word still has No Break applied, so it won’t hyphenate. If you want both words at the beginning of the sentence to be kept together and still be allowed to hyphenate, use this GREP:

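      Find what: \. \w+\K 
      Change to: ~S

      and blank the Find format and Change format panes. Note that the Find what expression ends with a space after \K; that space is what gets replaced. The above query paraphrases as “replace the space between the first two words in a sentence with a nonbreaking space” (~S is the nonbreaking space that is not fixed-width). The \K designates a so-called variable-width lookbehind.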
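      The effect of this Find/Change query can be imitated in Python as a rough sketch. Python's standard `re` module has no \K, so a capturing group that is put back in the replacement stands in for it here; U+00A0 is assumed as the equivalent of InDesign's ~S, and the function name is hypothetical.

```python
import re

NBSP = "\u00a0"  # assumed stand-in for InDesign's ~S (nonbreaking space)

def glue_first_two_words(text):
    """Replace the space after the first word of each sentence with a
    nonbreaking space. Group 1 (period, space, first word) is re-inserted
    unchanged, which imitates what \\K does in the InDesign query."""
    return re.sub(r"(\. \w+) ", r"\1" + NBSP, text)

sample = "First sentence. This is next. And so on."
print(repr(glue_first_two_words(sample)))
```

      Because only a space character is swapped and no No Break attribute is involved, both words remain free to hyphenate.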

      Peter

    • #83970

      Hi Peter. Well, I tried putting your first code into a paragraph style GREP with a no break, but what still seems to be happening to me is this:

      “…this is the end of the first sentence. This”

      It’s that darn first word (“This”) sitting up there by itself that bugs me. I don’t even know what to call such instances… orphan? widow? I don’t think this scenario even has a name. But I digress.

      Yep, I’d be perfectly happy to simply no break the first two words of any sentence, and if I can’t allow for hyphenation, that really doesn’t bother me all that much.

      I just want to avoid having the first word of a sentence appearing by itself at the end of the line prior.

    • #83971
      Ari Singer
      Member

      I think the culprit is having two spaces instead of one after the period (which some old timers still do). To fix it, run the ready-made GREP Find/Change query “Multiple Space to Single Space”.

      Or, if you don’t want to delete the spaces, you can enter this string in the GREP style: \. +\K\w+? \w which basically tells it to match even if there is more than one space after the period.
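      Whether extra spaces after the period defeat a pattern can be tested with a short Python sketch. This is an approximation only: Python's `re` has neither \K nor variable-width lookbehind, so a capturing group marks the text that would get No Break. The sample text and function names are invented for the example.

```python
import re

# Requires exactly ". " immediately before the first word.
STRICT = re.compile(r"(?<=\. )\w+? \w")

# Tolerates one or more spaces after the period; group 1 is the part
# that would receive No Break.
FLEXIBLE = re.compile(r"\. +(\w+? \w)")

def strict_hits(text):
    return STRICT.findall(text)

def flexible_hits(text):
    return FLEXIBLE.findall(text)

sample = "One space. Here it works.  Two spaces here."
print(strict_hits(sample))    # ['Here i']
print(flexible_hits(sample))  # ['Here i', 'Two s']
```

      The strict pattern misses the sentence that follows two spaces, while the flexible one catches both.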

      • #83975

        Hah! No, it’s not that. While my document is a kluge from different sources, one thing I’m diligent about is making sure all double spaces after periods get changed to single spaces. I used to be an old-timer, but now I’m a spry single space kind of guy.

        As an aside, I sort of have Peter’s fix working now (I rebooted InDesign), but single first words with an apostrophe are still being stubborn and trying to stick around at the end of the last line.

        Maybe I’ll just have to manually fix those.

      • #83983
        Ari Singer
        Member

        The reason it doesn’t pick up words with an apostrophe is because the ‘any word character’ wildcard (\w) excludes an apostrophe. So the obvious solution is to add an apostrophe in the string before the \w followed by a ? which translates to: Find an apostrophe, which may or may not be there, followed by any word character one or more times. So this is the final string:

        (?<=\. )'?\w+? \w
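        A quick way to see which apostrophe cases a pattern covers is to try it on sample sentences. The Python sketch below is an approximation of InDesign's behaviour, not a statement about it: in Python the straight (') and curly (’) apostrophes are distinct characters, and InDesign may treat them differently. The `VARIANT` pattern, which puts the apostrophes in a character class, is an assumption added for comparison and does not come from this thread.

```python
import re

# The string as posted: an optional apostrophe before the first word.
POSTED = re.compile(r"(?<=\. )'?\w+? \w")

# Hypothetical variant: allow straight or curly apostrophes anywhere
# in the first word.
VARIANT = re.compile(r"(?<=\. )[\w'\u2019]+? \w")

def first_hit(pattern, text):
    """Return the first matched text, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_hit(POSTED, "The end. 'Twas late."))   # 'Twas l
print(first_hit(POSTED, "The end. It's late."))    # None
print(first_hit(VARIANT, "The end. It's late."))   # It's l
```

        In this test the posted string handles a leading apostrophe, while an apostrophe inside the first word needs something like the character-class variant.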

    • #83973
      Peter Kahrel
      Participant

      What you want to avoid has no name as far as I’m aware because nobody objects to it. Some publishers want to avoid single-letter words (A, a, I) at the end of a line, but it still has no name.

      That the expression doesn’t work for you is probably caused by the double spaces that Ari mentioned, and his suggested solutions should handle that.

    • #83976

      Thanks Peter. Your fixes got me further than I had been able to myself. But I can’t help but wonder, how can publishers ignore this kind of abomination? It looks weird and it needs a name IMHO!

      • #83977

        If you’re referring to book publishers, they definitely don’t consider it an abomination. What they consider an abomination is a stack of periods. Or a stack of capitalized letters starting a line. Or loose lines caused by what you consider an abomination. Other abominations are three or more line breaks with hyphens. Some don’t like a word hyphenating across pages (and they then complain if the line is loosely spaced). And widows (last line of a paragraph) at the top of a page. Orphans (one word on the last line of a paragraph) are allowed so long as they are complete words of at least four characters (not including punctuation). But some will mark widows as our mistake even though it’s an entire word that is 10 characters long. One never knows what to expect.

        Peter mentioned single-letter words, but sometimes we allow them at the end of a line to avoid stacks or a bunch of a’s, A’s, or I’s at the beginning of a line. The publishers don’t like those stacks. They’d rather a few end a line in order to break things up.

        I do a lot of books and I’m surprised at what some authors, professional proofreaders and editors find abominations or unacceptable. Some find stacks of two periods or columns an abomination, some find “rivers” where there aren’t any. Heck–we had an author insist we hyphenate a word incorrectly because he thought it looked better. And the publisher allowed it!

        But I’m talking book publishing and I’m not sure what field you are in so far as InDesign.

        I do know that I cringe when I read certain magazines (such as Sports Illustrated and Playboy), because of the hyphenation (they allow two characters down), stacks, super-tight lines or loose lines. And especially widows (last line of a paragraph at the top of the page). Now those things to me are an abomination.
