Regex: whitespace and &nbsp; in non-capturing group

I am using a regex in PHP to match time strings. I would like to include both whitespace and &nbsp; in the non-capturing group to get the following matches:

Match: 10pm
Match: 10 pm

This is the regex I'm using, but it is not matching items with &nbsp;:

(\b)(\d{1,2}:\d\d|\d{1,2})(?:\s|&nbsp;)(a\.?m\.?|p\.?m\.?)(\s|<|$|,)
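One way to narrow the problem down is to run the pattern against strings whose content is known exactly. The sketch below does this in JavaScript (the language of the snippet in the second answer) rather than PHP, and the probe strings are made up for illustration. Under those assumptions the pattern accepts a literal `&nbsp;` entity and a plain space, but rejects `10pm`, because the group requires exactly one separator. If the real input still fails, it may not contain the six-character entity at all, which is only a guess here.

```javascript
// The pattern from the question, unchanged.
const questionPattern =
  /(\b)(\d{1,2}:\d\d|\d{1,2})(?:\s|&nbsp;)(a\.?m\.?|p\.?m\.?)(\s|<|$|,)/;

// Hypothetical probe strings (not taken from the asker's real data).
const probes = ["10 pm", "10&nbsp;pm", "10pm"];

// Pair each probe with whether the pattern finds a match in it.
const results = probes.map((text) => [text, questionPattern.test(text)]);

console.log(results);
```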

CodePudding user response：

If you want to match both values, you could shorten the pattern to:

\b\d{1,2}(?::\d\d)?(?:\s?|&nbsp;)[ap]\.?m\b

\b A word boundary
\d{1,2} Match 1-2 digits
(?::\d\d)? Optionally match : and 2 digits
(?:\s?|&nbsp;) Match an optional whitespace char or &nbsp;
[ap]\.?m Match either a or p, an optional dot, and m
\b A word boundary, or use (?:\s|<|$|,) instead
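As a rough check of the breakdown above, the pattern can be tried against a handful of inputs. This is a minimal sketch in JavaScript rather than PHP. The names `timePattern` and `matchesTime` and the sample strings are hypothetical, chosen only for illustration. The pattern uses only syntax common to both regex flavors, so it should carry over to `preg_match`, but that is an assumption worth testing on the real input.

```javascript
// The shortened pattern from this answer, unchanged.
const timePattern = /\b\d{1,2}(?::\d\d)?(?:\s?|&nbsp;)[ap]\.?m\b/;

// Hypothetical helper: true when the text contains a time string.
function matchesTime(text) {
  return timePattern.test(text);
}

// Made-up sample inputs covering each separator variant plus one non-match.
const samples = ["10pm", "10 pm", "10&nbsp;pm", "9:30 a.m", "10 xm"];

for (const sample of samples) {
  console.log(sample, matchesTime(sample));
}
```

With these samples, only `10 xm` is rejected. The empty branch of `\s?` is tried first, so the `&nbsp;` alternative is reached only by backtracking when `[ap]` fails on the `&`.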


CodePudding user response：

/\b\d{1,2}(?:\s*(?:&nbsp;)?\s*)?(?:[ap]m\b|[ap]\.m\.)/

\b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
\d matches a digit (equivalent to [0-9])
- {1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
Non-capturing group (?:\s*(?:&nbsp;)?\s*)?
- ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  - \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
  - * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  - Non-capturing group (?:&nbsp;)?
    - ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
      - &nbsp; matches the characters &nbsp; literally
  - \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
  - * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Non-capturing group (?:[ap]m\b|[ap]\.m\.)
- 1st Alternative [ap]m\b
  - Match a single character present in the list below [ap]
    - ap matches a single character in the list ap
  - m matches the character m literally
  - \b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
- 2nd Alternative [ap]\.m\.
  - Match a single character present in the list below [ap]
    - ap matches a single character in the list ap
  - \. matches the character . literally
  - m matches the character m literally
  - \. matches the character . literally
Global pattern flags
- g modifier: global. All matches (don't return after first match)

console.log(`
  Match10pm<br>
  Match:100pm<br>
  Match:10pm<br>          - match
  Match:10  pm<br>        - match
  Match:10 pmm<br>
  Match: 10p.m<br>
  Match: 10p.m.<br>       - match
  Match: 10 pm <br>       - match
  Match: 10&nbsp;pm       - match
  Match: 10&nbsp;pmm
  Match: 10&nbsp; pm<br>  - match
  Match: 10 &nbsp;pm<br>  - match
  Match: 10 &nbsp; pm<br> - match`
    // see ... [https://regex101.com/r/9186yf/2]
    .match(/\b\d{1,2}(?:\s*(?:&nbsp;)?\s*)?(?:[ap]m\b|[ap]\.m\.)/g)
);
