Regex Name Retrieval

I'm attempting to write a simple regular expression that retrieves names based on the presence of a character string at the end of a line.

I've been successful at isolating each of these patterns using pythex in my data set, but I have been unable to match them as a conditional group.

Can someone explain what I am doing wrong?

Data Example

Mark Samson: CA

Sam Smith: US

Dawn Watterton: CA

Neil Shughar: CA

Fennial Fontaine: US

I want to create a regular expression that uses the end of each line as the condition for the match, i.e., I want a list of those who live in the US from this dataset. Each of the expressions below seems to match what I am looking for when used in isolation. What I need is help combining them into a single grouped search.

Does anyone have any suggestions?

([US]$)([A-Z][a-z]+)

CodePudding user response：

Something like the following?

(\w+[ \w]*): US

CodePudding user response：

You say "I have been unable to match them as a conditional group", but you are not using any conditional groups. ([US]$)([A-Z][a-z]+) is a pattern that can never match any string: it matches a single U or S, then requires the end of the string, and then requires an uppercase ASCII letter followed by one or more lowercase ASCII letters, which cannot appear after the end of the string.
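As a quick check of that claim, here is a minimal sketch that runs the question's pattern against the sample lines. The variable names (`lines`, `broken`, `broken_hits`) are only illustrative. Note also that [US] is a character class matching a single U or S, not the two-letter string US.

```python
import re

# Sample lines from the question.
lines = ["Mark Samson: CA", "Sam Smith: US", "Dawn Watterton: CA",
         "Neil Shughar: CA", "Fennial Fontaine: US"]

# Pattern from the question: one U or S, end of string, then more letters.
broken = re.compile(r"([US]$)([A-Z][a-z]+)")

# No letters can follow the end of the string, so no line is ever matched.
broken_hits = [line for line in lines if broken.search(line)]
print(broken_hits)
```

The list stays empty for any input, because the second group is required after the `$` anchor.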

You want everything from the start of the string up to a colon that is followed by optional whitespace and US at the end of the string.

Hence, use

.+?(?=:\s*US$)
^(.+?):\s*US$

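Details:

.+? - one or more chars other than line break chars, as few as possible
(?=:\s*US$) - a positive lookahead that matches a location immediately followed by :, zero or more whitespace chars, the US string, and the end of string
^(.+?):\s*US$ - the same match anchored at the start of string, with the name captured in group 1 and the colon, whitespace, and US consumed rather than looked ahead.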

See a Python demo:

import re
texts = ["Mark Samson: CA", "Sam Smith: US", "Dawn Watterton: CA", "Neil Shughar: CA", "Fennial Fontaine: US"]
for text in texts:
    match = re.search(r".+?(?=:\s*US$)", text)
    if match:
        print(match.group()) # With r"^(.+?):\s*US$" regex, use match.group(1) here

Output:

Sam Smith
Fennial Fontaine
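
If the data arrives as one multi-line string rather than a list of separate strings (an assumption, since the question does not say how it is stored), a possible variant is to use re.findall with the re.MULTILINE flag so that ^ and $ anchor at each line. The names `data` and `us_names` below are illustrative only.

```python
import re

# Assumed input shape: the whole dataset as a single multi-line string.
data = """Mark Samson: CA
Sam Smith: US
Dawn Watterton: CA
Neil Shughar: CA
Fennial Fontaine: US"""

# re.MULTILINE makes ^ and $ match at the start and end of every line.
# With one capturing group, findall returns just the captured names.
us_names = re.findall(r"^(.+?):\s*US$", data, flags=re.MULTILINE)
print(us_names)
```

This collects the names into a list in one call, which is what the question asks for. One caveat: \s also matches line breaks, so data where the country code sits on the line after the colon would need a stricter pattern such as [ \t]*.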