I am new to regex and want to know how to write a pattern that matches text made of letters, digits, spaces and special characters such as commas, followed by a slash and a tag of three or more capital letters.
Suppose I have a string like this:
my_string = 'Syrians/NORP, Turkish/NORP, Turkish/NORP, Turkish/NORP, the last 2 , 3 years/DATE, Turkey/LOC'
What I have tried:
my_new_string = re.findall('[\w+\,]+/[A-Z]{4}', my_string)
#result
['Syrians/NORP', 'Turkish/NORP', 'Turkish/NORP', 'Turkish/NORP', 'years/DATE']
Expected result:
['Syrians/NORP', 'Turkish/NORP', 'Turkish/NORP', 'Turkish/NORP', 'the last 2 , 3 years/DATE', 'Turkey/LOC']
I also struggled with the part of the pattern that should match three or more capital letters.
Can you propose a good solution? Thanks in advance!
CodePudding user response:
>>> re.findall(r'\w[\w, ]+/[A-Z]{3,4}', my_string)
['Syrians/NORP', 'Turkish/NORP', 'Turkish/NORP', 'Turkish/NORP', 'the last 2 , 3 years/DATE', 'Turkey/LOC']
Just add a space to your character class (the '+' inside it is not needed after \w) and use a range of 3 to 4 to match "LOC" (or whatever range you need). Start with a word character (\w) to avoid matching leading spaces. \w also matches _, by the way, but that is not a problem here.
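Since the question asks for capital letters "from 3 up", an open-ended quantifier {3,} may fit better than {3,4}. Here is a minimal sketch of that variant as a standalone script; the name TAGGED is only illustrative, and it assumes tags always consist of plain A-Z letters as in the sample string:

```python
import re

my_string = ('Syrians/NORP, Turkish/NORP, Turkish/NORP, Turkish/NORP, '
             'the last 2 , 3 years/DATE, Turkey/LOC')

# \w        : first character must be a word character (no leading space or comma)
# [\w, ]+   : then one or more word characters, commas, or spaces
# /         : the literal slash before the tag
# [A-Z]{3,} : a tag of three or more capital letters (assumed open-ended)
TAGGED = re.compile(r'\w[\w, ]+/[A-Z]{3,}')

matches = TAGGED.findall(my_string)
print(matches)
# ['Syrians/NORP', 'Turkish/NORP', 'Turkish/NORP', 'Turkish/NORP',
#  'the last 2 , 3 years/DATE', 'Turkey/LOC']
```

Note that \w[\w, ]+ needs at least two characters before the slash, so a one-letter phrase such as 'a/DET' would not match. If that case matters, [\w, ]* could be used instead of [\w, ]+.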