I would like to find all words and numbers that start with, end with, or contain '.
I tried writing two regexes, shown below. In the second one I added ?:
to say that the text at the end or at the beginning of the word is optional, but I am not getting the required results. What did I do wrong? I would like to find I've, 'had, not', you're, 123'45
- basically everything that has '
import re
xyz="I've never 'had somebody [redacted-number] [redacted-number] [redacted-number] not. not' you're 123'45"
print (re.findall(r"\w+\'\w+", xyz))
print (re.findall(r"(?:\w+)\'(?:\w+)", xyz))
["I've", "you're", "123'45"]
["I've", "you're", "123'45"]
CodePudding user response:
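You're almost there. Try this:
(?:\w+)?'(?:\w+)?
In (?:\w+), the ?: only makes the group non-capturing; it does not make it optional. \w+ matches a word character between one and unlimited times. The ? after each group matches the previous token between zero and one time, which is what makes the group optional.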
https://regex101.com/r/N8Y9cQ/1
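To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the one suggested above. It assumes a shortened version of the question's string (the redacted placeholders are left out); the variable names strict and optional are only illustrative.

```python
import re

# Shortened version of the string from the question (an assumption for brevity).
xyz = "I've never 'had somebody not. not' you're 123'45"

# Word characters are required on both sides of the apostrophe.
strict = re.findall(r"\w+'\w+", xyz)

# Each non-capturing group is followed by ?, so either side may be empty.
optional = re.findall(r"(?:\w+)?'(?:\w+)?", xyz)

print(strict)    # ["I've", "you're", "123'45"]
print(optional)  # ["I've", "'had", "not'", "you're", "123'45"]
```

Because both groups are optional, this pattern would also match an apostrophe standing on its own, which may or may not be what you want.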
CodePudding user response:
You want to capture all words that contain a ' anywhere within them, no? Try this:
re.findall("\w*'\w*", xyz)
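One thing to keep in mind with this pattern: since \w* can match zero characters on both sides, an apostrophe that stands alone is returned as a match too. A small sketch, using a made-up sample string that is not from the question:

```python
import re

# Hypothetical sample with two standalone apostrophes.
sample = "rock ' n ' roll isn't 'bout that"

matches = re.findall(r"\w*'\w*", sample)
print(matches)  # ["'", "'", "isn't", "'bout"]

# If lone apostrophes are unwanted, they can be filtered out afterwards.
words = [m for m in matches if m != "'"]
print(words)  # ["isn't", "'bout"]
```

For the string in the question this makes no difference, because it contains no standalone apostrophe.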
CodePudding user response:
You can use either of
\w*(?!\B'\B)'\w*
\w+'\w*|'\w+
See the regex demo #1 / regex demo #2.
Details
\w*(?!\B'\B)'\w* - zero or more word chars, a ' char (that is not both preceded and followed with non-word chars or start/end of string), zero or more word chars
\w+'\w*|'\w+ - one or more word chars, ', zero or more word chars, OR a ' char and then one or more word chars.
See the Python demo:
import re
xyz="I've never 'had somebody [redacted-number] [redacted-number] [redacted-number] not. not' you're 123'45"
print (re.findall(r"\w*(?!\B'\B)'\w*", xyz))
# => ["I've", "'had", "not'", "you're", "123'45"]
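The alternation-based pattern can be checked the same way. The sketch below runs both patterns on a hypothetical string that also contains a standalone apostrophe (not part of the question's input), to show that neither pattern returns it:

```python
import re

# Hypothetical sample: the fourth token is an apostrophe on its own.
sample = "I've 'had not' ' you're 123'45"

# Lookahead version: rejects a ' with non-word chars (or string edges) on both sides.
with_lookahead = re.findall(r"\w*(?!\B'\B)'\w*", sample)

# Alternation version: requires at least one word char before or after the '.
with_alternation = re.findall(r"\w+'\w*|'\w+", sample)

print(with_lookahead)    # ["I've", "'had", "not'", "you're", "123'45"]
print(with_alternation)  # ["I've", "'had", "not'", "you're", "123'45"]
```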
In Pandas, you can use Series.str.findall:
df['result'] = df['source'].str.findall(r"\w*(?!\B'\B)'\w*")