I am trying to match all the words in the sentence:
"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.
I tried:
([A-Za-z\d(^\n$)]+('[A-Za-z]+)?)
but I don't want to match \nSo as a word, only So. In fact, I want to exclude all forms of whitespace, such as \n or \t.
My Julia code is:
sentence = """"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled."""
regex = r"([A-Za-z\d(^\n$)]+('[A-Za-z]+)?)"
v = [m.match for m in eachmatch(regex, sentence)]
CodePudding user response:
It turns out that \r, \n and \t are two-character combinations (a backslash followed by a letter) in your text.
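A quick way to check which case applies to a given string, as a minimal sketch with made-up sample strings (the names escaped and literal are illustrative, not from the question):

```julia
# In an ordinary string literal, \n is interpreted as one newline character.
escaped = "Agent.\nSo"

# In a raw string literal, \n stays as two characters: a backslash and the letter n.
literal = raw"Agent.\nSo"

# The raw version is one character longer.
println(length(escaped), " vs ", length(literal))

# "\n" is a real newline; "\\n" is a backslash followed by n.
has_newline   = occursin("\n", escaped)
has_backslash = occursin("\\n", literal)
println(has_newline, " ", has_backslash)
```

If the second check is true for your data, the text contains the literal two-character sequences and they have to be handled in the pattern itself.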
Since Julia uses PCRE, you can use a SKIP-FAIL regex to exclude these combinations from the matches:
\\[rnt](*SKIP)(*F)|\w+(?:['-]\w+)*
Details:
\\[rnt](*SKIP)(*F) - a \ char followed by r, n or t; the matched chars are then dropped, the match is failed, and the engine starts looking for the next match from the failure position
| - or
\w+(?:['-]\w+)* - one or more word chars, then zero or more repetitions of ' or - followed by one or more word chars.
In Julia:
julia> sentence = """"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled."""
"\"That's the password: 'PASSWORD 123'!\", cried the Special Agent.\nSo I fled."
julia> regex = r"\\[rnt](*SKIP)(*F)|\w+(?:['-]\w+)*"
r"\\[rnt](*SKIP)(*F)|\w+(?:['-]\w+)*"
julia> v =[m.match for m = eachmatch(regex, sentence)]
12-element Vector{SubString{String}}:
"That's"
"the"
"password"
"PASSWORD"
"123"
"cried"
"the"
"Special"
"Agent"
"So"
"I"
"fled"
See the online Julia demo.
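The sentence in the session above is an ordinary string literal, so its \n is already a real newline and the string contains no backslash for the first alternative to match. To see the SKIP-FAIL part at work, here is a small sketch that assumes the text contains the literal two-character sequences; the raw string, the sample text and the helper name words are assumptions for illustration:

```julia
# Hypothetical input: the raw string keeps \n and \t as backslash + letter.
text = raw"Agent.\nSo I fled.\tBye"

# Word pattern alone, and the same pattern guarded by the SKIP-FAIL alternative.
plain    = r"\w+(?:['-]\w+)*"
skipfail = r"\\[rnt](*SKIP)(*F)|\w+(?:['-]\w+)*"

# Collect every match as a substring.
words(re, s) = [m.match for m in eachmatch(re, s)]

naive = words(plain, text)     # the letters n and t stick to the next word
clean = words(skipfail, text)  # backslash + r/n/t is consumed and discarded

println(naive)
println(clean)
```

Without the guard, the letter after the backslash is treated as the start of the next word (nSo, tBye); with it, only So and Bye are returned.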