I have a list of strings like so:
['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY', 'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO', 'NN.KEYE.Denver.Denver.CO']
I am trying to use a regular expression to strip out the portion of text between NN. (including that) and the second . so the list would look like:
['FOO BAR.STACK.OVERFLOW', 'Harlan KY.Harlan.KY', 'Los Angeles CA.Burbank.CA', 'Denver.Denver.CO', 'Denver.Denver.CO']
I have tried using regex101 to build and test this, using: "NN\.[A-z]{?}\."
but I am not getting any matches.
How can I build that regular expression?
CodePudding user response:
The pattern [A-z]{?} matches a single character in the range A-z (which is not the same as [A-Za-z]), then an optional {, and then a literal }.
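To make that concrete, here is a small sketch of what the original pattern accepts and rejects. The two brace-containing strings are made up purely for illustration; they are not from the question's data.

```python
import re

# The pattern from the question, unchanged.
broken = re.compile(r"NN\.[A-z]{?}\.")

# Real data: no literal "}" follows the first letter, so nothing matches.
no_match = broken.search("NN.KEYE.Denver.Denver.CO")

# Hypothetical inputs that the pattern does accept.
with_brace = broken.search("NN.K}.")    # "{" skipped, "}" matched literally
with_both = broken.search("NN.K{}.")    # optional "{" present

print(no_match)            # None
print(with_brace.group())  # NN.K}.
print(with_both.group())   # NN.K{}.
```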
To match from NN. to the next dot, you can use a negated character class [^.]* matching any character except a dot:
NN\.[^.]*\.
Replace with an empty string.
See a regex demo.
import re
lst = ['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY', 'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO', 'NN.KEYE.Denver.Denver.CO']
print([re.sub(r"NN\.[^.]*\.", "", s) for s in lst])
Output
['FOO BAR.STACK.OVERFLOW', 'Harlan KY.Harlan.KY', 'Los Angeles CA.Burbank.CA', 'Denver.Denver.CO', 'Denver.Denver.CO']
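One possible refinement, assuming the NN. prefix is only ever meant to be removed at the start of a string: anchoring the pattern with ^ keeps it from also removing a later "NN." run. The sample string below is hypothetical and only serves to show the difference.

```python
import re

# Hypothetical input that contains "NN." a second time, inside "ANN.X.".
sample = "NN.WFXL.ANN.X.Y"

# Unanchored: re.sub replaces every match, including the one inside "ANN.X.".
unanchored = re.sub(r"NN\.[^.]*\.", "", sample)

# Anchored with ^: only a prefix at the very start of the string is removed.
anchored = re.sub(r"^NN\.[^.]*\.", "", sample)

print(unanchored)  # AY
print(anchored)    # ANN.X.Y
```

For the list in the question both versions give the same result, so the anchor only matters if such inputs can occur.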
CodePudding user response:
You're almost there.
Start by replacing {?} with +. In your pattern, {? means to match { 0 or 1 times, and the } is then matched literally.
That will match NN.WFXL. out of 'NN.WFXL.Harlan KY.Harlan.KY' and NN.KEYE. out of 'NN.KEYE.Denver.Denver.CO'.
To match the same portion of the others, expand your character set to include a /, so [A-z] becomes [A-z\/].
Lastly, to be more deliberate: [A-z] matches the following characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz
Perhaps this was a deliberate choice, but if you want to match only letters, in either case, use [A-Za-z].
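Putting these suggestions together, a minimal sketch might look like the following. It assumes a + quantifier after the character class and uses [A-Za-z/] for the letters-only variant; the first part just lists the non-letter ASCII characters that [A-z] lets through.

```python
import re

# ASCII characters accepted by [A-z] that are not letters.
extras = [chr(c) for c in range(128)
          if re.fullmatch(r"[A-z]", chr(c)) and not chr(c).isalpha()]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

# Letters-only class plus "/", repeated one or more times (assumed quantifier).
pattern = re.compile(r"NN\.[A-Za-z/]+\.")

lst = ['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY',
       'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO',
       'NN.KEYE.Denver.Denver.CO']

# Remove the matched prefix from each string.
result = [pattern.sub("", s) for s in lst]
print(result)
```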
CodePudding user response:
>>> x = ['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY', 'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO', 'NN.KEYE.Denver.Denver.CO']
>>> y = ['.'.join(val.split('.')[2:]) for val in x]
>>> y
['FOO BAR.STACK.OVERFLOW', 'Harlan KY.Harlan.KY', 'Los Angeles CA.Burbank.CA', 'Denver.Denver.CO', 'Denver.Denver.CO']
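A possible variation on the same idea, sketched here as an alternative rather than a correction: passing a maxsplit of 2 to str.split stops after the second dot, so the remainder stays intact and no join is needed. This assumes every string contains at least two dots; for a string with fewer, [-1] would simply return its last piece.

```python
x = ['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY',
     'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO',
     'NN.KEYE.Denver.Denver.CO']

# Split at most twice; the last element is everything after the second dot.
y = [val.split('.', 2)[-1] for val in x]
print(y)
```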
CodePudding user response:
NN\.([\w\/]+)\.
Adjust it to your needs. The first and only capture group then holds the text between NN. and the second dot.
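As a rough sketch of how that pattern could be used in Python, assuming the quantifier after the character class is +: the capture group holds the part being stripped, and the text after the match is the part the question wants to keep. The variable names are illustrative only.

```python
import re

# Assumed form of the pattern, with "+" after the character class.
pattern = re.compile(r"NN\.([\w/]+)\.")

s = 'NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW'
m = pattern.match(s)

captured = m.group(1)     # text between "NN." and the second dot
remainder = s[m.end():]   # everything after the matched prefix

print(captured)   # KTXS/KTXE
print(remainder)  # FOO BAR.STACK.OVERFLOW
```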
CodePudding user response:
All of these other answers seem too complicated for me so I would do something like this:
items = ['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY', 'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO', 'NN.KEYE.Denver.Denver.CO']
replacement = []
for e in items:
    elist = e.split(".")
    # Drop the first two dot-separated fields and rejoin the rest with dots.
    replacement.append(".".join(elist[2:]))
items = replacement
I hope this works for you, but I'm sure you've already used the other guys' answers.