Remove parts from a string using a regular expression

I have a list of strings like so:

['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY', 'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO', 'NN.KEYE.Denver.Denver.CO']

I am trying to use a regular expression to strip out everything from NN. up to and including the second ., so the list would look like:

['FOO BAR.STACK.OVERFLOW', 'Harlan KY.Harlan.KY', 'Los Angeles CA.Burbank.CA', 'Denver.Denver.CO', 'Denver.Denver.CO']

I have tried using regex101 to build and test this, using: "NN\.[A-z]{?}\." but I am not getting any matches.

How can I build that regular expression?

CodePudding user response：

The pattern [A-z]{?} matches a single character in the range A-z (which is not the same as [A-Za-z]), then an optional {, and then a literal }.
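
To see what the original pattern really accepts, here is a quick check in Python. It is only a sketch, and the short test strings such as "NN.K}." are made up for illustration:

```python
import re

# The asker's pattern: "{?" is an optional literal "{", and "}" is a literal "}".
original = re.compile(r"NN\.[A-z]{?}\.")

# It matches one character from the A-z range followed by "}" or "{}".
matches_brace = original.fullmatch("NN.K}.") is not None
matches_both = original.fullmatch("NN.K{}.") is not None

# The real inputs never look like that, so nothing is found.
no_match = original.search("NN.WFXL.Harlan KY.Harlan.KY") is None

# The A-z range also covers the punctuation between "Z" and "a", such as "_".
underscore_in_range = re.fullmatch(r"[A-z]", "_") is not None
underscore_in_letters = re.fullmatch(r"[A-Za-z]", "_") is not None

print(matches_brace, matches_both, no_match)
print(underscore_in_range, underscore_in_letters)
```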

To match from NN. to the next dot, you can use a negated character class [^.]* matching any character except a dot:

NN\.[^.]*\.

Replace with an empty string.


import re

lst = ['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY', 'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO', 'NN.KEYE.Denver.Denver.CO']
print([re.sub(r"NN\.[^.]*\.", "", s) for s in lst])

Output

['FOO BAR.STACK.OVERFLOW', 'Harlan KY.Harlan.KY', 'Los Angeles CA.Burbank.CA', 'Denver.Denver.CO', 'Denver.Denver.CO']
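
If the NN. prefix always sits at the start of each string (an assumption, the question does not say so), anchoring the pattern with ^ keeps a later "NN." inside the text from being removed too. A minimal sketch, where the second sample string is hypothetical:

```python
import re

samples = [
    "NN.KEYE.Denver.Denver.CO",
    "NN.ABCD.Station NN.East.TX",  # hypothetical input with a second "NN."
]

# Without an anchor, every "NN.<something>." occurrence is removed.
unanchored = [re.sub(r"NN\.[^.]*\.", "", s) for s in samples]

# With "^", only a prefix at the very start of the string is removed.
anchored = [re.sub(r"^NN\.[^.]*\.", "", s) for s in samples]

print(unanchored)
print(anchored)
```

For the five strings in the question both versions give the same result, so the anchor only matters if "NN." can reappear later in a string.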

CodePudding user response：

You're almost there.

Start by replacing {?} with +

{? matches { zero or one times, and the } that follows is matched literally.

That will match NN.WFXL out of 'NN.WFXL.Harlan KY.Harlan.KY' and NN.KEYE out of 'NN.KEYE.Denver.Denver.CO'

To match the same portion of the others, expand your character set to include a /

[A-z] becomes [A-z\/]

Lastly, to be precise, [A-z] matches the following characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz

Perhaps this was a deliberate choice, but if you want to match only letters, in either case, use [A-Za-z].
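
Putting these steps together, one possible final pattern is NN\.[A-Za-z/]+\. with a one-or-more quantifier (+) after the character set. The sketch below uses [A-Za-z] rather than [A-z] and leaves / unescaped, since Python's re module does not need it escaped:

```python
import re

lst = [
    "NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW",
    "NN.WFXL.Harlan KY.Harlan.KY",
    "NN.WRGB/WCWN.Los Angeles CA.Burbank.CA",
    "NN.KVII/KVIH.Denver.Denver.CO",
    "NN.KEYE.Denver.Denver.CO",
]

# "NN.", then one or more letters or slashes, then the closing dot.
pattern = re.compile(r"NN\.[A-Za-z/]+\.")

result = [pattern.sub("", s) for s in lst]
print(result)
```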

CodePudding user response：

>>> x = ['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY', 'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO', 'NN.KEYE.Denver.Denver.CO']
>>> y = ['.'.join(val.split('.')[2:]) for val in x]
>>> y
['FOO BAR.STACK.OVERFLOW', 'Harlan KY.Harlan.KY', 'Los Angeles CA.Burbank.CA', 'Denver.Denver.CO', 'Denver.Denver.CO']
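
A small variation on the same idea, as a sketch: str.split takes a maxsplit argument, so splitting at most twice leaves the rest of the string in one piece and there is nothing to join back together. This assumes every string contains at least two dots.

```python
x = [
    "NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW",
    "NN.WFXL.Harlan KY.Harlan.KY",
    "NN.WRGB/WCWN.Los Angeles CA.Burbank.CA",
    "NN.KVII/KVIH.Denver.Denver.CO",
    "NN.KEYE.Denver.Denver.CO",
]

# Split on the first two dots only; the last piece is everything after them.
y = [val.split(".", 2)[-1] for val in x]
print(y)
```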

CodePudding user response：

NN\.([\w\/]+)\.

Adjust it to your needs. The first and only group then captures the text between NN. and the next dot.
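
For illustration, here is a sketch of what that group holds, assuming the pattern is meant to read NN\.([\w/]+)\. with a + quantifier after the character set. The group captures the code between the first two dots, so removing the prefix still takes a re.sub over the whole match:

```python
import re

pattern = re.compile(r"NN\.([\w/]+)\.")
s = "NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW"

# group(1) is the captured code between "NN." and the next dot.
m = pattern.match(s)
code = m.group(1)

# Substituting the whole match with "" leaves the part the asker wants.
rest = pattern.sub("", s, count=1)

print(code)
print(rest)
```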

CodePudding user response：

All of these other answers seem too complicated for me so I would do something like this:

# Named lst rather than list so the built-in list type is not shadowed.
lst = ['NN.KTXS/KTXE.FOO BAR.STACK.OVERFLOW', 'NN.WFXL.Harlan KY.Harlan.KY', 'NN.WRGB/WCWN.Los Angeles CA.Burbank.CA', 'NN.KVII/KVIH.Denver.Denver.CO', 'NN.KEYE.Denver.Denver.CO']

replacement = []
for e in lst:
    elist = e.split(".")
    # Drop the first two dot-separated fields ("NN" and the code after it),
    # then rejoin the remaining fields with dots.
    newvalue = ".".join(elist[2:])
    replacement.append(newvalue)
lst = replacement

I hope this works for you, but I'm sure you've already used one of the other answers.