Remove Characters From A String Until A Specific Format is Reached

Time:10-16

So I have the following strings and I have been trying to figure out how to manipulate them in such a way that I get a specific format.

string1-itd_jan2021-internal
string2itd_mar2021-space
string3itd_feb2021-internal
string4-itd_mar2021-moon
string5itd_jun2021-internal
string6-itd_feb2021-apollo

I want to remove the trailing part of each string so that it ends with the month and year, like below:

string1-itd_jan2021
string2itd_mar2021
string3itd_feb2021
string4-itd_mar2021
string5itd_jun2021
string6-itd_feb2021

I thought about using str.split on the "-", but then realized that this wouldn't work for the strings that contain more than one dash. I also thought about slicing off a fixed number of characters, but the trailing part varies in length.

Is there a way to do this with regex or any other Python module?

CodePudding user response:

Use str.rsplit with the appropriate maxsplit parameter:

s = s.rsplit("-", 1)[0]

You could also use str.split (even though this is clearly the worse choice):

s = "-".join(s.split("-")[:-1])

Or using regular expressions:

import re
s = re.sub(r'-[^-]*$', '', s)
# "-[^-]*$" matches the last "-" and every non-"-" character after it, up to the end of the string
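As a quick check, the three variants above can be run side by side on the sample strings from the question. This is only a sketch; the function names (strip_rsplit, strip_split_join, strip_regex) are made up for illustration.

```python
import re

samples = [
    "string1-itd_jan2021-internal",
    "string2itd_mar2021-space",
    "string3itd_feb2021-internal",
    "string4-itd_mar2021-moon",
    "string5itd_jun2021-internal",
    "string6-itd_feb2021-apollo",
]

def strip_rsplit(s):
    # Split once from the right and keep the left part.
    return s.rsplit("-", 1)[0]

def strip_split_join(s):
    # Split on every dash, drop the last piece, and glue the rest back together.
    return "-".join(s.split("-")[:-1])

def strip_regex(s):
    # Remove the last dash and everything after it.
    return re.sub(r"-[^-]*$", "", s)

for s in samples:
    results = {strip_rsplit(s), strip_split_join(s), strip_regex(s)}
    assert len(results) == 1  # all three agree on these inputs
    print(results.pop())
```

The variants only differ on input without any dash: strip_rsplit and strip_regex return the string unchanged, while strip_split_join returns an empty string.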

CodePudding user response:

With a regex:

import re
# Keep everything up to and including the first run of four digits (the year)
s = re.sub(r'([0-9]{4}).*$', r'\1', s)
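This pattern keys on the year rather than on the dash, so it assumes the first run of four digits in the string is the year. A minimal sketch of that behaviour follows, together with a stricter variant that looks for three lowercase letters followed by four digits. The function names and the stricter pattern are assumptions for illustration, not part of the answer.

```python
import re

def cut_after_year(s):
    # Keep everything up to and including the first run of four digits.
    return re.sub(r"([0-9]{4}).*$", r"\1", s)

def cut_after_month_year(s):
    # Hypothetical stricter variant: stop at the first occurrence of
    # three lowercase letters immediately followed by four digits.
    m = re.search(r"^.*?[a-z]{3}[0-9]{4}", s)
    # Leave the string untouched if the expected format never appears.
    return m.group(0) if m else s

print(cut_after_year("string1-itd_jan2021-internal"))
print(cut_after_month_year("string2itd_mar2021-space"))
```

Both functions work whether or not the suffix is separated by a dash, which may matter if the delimiter is not guaranteed.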

CodePudding user response:

Another approach is str.rpartition, which splits on the last occurrence of the separator:

>>> content = ['string1-itd_jan2021-internal', 'string2itd_mar2021-space', 'string3itd_feb2021-internal', 'string4-itd_mar2021-moon', 'string5itd_jun2021-internal', 'string6-itd_feb2021-apollo']

>>> [c.rpartition('-')[0] for c in content]
['string1-itd_jan2021',
 'string2itd_mar2021',
 'string3itd_feb2021',
 'string4-itd_mar2021',
 'string5itd_jun2021',
 'string6-itd_feb2021']
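One difference from rsplit is worth keeping in mind: when the string contains no dash, rpartition puts the whole string in the last element of the tuple, so index [0] is empty. A small sketch of a guard follows; strip_suffix is a hypothetical helper name.

```python
with_dash = "string2itd_mar2021-space"
without_dash = "string2itd_mar2021"

print(with_dash.rpartition("-"))     # ('string2itd_mar2021', '-', 'space')
print(without_dash.rpartition("-"))  # ('', '', 'string2itd_mar2021')
print(without_dash.rsplit("-", 1))   # ['string2itd_mar2021']

def strip_suffix(s):
    # rpartition returns (head, separator, tail); the separator is empty
    # when no dash was found, in which case the input is returned unchanged.
    head, sep, _ = s.rpartition("-")
    return head if sep else s
```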

CodePudding user response:

Use re.sub like so:

import re
lines = '''string1-itd_jan2021-internal
string2itd_mar2021-space
string3itd_feb2021-internal
string4-itd_mar2021-moon
string5itd_jun2021-internal
string6-itd_feb2021-apollo'''

for old in lines.split('\n'):
    new = re.sub(r'[-][^-]+$', '', old)
    print('\t'.join([old, new]))

Prints:

string1-itd_jan2021-internal    string1-itd_jan2021
string2itd_mar2021-space        string2itd_mar2021
string3itd_feb2021-internal     string3itd_feb2021
string4-itd_mar2021-moon        string4-itd_mar2021
string5itd_jun2021-internal     string5itd_jun2021
string6-itd_feb2021-apollo      string6-itd_feb2021

Explanation:
r'[-][^-]+$' : Literal dash (-), followed by any character other than a dash ([^-]) repeated 1 or more times (+), followed by the end of the string ($).
