I have the following strings and have been trying to work out how to get them into a specific format:
string1-itd_jan2021-internal
string2itd_mar2021-space
string3itd_feb2021-internal
string4-itd_mar2021-moon
string5itd_jun2021-internal
string6-itd_feb2021-apollo
I want to strip the last segment of each string so that it ends with the month and year, like below:
string1-itd_jan2021
string2itd_mar2021
string3itd_feb2021
string4-itd_mar2021
string5itd_jun2021
string6-itd_feb2021
I thought about using str.split on the - but then realized that this wouldn't work for some strings, because some of them contain more than one dash. I also thought about slicing off a fixed number of characters, but the trailing part varies in length.
Is there a way to do this with regex or another Python module?
CodePudding user response:
Use str.rsplit with the appropriate maxsplit parameter:
s = s.rsplit("-", 1)[0]
You could also use str.split (even though this is clearly the worse choice):
s = "-".join(s.split("-")[:-1])
Or using regular expressions:
import re
s = re.sub(r'-[^-]*$', '', s)
# "-[^-]*$": a "-" followed by any number of non-"-" characters, then the end of the string
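A minimal sketch comparing the three approaches on a few of the strings from the question. The function names are hypothetical and only there for illustration.

```python
import re

samples = [
    "string1-itd_jan2021-internal",
    "string2itd_mar2021-space",
    "string4-itd_mar2021-moon",
]

def strip_rsplit(s):
    # Split once from the right and keep everything before the last "-".
    return s.rsplit("-", 1)[0]

def strip_split_join(s):
    # Split on every "-", drop the last piece, and join the rest back together.
    return "-".join(s.split("-")[:-1])

def strip_regex(s):
    # Remove the final "-" and everything after it.
    return re.sub(r"-[^-]*$", "", s)

for s in samples:
    print(strip_rsplit(s), strip_split_join(s), strip_regex(s))
```

The approaches differ when the input contains no dash at all: rsplit and the regex return the string unchanged, while the split/join version returns an empty string.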
CodePudding user response:
With a regex:
import re
re.sub(r'([0-9]{4}).*$', r'\1', s)
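A short sketch of how this pattern behaves, assuming the year is the first run of four digits in the string. It does not depend on the dash, so it also works when the suffix is attached directly, but it cuts too early if four digits appear before the year. The extra inputs below are made up for illustration.

```python
import re

def strip_after_year(s):
    # Keep everything up to and including the first four consecutive digits,
    # and drop whatever follows them.
    return re.sub(r'([0-9]{4}).*$', r'\1', s)

print(strip_after_year("string1-itd_jan2021-internal"))  # sample from the question
print(strip_after_year("string2itd_mar2021space"))       # hypothetical: suffix without a dash
print(strip_after_year("string1000-itd_jan2021-x"))      # hypothetical: earlier four-digit run
```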
CodePudding user response:
You can use str.rpartition as another approach, like below:
>>> content = ['string1-itd_jan2021-internal', 'string2itd_mar2021-space', 'string3itd_feb2021-internal', 'string4-itd_mar2021-moon', 'string5itd_jun2021-internal', 'string6-itd_feb2021-apollo']
>>> [c.rpartition('-')[0] for c in content]
['string1-itd_jan2021',
'string2itd_mar2021',
'string3itd_feb2021',
'string4-itd_mar2021',
'string5itd_jun2021',
'string6-itd_feb2021']
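One thing to watch for: if a string contains no dash, rpartition returns two empty strings followed by the original string, so taking index 0 gives an empty result. A minimal sketch that falls back to the original string in that case (strip_last_segment is a hypothetical helper name):

```python
def strip_last_segment(s):
    # rpartition returns (head, separator, tail); the separator is empty
    # when "-" does not occur in the string at all.
    head, sep, _tail = s.rpartition("-")
    return head if sep else s

print(strip_last_segment("string4-itd_mar2021-moon"))
print(strip_last_segment("nodash"))  # hypothetical input without a dash
```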
CodePudding user response:
Use re.sub like so:
import re
lines = '''string1-itd_jan2021-internal
string2itd_mar2021-space
string3itd_feb2021-internal
string4-itd_mar2021-moon
string5itd_jun2021-internal
string6-itd_feb2021-apollo'''
for old in lines.split('\n'):
    new = re.sub(r'[-][^-]+$', '', old)
    print('\t'.join([old, new]))
Prints:
string1-itd_jan2021-internal string1-itd_jan2021
string2itd_mar2021-space string2itd_mar2021
string3itd_feb2021-internal string3itd_feb2021
string4-itd_mar2021-moon string4-itd_mar2021
string5itd_jun2021-internal string5itd_jun2021
string6-itd_feb2021-apollo string6-itd_feb2021
Explanation:
r'[-][^-]+$': a literal dash (-), followed by any character other than a dash ([^-]) repeated 1 or more times, followed by the end of the string ($).
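As a small illustration of what "1 or more times" means in practice, here is a sketch contrasting a + quantifier with a * quantifier on a string that ends in a bare dash. The input "abc-" is made up for the example.

```python
import re

one_or_more = re.compile(r'[-][^-]+$')   # needs at least one character after the dash
zero_or_more = re.compile(r'[-][^-]*$')  # also matches a dash at the very end

trailing_dash = "abc-"  # hypothetical input with nothing after the last dash

print(repr(one_or_more.sub('', trailing_dash)))   # the string is left unchanged
print(repr(zero_or_more.sub('', trailing_dash)))  # the trailing dash is removed
```

For the strings in the question both quantifiers give the same result, since every string has at least one character after its last dash.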