I need help with a regex inside re.sub. In this case I am replacing the match with nothing ("").
My Current Code:
file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']
foldernames = [re.sub('(\d{4})_All.zip', '', i) for i in file_list]
The Result I am trying to achieve is:
foldernames = ['F_5500_SF_PART7','F_5500_SF','F_5500','F_SCH_A_PART1']
I think part of the complexity is the fact that there is already regex in my file_list. Hoping someone smarter could help.
CodePudding user response:
You don't need a regular expression, you're removing fixed strings. So you can just use the str.replace() method.
foldernames = [i.replace('_[0-9][0-9][0-9][0-9]_All.zip', '').replace('_[0-9][0-9][0-9][0-9]_all.zip', '') for i in file_list]
The two calls to replace() are needed to handle both All and all. Or if the rest of the filename is always uppercase, you could use:
foldernames = [i.upper().replace('_[0-9][0-9][0-9][0-9]_ALL.ZIP', '') for i in file_list]
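If you would rather not uppercase the whole name or chain two replace() calls, a rough alternative is to compare the ending case-insensitively and slice it off. This is only a sketch: strip_suffix is a made-up helper name, and it assumes the fixed text always sits at the very end of each entry.

```python
file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
             'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
             'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
             'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']

# The literal text to remove (these are plain characters here, not a regex).
SUFFIX = '_[0-9][0-9][0-9][0-9]_All.zip'

def strip_suffix(name, suffix=SUFFIX):
    """Hypothetical helper: drop `suffix` from the end of `name`, ignoring case."""
    if name.lower().endswith(suffix.lower()):
        # Slice rather than replace, so the case of the rest is left untouched.
        return name[:-len(suffix)]
    return name  # leave names without the suffix unchanged

foldernames = [strip_suffix(i) for i in file_list]
print(foldernames)
```

Unlike the upper() version, this keeps whatever case the remaining part of the name has.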
CodePudding user response:
Barmar's answer is the most appropriate for your problem. But if you actually need to use regex (let's say not all the files have the same fixed "[0-9][0-9][0-9][0-9]" string), then you can use:
r'_(\[[-\d]*\]){4}_[aA]ll\.zip'
(the [aA]ll at the end is for matching the lower-case "all" in your first case)
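For completeness, here is how that pattern could be plugged into re.sub. The original '(\d{4})_All.zip' finds nothing because the entries in file_list contain the literal characters "[0-9]", not actual digits, so \d{4} never matches. In this sketch the dot is escaped and a trailing $ is added so the match is anchored to the end of the name; both are small assumptions on my part, not something the question requires.

```python
import re

file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
             'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
             'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
             'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']

# \[ and \] match literal brackets; [-\d]* matches the "0-9" inside them.
# {4} repeats the bracketed group four times; [aA]ll accepts "All" or "all".
pattern = re.compile(r'_(\[[-\d]*\]){4}_[aA]ll\.zip$')

foldernames = [pattern.sub('', i) for i in file_list]
print(foldernames)
```

Compiling the pattern once is optional; calling re.sub with the same raw string inside the list comprehension should behave the same way.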