Python re.sub with regex

Time:07-13

I need help with a regex inside re.sub. In this case I am replacing the match with nothing (the empty string "").

My Current Code:

import re

file_list = ['F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
 'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
 'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
 'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip']

foldernames = [re.sub('(\d{4})_All.zip', '', i) for i in file_list]

The Result I am trying to achieve is:

foldernames = ['F_5500_SF_PART7','F_5500_SF','F_5500','F_SCH_A_PART1']

I think part of the complexity is that the names in my file_list already contain regex syntax (the literal text "[0-9]"). Hoping someone smarter could help.

CodePudding user response:

You don't need a regular expression, you're removing fixed strings. So you can just use the str.replace() method.

foldernames = [i.replace('_[0-9][0-9][0-9][0-9]_All.zip', '').replace('_[0-9][0-9][0-9][0-9]_all.zip', '') for i in file_list]

The two calls to replace() are needed to handle both All and all. Or if the rest of the filename is always uppercase, you could use:

foldernames = [i.upper().replace('_[0-9][0-9][0-9][0-9]_ALL.ZIP', '') for i in file_list]
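If you want to ignore case but keep the original casing of the part you keep, one option is to compare a lower-cased copy and slice the suffix off. This is only a sketch: the helper name strip_suffix_ci is hypothetical (not part of the answer above), and it assumes the suffix always sits at the very end of the name.

```python
file_list = [
    'F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
    'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
    'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
    'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip',
]

# Literal text to remove; the brackets here are plain characters, not a pattern.
SUFFIX = '_[0-9][0-9][0-9][0-9]_all.zip'


def strip_suffix_ci(name, suffix=SUFFIX):
    """Drop `suffix` from the end of `name`, ignoring case (hypothetical helper)."""
    if name.lower().endswith(suffix.lower()):
        return name[:-len(suffix)]
    return name  # leave names without the suffix untouched


foldernames = [strip_suffix_ci(i) for i in file_list]
print(foldernames)
```

Unlike the upper() variant, this does not change the case of the remaining folder name.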

CodePudding user response:

Barmar's answer is the most appropriate for your problem. But if you actually need to use regex (let's say not all the files have the same fixed "[0-9][0-9][0-9][0-9]" string), then you can use:

r'_(\[[-\d]*\]){4}_[aA]ll\.zip'

(the [aA]ll at the end is for matching the lower-case "all" in your first file name)
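Plugged into re.sub, that could look roughly like the sketch below. Two details are my own assumptions rather than requirements: the dot before "zip" is escaped so it only matches a literal ".", and the trailing $ anchors the match to the end of the name. It also shows why the pattern from the question changes nothing: \d{4} needs four actual digit characters, while the names contain the literal text "[0-9]".

```python
import re

file_list = [
    'F_5500_SF_PART7_[0-9][0-9][0-9][0-9]_all.zip',
    'F_5500_SF_[0-9][0-9][0-9][0-9]_All.zip',
    'F_5500_[0-9][0-9][0-9][0-9]_All.zip',
    'F_SCH_A_PART1_[0-9][0-9][0-9][0-9]_All.zip',
]

# \[ and \] match literal brackets; [-\d]* matches the "0-9" between them.
# {4} repeats the bracketed group four times; [aA]ll accepts "All" or "all".
pattern = re.compile(r'_(\[[-\d]*\]){4}_[aA]ll\.zip$')

foldernames = [pattern.sub('', i) for i in file_list]
print(foldernames)

# The question's pattern finds no match, so every name comes back unchanged.
unchanged = [re.sub(r'(\d{4})_All.zip', '', i) for i in file_list]
print(unchanged == file_list)
```

If more case variants are possible (for example "ALL.ZIP"), passing flags=re.IGNORECASE to re.compile and writing plain "all" is an alternative to the [aA] character class.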
