Split string based on pattern in Python

I am trying to remove a pattern from my string and keep only the word I want to store.

example                                return

2022_09_21_PTE_Vendor                  PTE
2022_09_21_SSS_01_Vendor               SSS_01
2022_09_21_OOS_market                  OOS

what I tried

fileName = "2022_09_21_PTE_Vendor"
newFileName = fileName.strip(re.split('[0-9]','_Vendor.xlsx'))

CodePudding user response：

Use a regular expression replacement, not split.

newFileName = re.sub(r'^\d{4}_\d{2}_\d{2}_(.+)_[^_]+$', r'\1', fileName)

^\d{4}_\d{2}_\d{2}_ matches the date at the beginning. [^_]+$ matches the part after the last _. And (.+) captures everything between them, which is copied to the replacement with \1.
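
As a quick check, the pattern can be run against the three sample names from the question. This is only a sketch: the helper name extract_label is made up for illustration, and it assumes every name starts with a YYYY_MM_DD_ date and ends with a suffix that contains no underscore.

```python
import re

# Date prefix, captured middle part, then the final underscore-free suffix.
PATTERN = re.compile(r'^\d{4}_\d{2}_\d{2}_(.+)_[^_]+$')

def extract_label(file_name):
    # Hypothetical helper: replace the whole match with capture group 1.
    return PATTERN.sub(r'\1', file_name)

samples = [
    "2022_09_21_PTE_Vendor",
    "2022_09_21_SSS_01_Vendor",
    "2022_09_21_OOS_market",
]
for name in samples:
    print(name, "->", extract_label(name))
```

Note that re.sub returns the input unchanged when the pattern does not match, so a name without the date prefix comes back as-is rather than raising an error.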

CodePudding user response：

Using Python's re module, try the following code with its sub function; it was written and tested in Python 3 with the samples shown.

import re
fileName = "2022_09_21_PTE_Vendor"

re.sub(r'^\d{4}(?:_\d{2}){2}_(.*?)_[^_]+$', r'\1', fileName)
'PTE'

Explanation: A detailed explanation of the regex used.

^\d{4}   ##From the start of the value, match 4 digits.
(?:      ##Open a non-capturing group.
_\d{2}   ##Match an underscore followed by 2 digits.
){2}     ##Close the non-capturing group and match 2 occurrences of it.
_        ##Match a single underscore.
(.*?)    ##Capturing group with a lazy match, taking as little as possible before the part that follows.
_[^_]+$  ##Match the last underscore and the underscore-free text up to the end of the value, so SSS_01 is kept whole.
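
The annotated breakdown above can also live inside the pattern itself by compiling it with re.VERBOSE. The sketch below is one way to do that, not the answerer's own code: the function name label_from is hypothetical, and the pattern ends with _[^_]+$ on the assumption that the trailing suffix never contains an underscore, which makes the lazy group stretch up to the last underscore.

```python
import re

# In VERBOSE mode, whitespace in the pattern is ignored and # starts a comment.
PATTERN = re.compile(r"""
    ^\d{4}          # four digits at the start of the value
    (?:_\d{2}){2}   # two groups of underscore plus two digits
    _               # the underscore after the date
    (.*?)           # lazily captured label
    _[^_]+$         # last underscore and a suffix assumed to hold no underscore
""", re.VERBOSE)

def label_from(file_name):
    # Hypothetical helper: return the captured label, or None if nothing matches.
    match = PATTERN.match(file_name)
    return match.group(1) if match else None

print(label_from("2022_09_21_SSS_01_Vendor"))
```

Returning None for a non-matching name makes unexpected file names easy to spot, whereas re.sub would hand the original string back silently.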

CodePudding user response：

Assuming that the date characters at the beginning are always "YYYY_MM_DD" you could do something like this:

fileName = "2022_09_21_SSS_01_Vendor"
fileName = fileName.lstrip()[11:]  # Removes the date portion ("YYYY_MM_DD_" is 11 characters)
fileName = fileName.rstrip()[:fileName.rfind('_')]  # Finds the last underscore and removes everything from it to the end
print(fileName)
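
The same slicing idea can be wrapped in a small function and checked against all three samples. This sketch swaps rfind for str.rpartition, which splits on the last underscore in one step; the function name strip_date_and_suffix is made up for illustration, and the fixed 11-character date prefix is the same assumption as above.

```python
DATE_PREFIX_LENGTH = len("YYYY_MM_DD_")  # 11 characters, assumed fixed

def strip_date_and_suffix(file_name):
    # Hypothetical helper: drop the date prefix, then cut at the last underscore.
    without_date = file_name.strip()[DATE_PREFIX_LENGTH:]
    label, _sep, _suffix = without_date.rpartition("_")
    return label

for name in ("2022_09_21_PTE_Vendor",
             "2022_09_21_SSS_01_Vendor",
             "2022_09_21_OOS_market"):
    print(name, "->", strip_date_and_suffix(name))
```

Unlike the regex versions, this does not verify that the first 11 characters really are a date, so it only suits input that is already known to follow the naming scheme.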

CodePudding user response：

This should work:

newFileName = fileName[11:].rsplit("_", 1)[0]