I am trying to apply a regex in Python with the following code:
import re

Country_name = "usa_t1_usq_t1_[0-9]*.csv"
new_result = re.sub(r'(?:_[[0-9-]+].*[a-zA-Z])+', '', Country_name)
# Display the content
print(new_result)
The problem is that it works for the input above, but not for input without the [0-9] pattern (the third input in the examples below). For example:
input - usa_t1_usq_t1_[0-9]*.csv Expected output - usa_t1_usq_t1
input - usa_t1_usq_t1_[0-9]*.gzip.csv Expected output - usa_t1_usq_t1
input - usa_t1_usq_t1.gzip.csv Expected output - usa_t1_usq_t1
Can someone help me write a proper regex for the above scenario? I am new to regex.
CodePudding user response:
IIUC,
import re

inputs = ['usa_t1_usq_t1_[0-9]*.csv', 'usa_t1_usq_t1_[0-9]*.gzip.csv', 'usa_t1_usq_t1.gzip.csv']
for Country_name in inputs:
    # Remove an optional literal "_[0-9]*" followed by one or more ".ext" suffixes
    result = re.sub(r'(_\[0\-9\]\*)?(\.[a-zA-Z]+)+', '', Country_name)
    print(result)
# usa_t1_usq_t1
# usa_t1_usq_t1
# usa_t1_usq_t1
(_\[0\-9\]\*) matches the plain string _[0-9]* in Country_name, and the ? after it means it appears zero or one times.
(\.[a-zA-Z]+) matches a suffix starting with . followed by letters, and the + after it means it may appear more than once.
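If the file names may contain dots elsewhere, it may be safer to anchor the pattern to the end of the string. The following is only a sketch of that variation, not part of the answer above: the helper name strip_suffix and the constant SUFFIX_RE are made up for illustration, and it assumes the extensions consist of letters only.

```python
import re

# Hypothetical names (SUFFIX_RE, strip_suffix); an anchored variant of the pattern above.
# (?:_\[0-9\]\*)?   optional literal "_[0-9]*"
# (?:\.[a-zA-Z]+)+  one or more ".ext" suffixes made of letters
# $                 only match at the end of the string
SUFFIX_RE = re.compile(r'(?:_\[0-9\]\*)?(?:\.[a-zA-Z]+)+$')


def strip_suffix(name):
    """Remove an optional literal '_[0-9]*' and the trailing extensions."""
    return SUFFIX_RE.sub('', name)


inputs = [
    'usa_t1_usq_t1_[0-9]*.csv',
    'usa_t1_usq_t1_[0-9]*.gzip.csv',
    'usa_t1_usq_t1.gzip.csv',
]
for name in inputs:
    print(strip_suffix(name))
```

With the $ anchor, a dot in the middle of the name is left alone, so strip_suffix('a.b_c.csv') gives 'a.b_c', whereas the unanchored pattern would also remove '.b'.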