I have file names like
ios_g1_v1_yyyymmdd
ios_g1_v1_h1_yyyymmddhhmmss
ios_g1_v1_h1_YYYYMMDDHHMMSS
ios_g1_v1_g1_YYYY
ios_g1_v1_j1_YYYYmmdd
ios_g1_v1
ios_g1_v1_t1_h1
ios_g1_v1_ty1_f1
I would like to remove only the suffix when it matches the string YYYYMMDDHHMMSS OR yyyymmdd OR YYYYmmdd OR YYYY
my expected output would be
ios_g1_v1
ios_g1_v1_h1
ios_g1_v1_h1
ios_g1_v1_g1
ios_g1_v1_j1
ios_g1_v1
ios_g1_v1_t1_h1
ios_g1_v1_ty1_f1
How can I achieve this in Python using regex? I tried something like the line below, but it didn't work:
word_trimmed_stage1 = re.sub('.*[^YYYYMMDDHHMMSS]$', '', filename)
CodePudding user response:
You can be explicit and use the exact patterns that you have identified, optionally case-insensitive with re.I:
import re
files = ['ios_g1_v1_yyyymmdd',
'ios_g1_v1_h1_yyyymmddhhmmss',
'ios_g1_v1_h1_YYYYMMDDHHMMSS',
'ios_g1_v1_g1_YYYY',
'ios_g1_v1_j1_YYYYmmdd',
'ios_g1_v1',
'ios_g1_v1_t1_h1',
'ios_g1_v1_ty1_f1']
files2 = [re.sub('_(?:YYYYMMDDHHMMSS|yyyymmdd|YYYYmmdd|YYYY)$', '', x, flags=re.I)
for x in files]
NB. With re.I you only need one of yyyymmdd/YYYYmmdd.
Compressed variant:
files2 = [re.sub('_YYYY(?:MMDD(?:HHMMSS)?)?$', '', x, flags=re.I) for x in files]
Output:
['ios_g1_v1',
'ios_g1_v1_h1',
'ios_g1_v1_h1',
'ios_g1_v1_g1',
'ios_g1_v1_j1',
'ios_g1_v1',
'ios_g1_v1_t1_h1',
'ios_g1_v1_ty1_f1']
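As for why the attempt in the question did not work: square brackets define a character class, so `[^YYYYMMDDHHMMSS]` means "one character that is not Y, M, D, H or S" rather than "not this string". The short sketch below illustrates the effect on two of the sample names (the variable names are only for illustration):

```python
import re

# The pattern from the question: [^YYYYMMDDHHMMSS] is a negated character
# class, i.e. "one character that is not Y, M, D, H or S".
attempt = '.*[^YYYYMMDDHHMMSS]$'

# The last character is 'd' (not in the class), so the whole name matches
# and is replaced by the empty string.
lowercase_result = re.sub(attempt, '', 'ios_g1_v1_yyyymmdd')

# The last character is 'S' (in the class), so nothing matches and the
# name comes back unchanged.
uppercase_result = re.sub(attempt, '', 'ios_g1_v1_h1_YYYYMMDDHHMMSS')

print(repr(lowercase_result))
print(repr(uppercase_result))
```

So the original pattern either deletes the entire name or leaves it untouched, depending only on the final character.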
CodePudding user response:
To remove a suffix such as "YYYYMMDDHHMMSS" or one of the other specified formats, you can use the rstrip method. Note that rstrip treats its argument as a set of characters, not as a literal suffix: it removes every trailing character that appears in that set, so it can also strip matching letters that belong to the name itself, and it leaves the preceding underscore in place.
Here's an example of how you can use it: with s = "abcdefgYYYYMMDDHHMMSS" and suffix = "YYYYMMDDHHMMSS", s.rstrip(suffix) returns "abcdefg".
You can also use it to remove the other specified formats by replacing "YYYYMMDDHHMMSS" with the appropriate format string.
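An explicit suffix check is a possible alternative that does not rely on character stripping. The following is a minimal sketch, assuming the suffix always follows an underscore and should be matched case-insensitively; `SUFFIXES` and `strip_date_suffix` are hypothetical names introduced here, not part of the original answer:

```python
# Hypothetical helper: the suffixes are listed in lowercase and compared
# against a lowercased copy of the name.
SUFFIXES = ("_yyyymmddhhmmss", "_yyyymmdd", "_yyyy")

def strip_date_suffix(name):
    """Return name without a trailing date placeholder, if one is present."""
    lowered = name.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix):
            # Slice the original string so the case of the rest is preserved.
            return name[:-len(suffix)]
    return name

files = ['ios_g1_v1_yyyymmdd',
         'ios_g1_v1_h1_yyyymmddhhmmss',
         'ios_g1_v1_h1_YYYYMMDDHHMMSS',
         'ios_g1_v1_g1_YYYY',
         'ios_g1_v1_j1_YYYYmmdd',
         'ios_g1_v1',
         'ios_g1_v1_t1_h1',
         'ios_g1_v1_ty1_f1']
trimmed = [strip_date_suffix(f) for f in files]
print(trimmed)
```

On Python 3.9 and later, str.removesuffix does the same job for a single, case-sensitive suffix.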
CodePudding user response:
Disclaimer: this is a non-regex approach; @mozway posted a good regex approach.
files = ['ios_g1_v1_yyyymmdd',
'ios_g1_v1_h1_yyyymmddhhmmss',
'ios_g1_v1_h1_YYYYMMDDHHMMSS',
'ios_g1_v1_g1_YYYY',
'ios_g1_v1_j1_YYYYmmdd',
'ios_g1_v1',
'ios_g1_v1_t1_h1',
'ios_g1_v1_ty1_f1']
lst = []
for filename in files:
    k = []
    for x in range(len(filename)):
        # Stop at the first pair of consecutive 'y'/'Y' characters (start of the date suffix)
        if filename[x] in 'yY' and x + 1 < len(filename) and filename[x + 1] in 'yY':
            break
        k.append(filename[x])
    # Drop the underscore left in front of the removed suffix
    if k and k[-1] == '_':
        lst.append(''.join(k)[:-1])
    else:
        lst.append(''.join(k))
print(lst)
#['ios_g1_v1', 'ios_g1_v1_h1', 'ios_g1_v1_h1', 'ios_g1_v1_g1', 'ios_g1_v1_j1', 'ios_g1_v1', 'ios_g1_v1_t1_h1', 'ios_g1_v1_ty1_f1']
CodePudding user response:
IIUC, your pattern involves Year, Month, Day, Hour, Minute, Second characters, each repeated any number of times in that order, starting with an underscore and case-insensitive.
Try this pattern: r"_Y+M*D*H*M*S*"
import re
regex_pattern = r"_Y+M*D*H*M*S*"
# files is the list of file names from the question
result = [re.sub(regex_pattern, '', i, flags=re.IGNORECASE) for i in files]
result
['ios_g1_v1',
'ios_g1_v1_h1',
'ios_g1_v1_h1',
'ios_g1_v1_g1',
'ios_g1_v1_j1',
'ios_g1_v1',
'ios_g1_v1_t1_h1',
'ios_g1_v1_ty1_f1']
EXPLANATION
- The _ matches the underscore at the start of the pattern
- The flags=re.IGNORECASE makes this pattern search case-insensitive
- The Y+ matches at least 1 instance of Y
- Then the M*D*H*M*S* matches any instances of these specific characters after the initial Y, in that order (starting from 0 instances)
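One point worth checking with this pattern: without a trailing `$` it can match anywhere in the name, not only at the end. The sketch below compares the unanchored form with an anchored variant; the name `ios_y1_v1` is a made-up example that is not in the question's data, and the anchored pattern is a suggested adjustment rather than part of the original answer:

```python
import re

unanchored = r"_Y+M*D*H*M*S*"
anchored = r"_Y+M*D*H*M*S*$"   # same pattern, but only allowed at the end

# Hypothetical name with a "_y" segment in the middle.
name = "ios_y1_v1"

# The unanchored pattern removes the "_y" in the middle of the name.
unanchored_result = re.sub(unanchored, "", name, flags=re.IGNORECASE)

# The anchored pattern leaves this name alone.
anchored_result = re.sub(anchored, "", name, flags=re.IGNORECASE)

# A genuine date suffix is still removed by the anchored pattern.
suffix_result = re.sub(anchored, "", "ios_g1_v1_h1_yyyymmddhhmmss",
                       flags=re.IGNORECASE)

print(unanchored_result, anchored_result, suffix_result)
```

For the sample names in the question both forms give the same result, so the anchor only matters if such mid-name segments can occur.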
CodePudding user response:
Here is another approach:
out = []
# filenames is the list of file names from the question
for filename in filenames:
    if filename.split("_")[-1].lower().startswith("y"):
        out.append("_".join(filename.split("_")[:-1]))
    else:
        out.append(filename)
print(out)
You can also pass a generator expression to list() instead of appending one element at a time:
out = list(
    "_".join(filename.split("_")[:-1])
    if filename.split("_")[-1].lower().startswith("y")
    else filename
    for filename in filenames
)
Both approaches should produce the same output:
['ios_g1_v1',
'ios_g1_v1_h1',
'ios_g1_v1_h1',
'ios_g1_v1_g1',
'ios_g1_v1_j1',
'ios_g1_v1',
'ios_g1_v1_t1_h1',
'ios_g1_v1_ty1_f1']
CodePudding user response:
Try removing everything after the last _ detected.
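Taken literally, this would also shorten names that have no date suffix (for example, ios_g1_v1 would become ios_g1). A minimal sketch of a guarded version, assuming the last segment should be dropped only when it is one of the placeholders listed in the question; `DATE_PLACEHOLDERS` and `drop_last_segment_if_date` are hypothetical names:

```python
# Hypothetical set of placeholders, stored in lowercase for a
# case-insensitive comparison.
DATE_PLACEHOLDERS = {"yyyymmddhhmmss", "yyyymmdd", "yyyy"}

def drop_last_segment_if_date(name):
    """Drop the text after the last underscore only if it is a date placeholder."""
    # rpartition splits on the LAST underscore: (before, "_", after).
    head, sep, tail = name.rpartition("_")
    if sep and tail.lower() in DATE_PLACEHOLDERS:
        return head
    return name

with_suffix = drop_last_segment_if_date("ios_g1_v1_j1_YYYYmmdd")
without_suffix = drop_last_segment_if_date("ios_g1_v1")
print(with_suffix, without_suffix)
```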