I would like to extract only certain files from a list, applying the following rules.
A file should be selected if its name contains a pattern like f[1-99], t[1-99], or v[1-99], or a combination such as f[1-9]_v[1-9]_t[1-9]. Below are some samples.
phone_football_androind_1_v1_te_t1_fe
phone_football_ios_v1_t1
foot_cricket2345678_f12_t4
tfd_fr_ve_t1_v1_f3_201234_yyymmmdd
def_000_t4_f1
file_job_1234567_f1_t55
ROKLOP_f33_t44
agdcv_t45
gop_gop_f1_t14_v14
file_op_v1_t1
fop_f1_v1_1223
Could you please help me check whether the file names contain the above patterns, and keep only the files that do? I have tried the following but am stuck on the regex in Python. I am not sure how to add an OR condition in a regex.
import re

# Sample file name to test
MyString1 = "tfd_fr_ve_t1_v1_f3_201234_yyymmmdd"

# re.search() returns a Match object
# if there is a match anywhere in the string
if re.search(r'(_v(\d+)).*', MyString1):
    print("YES,it is present in string ")
else:
    print("NO,string is not present")
CodePudding user response:
To check if a match is present:
_[ftv][1-9]\d?(?!\d)
Explanation
_ Match literally
[ftv] Match one of f, t, or v
[1-9]\d? Match a number 1-99
(?!\d) Assert that there is no digit to the right
Example code
import re

strings = [
    "tfd_fr_ve_t1_v1_f3_201234_yyymmmdd",
    "phone_football_androind_1_v1_te_t1_fe",
    "phone_football_ios_v1_t1",
    "foot_cricket2345678_f12_t4",
    "tfd_fr_ve_t1_v1_f3_201234_yyymmmdd",
    "def_000_t4_f1",
    "file_job_1234567_f1_t55",
    "ROKLOP_f33_t44",
    "agdcv_t45",
    "gop_gop_f1_t14_v14",
    "file_op_v1_t1",
    "fop_f1_v1_1223",
    "test"
]

pattern = r"_[ftv][1-9]\d?(?!\d)"

for s in strings:
    if re.search(pattern, s):
        print(f"YES, present in '{s}' ")
    else:
        print(f"NO, not present in '{s}'")
Output
YES, present in 'tfd_fr_ve_t1_v1_f3_201234_yyymmmdd'
YES, present in 'phone_football_androind_1_v1_te_t1_fe'
YES, present in 'phone_football_ios_v1_t1'
YES, present in 'foot_cricket2345678_f12_t4'
YES, present in 'tfd_fr_ve_t1_v1_f3_201234_yyymmmdd'
YES, present in 'def_000_t4_f1'
YES, present in 'file_job_1234567_f1_t55'
YES, present in 'ROKLOP_f33_t44'
YES, present in 'agdcv_t45'
YES, present in 'gop_gop_f1_t14_v14'
YES, present in 'file_op_v1_t1'
YES, present in 'fop_f1_v1_1223'
NO, not present in 'test'
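Since the goal is to keep only the matching file names, the same pattern can drive a filter instead of a print loop. This is a minimal sketch, assuming the names are already in a Python list. `select_files` and the extra names `job_t100` and `job_v0` are hypothetical, added only to illustrate the 1-99 limit:

```python
import re

# Same pattern as above: underscore, one of f/t/v, a number 1-99,
# and no further digit directly after it.
PATTERN = re.compile(r"_[ftv][1-9]\d?(?!\d)")


def select_files(names):
    """Return only the names that contain at least one _f/_t/_v tag in 1-99."""
    return [name for name in names if PATTERN.search(name)]


# Hypothetical candidates: the last three should be filtered out.
candidates = ["ROKLOP_f33_t44", "agdcv_t45", "test", "job_t100", "job_v0"]
print(select_files(candidates))  # ['ROKLOP_f33_t44', 'agdcv_t45']
```

Compiling the pattern once avoids passing the pattern string to `re.search` on every iteration, which reads more cleanly for long lists.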
CodePudding user response:
I think this little regex can match all your results:
'(f|t|v)[1-9]{1,2}'
Here's a little code snippet showing the matching results:
>>> import re
>>> files = ['phone_football_androind_1_v1_te_t1_fe', 'phone_football_ios_v1_t1', 'foot_cricket2345678_f12_t4', 'tfd_fr_ve_t1_v1_f3_201234_yyymmmdd', 'def_000_t4_f1', 'file_job_1234567_f1_t55', 'ROKLOP_f33_t44', 'agdcv_t45', 'gop_gop_f1_t14_v14', 'file_op_v1_t1', 'fop_f1_v1_1223']
>>> regex = '(f|t|v)[1-9]{1,2}'
>>> for file in files:
...     if re.search(regex, file):
...         print(f"match: {file}")
...
match: phone_football_androind_1_v1_te_t1_fe
match: phone_football_ios_v1_t1
match: foot_cricket2345678_f12_t4
match: tfd_fr_ve_t1_v1_f3_201234_yyymmmdd
match: def_000_t4_f1
match: file_job_1234567_f1_t55
match: ROKLOP_f33_t44
match: agdcv_t45
match: gop_gop_f1_t14_v14
match: file_op_v1_t1
match: fop_f1_v1_1223
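Both patterns accept every sample in the question, but they are not equivalent. `(f|t|v)` is the alternation (OR) form of the character class `[ftv]`, and the pattern here has no leading underscore and no check for a trailing digit, so it is more permissive. The sketch below compares the two on a few hypothetical names (`report12`, `job_t100`, `file_v10`) that are not from the question:

```python
import re

strict = re.compile(r"_[ftv][1-9]\d?(?!\d)")  # pattern from the first answer
loose = re.compile(r"(f|t|v)[1-9]{1,2}")      # pattern from this answer

# Hypothetical names chosen to expose the differences.
samples = ["agdcv_t45", "report12", "job_t100", "file_v10"]

results = {
    name: (bool(strict.search(name)), bool(loose.search(name)))
    for name in samples
}

for name, (strict_hit, loose_hit) in results.items():
    print(f"{name}: strict={strict_hit} loose={loose_hit}")
# agdcv_t45: strict=True loose=True
# report12: strict=False loose=True
# job_t100: strict=False loose=True
# file_v10: strict=True loose=True
```

The looser pattern accepts `report12` because the `t` at the end of `report` is followed by digits, and it accepts `job_t100` because it stops after matching `t1`. Which behavior is wanted depends on whether tags must be preceded by an underscore and limited to 1-99.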