python regex to extract only specific pattern file names from a list

Time:12-22

I would like to extract only certain files from a list, applying the following rules to decide which ones to keep.

A file should be kept if its name contains a pattern like f[1-99], t[1-99] or v[1-99] (the letter f, t or v followed by a number from 1 to 99), or a combination such as f[1-9]_v[1-9]_t[1-9]. Below are some samples:

phone_football_androind_1_v1_te_t1_fe
phone_football_ios_v1_t1
foot_cricket2345678_f12_t4
tfd_fr_ve_t1_v1_f3_201234_yyymmmdd
def_000_t4_f1
file_job_1234567_f1_t55
ROKLOP_f33_t44
agdcv_t45
gop_gop_f1_t14_v14
file_op_v1_t1
fop_f1_v1_1223

Could you please help me check whether each file name contains any of the patterns above, and keep only the files that do? I have tried the following, but I am stuck with the regex in Python and not sure how to add an OR condition.

import re

# Sample file name to test
MyString1 = "tfd_fr_ve_t1_v1_f3_201234_yyymmmdd"

# re.search() returns a Match object
# if there is a match anywhere in the string
if re.search(r'(_v(\d+)).*', MyString1):
    print("YES, it is present in the string")
else:
    print("NO, it is not present in the string")
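On the OR question: in Python's `re` module, `|` separates alternatives, and wrapping them in a group limits the alternation to just the letter. A minimal sketch, assuming (as the answers do) that each tag is preceded by `_` and numbered 1 to 99; the three patterns below are equivalent ways of writing the same condition.

```python
import re

# Three equivalent ways to say "_ then f OR t OR v, then a number 1-99":
#   1. alternation inside a non-capturing group
#   2. alternation inside a capturing group (also records which letter matched)
#   3. a character class, the usual shorthand for single-character alternatives
patterns = [
    r"_(?:f|t|v)[1-9]\d?(?!\d)",
    r"_(f|t|v)[1-9]\d?(?!\d)",
    r"_[ftv][1-9]\d?(?!\d)",
]

name = "tfd_fr_ve_t1_v1_f3_201234_yyymmmdd"
for p in patterns:
    m = re.search(p, name)
    # group(0) is the whole matched text, regardless of any capturing groups
    print(p, "->", m.group(0) if m else None)
```

Without the group, `_f|t|v[1-9]` would mean `_f` OR `t` OR `v[1-9]`, because `|` has the lowest precedence of all regex operators.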

Answer:

To check if a match is present:

_[ftv][1-9]\d?(?!\d)

Explanation

  • _ Match an underscore literally
  • [ftv] Match one of f, t or v
  • [1-9]\d? Match a number from 1 to 99 (a digit 1-9, optionally followed by any digit)
  • (?!\d) Assert that the next character is not a digit
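Each part of the explanation can be checked in isolation. A short sketch; the names in `cases` are made up to exercise one rule each and are not taken from the question.

```python
import re

pattern = re.compile(r"_[ftv][1-9]\d?(?!\d)")

# Hypothetical names, each chosen to exercise one part of the pattern
cases = {
    "x_f1": True,      # single digit after the letter
    "x_t99": True,     # two digits, upper end of 1-99
    "x_v100": False,   # three digits: (?!\d) rejects the extra digit
    "x_f0": False,     # [1-9] rejects a zero
    "x_f05": False,    # [1-9] also rejects a leading zero
    "xf1": False,      # no underscore before the letter
    "x_F1": False,     # matching is case-sensitive by default
    "x_f1abc": True,   # only digits are forbidden after the number
}

for name, expected in cases.items():
    found = pattern.search(name) is not None
    print(f"{name!r}: {found} (expected {expected})")
```

If letters directly after the number should also be rejected, the lookahead would need to be widened, for example to `(?![\dA-Za-z])`; that is an assumption about the real naming scheme.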


Example code

import re

strings = [
    "tfd_fr_ve_t1_v1_f3_201234_yyymmmdd",
    "phone_football_androind_1_v1_te_t1_fe",
    "phone_football_ios_v1_t1",
    "foot_cricket2345678_f12_t4",
    "tfd_fr_ve_t1_v1_f3_201234_yyymmmdd",
    "def_000_t4_f1",
    "file_job_1234567_f1_t55",
    "ROKLOP_f33_t44",
    "agdcv_t45",
    "gop_gop_f1_t14_v14",
    "file_op_v1_t1",
    "fop_f1_v1_1223",
    "test"
]
pattern = r"_[ftv][1-9]\d?(?!\d)"
for s in strings:
    if re.search(pattern, s):
        print(f"YES, present in '{s}' ")
    else:
        print(f"NO, not present in '{s}'")

Output

YES, present in 'tfd_fr_ve_t1_v1_f3_201234_yyymmmdd' 
YES, present in 'phone_football_androind_1_v1_te_t1_fe' 
YES, present in 'phone_football_ios_v1_t1' 
YES, present in 'foot_cricket2345678_f12_t4' 
YES, present in 'tfd_fr_ve_t1_v1_f3_201234_yyymmmdd' 
YES, present in 'def_000_t4_f1' 
YES, present in 'file_job_1234567_f1_t55' 
YES, present in 'ROKLOP_f33_t44' 
YES, present in 'agdcv_t45' 
YES, present in 'gop_gop_f1_t14_v14' 
YES, present in 'file_op_v1_t1' 
YES, present in 'fop_f1_v1_1223' 
NO, not present in 'test'
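The question asks to extract the matching names rather than print YES/NO. A sketch that reuses the same pattern inside a list comprehension; the capturing groups around the letter and the number are an addition for illustration, so that `re.findall` can also report which tags each name carries (with two groups, `findall` returns a list of tuples).

```python
import re

files = [
    "phone_football_androind_1_v1_te_t1_fe",
    "phone_football_ios_v1_t1",
    "foot_cricket2345678_f12_t4",
    "tfd_fr_ve_t1_v1_f3_201234_yyymmmdd",
    "def_000_t4_f1",
    "file_job_1234567_f1_t55",
    "ROKLOP_f33_t44",
    "agdcv_t45",
    "gop_gop_f1_t14_v14",
    "file_op_v1_t1",
    "fop_f1_v1_1223",
    "test",
]

# Same rule as above, with groups added around the letter and the number
pattern = re.compile(r"_([ftv])([1-9]\d?)(?!\d)")

# Keep only the names that contain at least one tag
selected = [name for name in files if pattern.search(name)]

# For each kept name, list the (letter, number) tags that were found
for name in selected:
    tags = [(letter, int(num)) for letter, num in pattern.findall(name)]
    print(name, tags)
```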

Answer:

I think this short regex matches all of your sample file names:

'(f|t|v)[1-9]{1,2}'

Here's a little code snippet showing the matching results:

>>> import re
>>> files = ['phone_football_androind_1_v1_te_t1_fe', 'phone_football_ios_v1_t1', 'foot_cricket2345678_f12_t4', 'tfd_fr_ve_t1_v1_f3_201234_yyymmmdd', 'def_000_t4_f1', 'file_job_1234567_f1_t55', 'ROKLOP_f33_t44', 'agdcv_t45', 'gop_gop_f1_t14_v14', 'file_op_v1_t1', 'fop_f1_v1_1223']
>>> regex = '(f|t|v)[1-9]{1,2}'
>>> for file in files:
...     if re.search(regex, file):
...         print(f"match: {file}")
...
match: phone_football_androind_1_v1_te_t1_fe
match: phone_football_ios_v1_t1
match: foot_cricket2345678_f12_t4
match: tfd_fr_ve_t1_v1_f3_201234_yyymmmdd
match: def_000_t4_f1
match: file_job_1234567_f1_t55
match: ROKLOP_f33_t44
match: agdcv_t45
match: gop_gop_f1_t14_v14
match: file_op_v1_t1
match: fop_f1_v1_1223
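This pattern has nothing anchoring the letter on the left and nothing checking what follows the digits, so it can also fire inside ordinary words or on the first digits of a longer number. Whether that matters depends on the real file names. A small comparison sketch against the underscore-anchored pattern from the previous answer; the names below are invented to show where the two differ.

```python
import re

loose = re.compile(r"(f|t|v)[1-9]{1,2}")       # pattern from this answer
strict = re.compile(r"_[ftv][1-9]\d?(?!\d)")   # pattern from the previous answer

# Hypothetical names chosen to show where the two patterns agree or disagree
names = ["soft9_notes", "data_v123", "x_f1"]

for name in names:
    print(f"{name}: loose={bool(loose.search(name))} strict={bool(strict.search(name))}")
```

Here the loose pattern matches `t9` inside `soft9` and `v12` inside `v123`, while the strict one rejects both; on `x_f1` they agree. Note also that `[1-9]{1,2}` never accepts 0 as a second digit, so a tag like `t10` is only found because its prefix `t1` matches.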