If I have a list of Windows file paths (strings), how can I find every item whose path contains 10 consecutive digits, so that I can add those items to a new list?
Is there a way to define a range of wildcard characters and then search or filter with it?
For example, from this list:
('C:\Users\ Documents\1H_1P_42497372610000\Kirkbride A1P_42497586550009\Well History.tif',
'C:\Users\ Documents\TEMPORARY\WISE\30497372610000\Kirkbride _42478972610009\ Drilling\Proposals.pdf',
'C:\Users\ Documents\Well History\Drilling\Proposals\Cement\Pilot hole KO plug\ Test Results.txt')
this would be my new list (or DataFrame):
('C:\Users\ Documents\1H_1P_42497372610000\Kirkbride A1P_42497586550009\Well History.tif',
'C:\Users\ Documents\TEMPORARY\WISE\30497372610000\Kirkbride _42478972610009\ Drilling\Proposals.pdf')
I made a few attempts with the glob() function, and I tried to piece together a filter where I defined a variable x = ('1', '2', '3', ...) and dropped the items in which ten of those characters in a row didn't occur. I couldn't come close to anything that made sense, or that wasn't searching for integers (which won't work).
Help me! Please and thank you!
CodePudding user response:
You can use a regex to find the strings that contain 10 consecutive digits:
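In [63]: [s for s in strings if re.search(r'\d{10}', s)]
Out[63]:
['C:\\Users\\ Documents\\1H_1P_42497372610000\\Kirkbride A1P_42497586550009\\Well History.tif',
'C:\\Users\\ Documents\\TEMPORARY\\WISE\\30497372610000\\Kirkbride _42478972610009\\ Drilling\\Proposals.pdf']
You don't need re.escape here: it is meant for escaping text that will be used as a pattern, not the strings being searched. The double backslashes '\\' in the output are just how Python displays a single backslash inside a string; the paths themselves are unchanged. Note that \d{10} also matches inside longer runs of digits, such as the 14-digit numbers in these paths.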
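For reference, here is a self-contained sketch of the same filter. The names paths, with_ten_digits and with_ten_digits_wildcard are made up for illustration, and the sample paths are written as raw strings so the backslashes stay literal. The second function is one possible take on the "range of wildcard characters" idea from the question, using fnmatch-style [0-9] classes instead of a regex.

```python
import re
from fnmatch import fnmatchcase

# Sample data from the question, as raw strings so '\U' etc. are not
# treated as escape sequences.
paths = [
    r'C:\Users\ Documents\1H_1P_42497372610000\Kirkbride A1P_42497586550009\Well History.tif',
    r'C:\Users\ Documents\TEMPORARY\WISE\30497372610000\Kirkbride _42478972610009\ Drilling\Proposals.pdf',
    r'C:\Users\ Documents\Well History\Drilling\Proposals\Cement\Pilot hole KO plug\ Test Results.txt',
]

# Ten digits in a row; this also matches inside a longer run of digits.
TEN_DIGITS = re.compile(r'\d{10}')


def with_ten_digits(items):
    """Return the items that contain at least 10 consecutive digits."""
    return [p for p in items if TEN_DIGITS.search(p)]


# Wildcard variant: ten [0-9] character classes with * on both sides.
WILDCARD = '*' + '[0-9]' * 10 + '*'


def with_ten_digits_wildcard(items):
    """Same filter, expressed as a shell-style wildcard pattern."""
    return [p for p in items if fnmatchcase(p, WILDCARD)]


for p in with_ten_digits(paths):
    print(p)
```

If only runs of exactly 10 digits should count, a pattern such as r'(?<!\d)\d{10}(?!\d)' would be one way to do it, but with the sample data above (14-digit numbers) it would match nothing.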