I want to use pathlib.glob() to find directories with a specific name pattern (*data) in the current working dir. I don't want to explicitly check via .isdir() or something else.
Input data
This is the relevant listing, with three folders as the expected result and one file that matches the same pattern but should not be part of the result.
ls -ld *data
drwxr-xr-x 2 user user 4,0K 9. Sep 10:22 2021-02-11_68923_data/
drwxr-xr-x 2 user user 4,0K 9. Sep 10:22 2021-04-03_38923_data/
drwxr-xr-x 2 user user 4,0K 9. Sep 10:22 2022-01-03_38923_data/
-rw-r--r-- 1 user user 0 9. Sep 10:24 2011-12-43_3423_data
Expected result
[
'2021-02-11_68923_data/',
'2021-04-03_38923_data/',
'2022-01-03_38923_data/'
]
Minimal working example
from pathlib import Path
cwd = Path.cwd()
result = cwd.glob('*_data/')
result = list(result)
That gives me the 3 folders but also the file.
I also tried the variant cwd.glob('**/*_data/').
CodePudding user response:
glob is insufficient here. From the filesystem's perspective, the directory's name really is "2021-02-11_68923_data", not "2021-02-11_68923_data/". Since glob only looks at names, it cannot differentiate between "regular" files and directories, and you'd have to add some additional check, such as the isdir that you mentioned.
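If an explicit check is acceptable after all, a minimal sketch could look like the following. The helper name matching_dirs and its parameters are made up for illustration; note that pathlib spells the check is_dir().

```python
from pathlib import Path


def matching_dirs(root, pattern="*_data"):
    """Return the sorted names of directories directly under root that match pattern."""
    # glob() matches on names only, so regular files are filtered out explicitly.
    return sorted(p.name for p in Path(root).glob(pattern) if p.is_dir())
```

Calling matching_dirs(Path.cwd()) in the directory from the question should return the three folder names (without a trailing slash) and skip the file.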
CodePudding user response:
The trailing path separator certainly should be respected in pathlib glob patterns. This is the expected behaviour in shells on all platforms, and is also how the glob module works:
If the pattern is followed by an os.sep or os.altsep then files will not match.
So, as a work-around, you can use the glob module to get the behaviour you want:
>>> import glob
>>> glob.glob('*')
['html', 'images', 'test.py']
>>> glob.glob('*/')
['html/', 'images/']
The issue with pathlib was fixed in bpo-22276, and the fix was merged in Python 3.11.0rc1 (see What's New: pathlib). So if you want to stick with pathlib, please test it out on 3.11 or later and report any issues.
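Applied to the folder layout from the question, the glob module work-around might be sketched as below. The function name data_dirs and its parameters are hypothetical, and root is assumed to be a plain string path.

```python
import glob
import os


def data_dirs(root, pattern="*_data"):
    """List names of directories under root that match pattern, via the glob module."""
    # Joining with "" appends a trailing separator, which makes glob skip regular files.
    # glob.escape() protects any special characters in root itself.
    hits = glob.glob(os.path.join(glob.escape(root), pattern, ""))
    # Strip the trailing separator and the leading root to keep only the names.
    return sorted(os.path.basename(os.path.normpath(hit)) for hit in hits)
```

For the current working directory this would be data_dirs(os.getcwd()). The raw glob results keep the trailing separator, as in the expected result from the question; the sketch strips it only to return bare names.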