Iterate through files ending with various extensions

I'm using this line of code to iterate through files ending with the .tar extension, using what I believe to be a regex character '*'.

for f in glob.glob('{}/{}/Compressed_Files/*.tar'.format(path, site_id)):

How can I do the same thing but also include files ending in the .csv.gz extension? Maybe with a regex "or" operator?

CodePudding user response：

glob patterns are shell-style wildcards, not regular expressions, and they have no "or" operator that can match several different endings. Just combine two globs.

import glob

g1 = glob.glob('{}/{}/Compressed_Files/*.tar'.format(path, site_id))
g2 = glob.glob('{}/{}/Compressed_Files/*.csv.gz'.format(path, site_id))
for f in g1 + g2:  # concatenate the two result lists
    ...  # code

If there are lots of matches, it may be better to use glob.iglob(), which is an iterator. Then use itertools.chain() to combine them.
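
A minimal sketch of that iglob() plus itertools.chain() idea, assuming the two endings from the question (.tar and .csv.gz). The helper name iter_compressed and its parameters are made up for illustration; they are not part of glob or itertools.

```python
import glob
import itertools
import os


def iter_compressed(base_dir, patterns=("*.tar", "*.csv.gz")):
    """Lazily yield paths in base_dir that match any of the wildcard patterns.

    Hypothetical helper: the name and the default patterns are assumptions.
    """
    # One lazy iglob() iterator per pattern. The generator expression means
    # each iterator is only created when chain reaches it.
    iterators = (glob.iglob(os.path.join(base_dir, p)) for p in patterns)
    # chain.from_iterable() walks the iterators one after another without
    # building a combined list in memory.
    return itertools.chain.from_iterable(iterators)
```

With the directory from the question, this would be called as for f in iter_compressed('{}/{}/Compressed_Files'.format(path, site_id)): and the loop body stays unchanged.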

CodePudding user response：

A generator avoids building the full list of results in memory, which helps when there may be many files:

import glob

def glob_patterns(patterns: list[str]):
    # Lazily yield the matches for each pattern in turn.
    for pattern in patterns:
        yield from glob.iglob(pattern)

# Use a loop variable other than `path` so the base path is not overwritten.
for f in glob_patterns([
    '{}/{}/Compressed_Files/*.tar'.format(path, site_id),
    '{}/{}/Compressed_Files/*.csv.gz'.format(path, site_id),
]):
    print(f)