Using regular expressions to read a file from a directory

Time:10-14

I have a directory consisting of many files. In each iteration of my for loop, I want to read a file starting with

"stc_" + str(k) + "anything here" + "_alpha.mat"

This k changes in each iteration. How can I use regular expressions to read files like this?

There is only one file with "stc_" + str(k) at the beginning. But "anything here" changes from file to file.

I know one option is to rewrite all files but I want to learn how to use regular expressions for this purpose.

CodePudding user response:

You can do it with filter on os.listdir:

import os
import re

def glob_re(pattern, strings):
    return filter(re.compile(pattern).match, strings)

filenames = glob_re(r'stc_\d+.*_alpha\.mat', os.listdir())
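The pattern above accepts any number after stc_, so it does not yet pick out the file for one particular k. A minimal sketch of one way to do that follows. The helper name find_file_for_k and the sample file names are made up for illustration, and the sketch assumes "anything here" never begins with a digit (otherwise k=1 and k=10 cannot be told apart from the name alone).

```python
import re

def find_file_for_k(k, names):
    """Return the names of the form stc_<k><anything>_alpha.mat (hypothetical helper)."""
    # (?!\d) rejects a digit right after k, so k=1 does not match stc_10...
    pattern = re.compile(rf'stc_{k}(?!\d).*_alpha\.mat')
    return [name for name in names if pattern.fullmatch(name)]

# Made-up file names, for illustration only.
names = ['stc_1_run_alpha.mat', 'stc_10_x_alpha.mat', 'stc_2abc_alpha.mat', 'notes.txt']
matches_for_1 = find_file_for_k(1, names)
```

In the loop from the question, names would come from a call such as os.listdir(), made once before the loop starts.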

CodePudding user response:

You have not revealed the domain of k, but based on comments, it seems to be a number.

If there is only one file for each k, you can simply loop over those.

import glob

for knum in range(kmin, kmax + 1):
    # Caveat: "stc_1*" also matches names such as "stc_10..."
    for file in glob.glob("stc_%i*_alpha.mat" % knum):
        # Only expect one match
        process(file)

If you are really hellbent on using a regular expression for this, the regex for the numbers 7 through 24 is simply (?:7|8|9|10|11|...|23|24) (it could be simplified to (?:[7-9]|1[0-9]|2[0-4]) but here, it's probably not worth the effort).
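As a rough sketch of how that range regex could be dropped into the file-name pattern: the helper name k_in_range is hypothetical, and the (?!\d) lookahead is an addition that assumes "anything here" does not start with a digit, so that a name like stc_70... is not read as k=7.

```python
import re

# Alternation for the integers 7 through 24 (the simplified form).
K_RANGE = r'(?:[7-9]|1[0-9]|2[0-4])'
PATTERN = re.compile(rf'stc_({K_RANGE})(?!\d).*_alpha\.mat')

def k_in_range(name):
    """Return k as an int if name matches and 7 <= k <= 24, else None (hypothetical helper)."""
    m = PATTERN.fullmatch(name)
    return int(m.group(1)) if m else None
```

Comparing int(m.group(1)) against the bounds, as in the os.scandir loop in this answer, is usually easier to read and to change than encoding the range in the regex.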

os.listdir returns entries in arbitrary order, and glob.glob does not guarantee an order either; if you require a particular sort order, collect the matches with os.scandir and sort them yourself.

import os
import re

my_files = []
for entry in os.scandir(directory):
    # os.scandir yields DirEntry objects, so match against entry.name
    m = re.match(r'stc_(\d+).*_alpha\.mat', entry.name)
    if m:
        # Maybe you only care about a particular range for k?
        kcurr = int(m.group(1))
        if kcurr < 7 or kcurr > 24:
            continue
        my_files.append((kcurr, entry.path))
my_files = [x[1] for x in sorted(my_files)]

Here, we use the regex's capturing parentheses to extract k, store a tuple containing the sort key and the file name, then discard the sort keys after sorting, keeping only the sorted list of matching files. (See also Schwartzian transform.)

The if clause which skips values lower than 7 or bigger than 24 demonstrates how to only cover specific numbers; if you don't need that, obviously take it out.

Hitting the disk is on the order of 1,000 times slower than processing data in memory, so you generally want to avoid repeatedly accessing the disk.
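One way to act on this is to list the directory once, build a lookup table from k to file name, and then consult that table inside the loop. The following is only a sketch: index_by_k and index_directory are hypothetical helpers, and the pattern assumes "anything here" does not start with a digit, since otherwise those digits would be absorbed into k.

```python
import os
import re

PATTERN = re.compile(r'stc_(\d+).*_alpha\.mat')

def index_by_k(names):
    """Map each k to its matching file name (hypothetical helper)."""
    index = {}
    for name in names:
        m = PATTERN.fullmatch(name)
        if m:
            index[int(m.group(1))] = name
    return index

def index_directory(directory):
    """Read the directory a single time and index its files by k."""
    return index_by_k(os.listdir(directory))
```

With index = index_directory(directory) computed before the loop, each iteration can call index.get(k) and never touches the disk just to find the file name.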
