I want to know the regular expression of "0_10

The folder contains "0_10_00.csv", "3_20_02.csv", "1_00_00.csv", ... and one more CSV file named "Ref.csv". The same folder also contains a Python file, "sample.py", which is used to analyze these CSV files.

"0_10_00.csv" holds the data recorded when 0 hours, 10 minutes, and 0 seconds had elapsed since I started my experiment. Likewise, "3_20_02.csv" holds the data recorded when 3 hours, 20 minutes, and 2 seconds had elapsed.

To analyze all of these files at once, excluding "Ref.csv", I tried to collect the CSV file names with glob:

import glob
files=glob.glob("\d_\d\d_\d\d\.csv")
print(files)

Output:

[]

The list was empty.

The following seems to work, but the result also includes "Ref.csv", which I don't want:

import glob
files=glob.glob("*.csv")
print(files)

Output:

['0_10_00.csv','3_20_02.csv','1_00_00.csv',......,'Ref.csv']

I want the pattern to be more specific so that I get only the CSV files whose names have the same structure as "0_10_00.csv".

CodePudding user response：

You probably need a regular expression from the re module to get the output you want. Here's an example:

import re
files = ['0_10_00.csv','3_20_02.csv','1_00_00.csv','sample.py','Ref.csv']
# The dot is escaped so it matches a literal "."; fullmatch requires the whole name to match
pattern = r"\d_\d{2}_\d{2}\.csv"
filenames = []
for f in files:
    if re.fullmatch(pattern, f):
        filenames.append(f)
print(filenames)

Output: ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']
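To apply the same idea to the real folder rather than a hard-coded list, the names could come from os.listdir. This is only a minimal sketch: it assumes the script is run from the folder that holds the CSV files, and the helper name select_timestamped is made up for illustration.

```python
import os
import re

# One digit, "_", two digits, "_", two digits, then a literal ".csv"
NAME_PATTERN = re.compile(r"\d_\d{2}_\d{2}\.csv")

def select_timestamped(names):
    """Return only the names that match NAME_PATTERN in full (hypothetical helper)."""
    return [name for name in names if NAME_PATTERN.fullmatch(name)]

if __name__ == "__main__":
    # Assumption: the current working directory is the data folder
    print(select_timestamped(os.listdir(".")))
```

Using fullmatch rather than search means a name such as "x0_10_00.csv.bak" would not slip through.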

CodePudding user response：

How about just [\d_]+\.csv?

import re
items = ['0_10_00.csv','3_20_02.csv','1_00_00.csv','Ref.csv']
[item for item in items if re.match(r'[\d_]+\.csv', item)]
# ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']

If that's not strict enough, we can use \d_\d\d_\d\d\.csv:

[item for item in items if re.match(r'\d_\d{2}_\d{2}\.csv', item)]
# ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']

Or you could do a crude isdigit check after replacing what you don't want:

[item for item in ['0_10_00.csv','3_20_02.csv','1_00_00.csv','Ref.csv'] if item.replace('_','').replace('.csv','').isdigit()]
# ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']
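Since each name encodes the elapsed time, a strict pattern could also carry capture groups so that the matching files can be put in chronological order. This is a sketch under stated assumptions: the helper name elapsed_seconds is hypothetical, and using \d+ for the hours field is an assumption that would also accept runs longer than 9 hours.

```python
import re

# Groups: hours, minutes, seconds. \d+ for hours is an assumption so that
# a name such as "12_05_30.csv" would also be accepted.
TIME_PATTERN = re.compile(r"(\d+)_(\d{2})_(\d{2})\.csv")

def elapsed_seconds(name):
    """Return the elapsed time encoded in name, in seconds, or None if it does not match."""
    match = TIME_PATTERN.fullmatch(name)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds

items = ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv', 'Ref.csv']
# Keep only the names that parse, then sort them by elapsed time
timed = sorted((n for n in items if elapsed_seconds(n) is not None), key=elapsed_seconds)
print(timed)
# ['0_10_00.csv', '1_00_00.csv', '3_20_02.csv']
```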

CodePudding user response：

If you want to use glob, the pattern needs to look more like this:

import glob
files = glob.glob("[0-9]_[0-9][0-9]_[0-9][0-9].csv")
print(files)

Output:

['1_00_00.csv', '3_20_02.csv', '0_10_00.csv']

As others have mentioned, glob uses shell-style wildcard patterns, not regular expressions.
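The difference can be checked without touching the file system, because the standard-library fnmatch module implements the wildcard rules that glob relies on. A small sketch, using the sample names from the question (the variable names are illustrative only):

```python
import fnmatch

names = ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv', 'Ref.csv']

# In a wildcard pattern, [0-9] matches one digit and "." is a literal dot
wildcard = "[0-9]_[0-9][0-9]_[0-9][0-9].csv"
# Regex escapes have no special meaning here: "\d" only matches a literal
# backslash followed by the letter d
regex_style = r"\d_\d\d_\d\d\.csv"

wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, wildcard)]
regex_style_hits = [n for n in names if fnmatch.fnmatchcase(n, regex_style)]
print(wildcard_hits)     # ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']
print(regex_style_hits)  # []
```

This is why the original glob call with \d returned an empty list.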