In the folder, there are "0_10_00.csv", "3_20_02.csv", "1_00_00.csv", ... and one more CSV file named "Ref.csv". The same folder also contains a Python script, "sample.py", which I use to analyze these CSV files.
"0_10_00.csv" means the data was recorded when 0 hours, 10 minutes, and 0 seconds had elapsed since I started my experiment. Likewise, "3_20_02.csv" means the data was recorded when 3 hours, 20 minutes, and 2 seconds had elapsed since I started my experiment.
To analyze all of these files at once, excluding "Ref.csv", I tried to get the names of the CSV files with glob:
import glob
files=glob.glob("\d_\d\d_\d\d\.csv")
print(files)
Output:
[]
The list was empty.
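The list is empty because glob matches names with shell-style wildcards, which follow the same rules as the fnmatch module. It does not use regular expressions, so \d is not a digit class there, and the backslash is never a regex escape. The sketch below checks both patterns against a single file name with fnmatch.fnmatchcase, which applies those wildcard rules without touching the file system:

```python
import fnmatch

name = "0_10_00.csv"

# Regex syntax: in wildcard matching "\d" is a literal backslash followed by "d",
# so this pattern can only match a name that literally contains those characters.
print(fnmatch.fnmatchcase(name, r"\d_\d\d_\d\d\.csv"))  # False

# Wildcard syntax: [0-9] matches exactly one digit character.
print(fnmatch.fnmatchcase(name, "[0-9]_[0-9][0-9]_[0-9][0-9].csv"))  # True
```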
The code below seems to work, but it also returns "Ref.csv", which I don't want:
import glob
files=glob.glob("*.csv")
print(files)
Output:
['0_10_00.csv','3_20_02.csv','1_00_00.csv',......,'Ref.csv']
I want a more specific pattern so that I only get the CSV files whose names follow the same structure as "0_10_00.csv".
Answer:
You probably need a regular expression from the re module to get the output you want. Here's an example:
import re
files = ['0_10_00.csv','3_20_02.csv','1_00_00.csv','sample.py','Ref.csv']
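# Escape the dot; "csv?" would make the "v" optional, and \d{1} is just \d.
pattern = r"\d_\d{2}_\d{2}\.csv"
filenames = []
for f in files:
    # fullmatch anchors both ends, so e.g. "10_10_00.csv" is rejected.
    if re.fullmatch(pattern, f):
        filenames.append(f)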
print(filenames)
Output: ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']
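The list above is hard-coded. One way to apply the same filter to the files actually present in the folder is to glob broadly and then keep only the full regex matches. Here is a sketch, assuming the script runs from the data folder. The helper name find_data_files and its directory argument are illustrative, not from the question.

```python
import glob
import os
import re

# Escaped dot plus fullmatch: the whole name must be H_MM_SS.csv, nothing more.
DATA_NAME = re.compile(r"\d_\d{2}_\d{2}\.csv")

def find_data_files(directory="."):
    """Return sorted names of CSV files in `directory` named like '0_10_00.csv' (hypothetical helper)."""
    names = (os.path.basename(p) for p in glob.glob(os.path.join(directory, "*.csv")))
    return sorted(name for name in names if DATA_NAME.fullmatch(name))

print(find_data_files())
```

Because every field in the pattern has a fixed width, plain string sorting here also gives chronological order.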
Answer:
How about just using [\d_]+\.csv as the pattern?
import re
items = ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv', 'Ref.csv']
[item for item in items if re.match(r'[\d_]+\.csv', item)]
# ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']
If that's not strict enough, we can use \d_\d\d_\d\d\.csv:
>>> [item for item in items if re.match(r'\d_\d{2}_\d{2}\.csv', item)]
# ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']
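re.match only anchors at the start of the string, so both patterns above still accept some names you may not want. The quick comparison below uses a few made-up names. Apart from the ones in the question, these names are hypothetical and were chosen only to hit the edge cases.

```python
import re

names = ["0_10_00.csv", "12.csv", "10_10_00.csv", "0_10_00.csv.bak", "Ref.csv"]

loose = [n for n in names if re.match(r"[\d_]+\.csv", n)]
strict = [n for n in names if re.match(r"\d_\d{2}_\d{2}\.csv", n)]
exact = [n for n in names if re.fullmatch(r"\d_\d{2}_\d{2}\.csv", n)]

print(loose)   # ['0_10_00.csv', '12.csv', '10_10_00.csv', '0_10_00.csv.bak']
print(strict)  # ['0_10_00.csv', '0_10_00.csv.bak']
print(exact)   # ['0_10_00.csv']
```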
Or you could do a crude isdigit() check after removing the parts you don't want:
[item for item in ['0_10_00.csv','3_20_02.csv','1_00_00.csv','Ref.csv'] if item.replace('_','').replace('.csv','').isdigit()]
# ['0_10_00.csv', '3_20_02.csv', '1_00_00.csv']
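The isdigit() check is crude because it only verifies that digits remain once the underscores and ".csv" are removed. It ignores the layout entirely. Below is a sketch of a stricter check that avoids regex, assuming the layout is always one hour digit, two minute digits, and two second digits. The function name looks_like_data_file is hypothetical.

```python
def looks_like_data_file(name: str) -> bool:
    """True for names shaped like '0_10_00.csv' (H_MM_SS.csv), using only str methods."""
    if not name.endswith(".csv"):
        return False
    parts = name[: -len(".csv")].split("_")
    # Exactly three parts with widths 1, 2, 2, each made only of ASCII digits.
    return [len(p) for p in parts] == [1, 2, 2] and all(
        c in "0123456789" for p in parts for c in p
    )

names = ["0_10_00.csv", "12.csv", "1_2_3.csv", "Ref.csv"]
# The crude check from above also accepts '12.csv' and '1_2_3.csv'.
crude = [n for n in names if n.replace("_", "").replace(".csv", "").isdigit()]
strict = [n for n in names if looks_like_data_file(n)]
print(crude)   # ['0_10_00.csv', '12.csv', '1_2_3.csv']
print(strict)  # ['0_10_00.csv']
```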
Answer:
If you want to use glob, the pattern needs to look more like this:
import glob
files = glob.glob("[0-9]_[0-9][0-9]_[0-9][0-9].csv")
print(files)
Output:
['1_00_00.csv', '3_20_02.csv', '0_10_00.csv']
As others have mentioned, glob uses shell-style wildcard matching, not regular expressions.
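Here is a self-contained check of that wildcard pattern. It builds a throwaway folder with tempfile, using the file names from the question plus sample.py. It then matches the pattern with pathlib.Path.glob, which uses the same wildcard syntax as glob.glob, and sorts the result. The sort matters because glob does not guarantee any particular order, as the output above shows.

```python
import tempfile
from pathlib import Path

PATTERN = "[0-9]_[0-9][0-9]_[0-9][0-9].csv"  # shell-style wildcards, not regex

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Recreate the question's folder layout with empty files.
    for name in ["0_10_00.csv", "3_20_02.csv", "1_00_00.csv", "Ref.csv", "sample.py"]:
        (folder / name).touch()
    matches = sorted(p.name for p in folder.glob(PATTERN))

print(matches)  # ['0_10_00.csv', '1_00_00.csv', '3_20_02.csv']
```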