I am trying to extract the metadata for some experiments I'm helping conduct at school. We are naming our data files something like this:
name_date_sample_environment_run#.csv
What I need to do is write a function that separates each piece to a list that'll be output like this:
['name', 'date', 'sample', 'environment', 'run#']
Though I haven't quite figured it out. I think I need to figure out how to load the file, convert the name to a string, then use a delimiter for each underscore to separate each into the given list. I don't know how to load the file so that I can convert it to a string. Any help will be appreciated!
P.S - I will eventually need to figure out a way to save this data into a spreadsheet so we can see how many experiments we do with certain conditions, who performed them, etc. but I can figure that out later. Thanks!
CodePudding user response:
If you're just asking how to break down the string into all the components separated by an underscore, then the easiest way would be using the split function.
x = 'name_date_sample_environment_run#.csv'
y = x.split('_')
# y = ['name', 'date', 'sample', 'environment', 'run#.csv']
The split function simply breaks down the string every time it sees the underscore. If you want to remove the .csv part from 'run#.csv' then you can process the original string to remove the last 4 characters.
x = 'name_date_sample_environment_run#.csv'
x = x[:-4]
y = x.split('_')
# y = ['name', 'date', 'sample', 'environment', 'run#']
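Slicing off the last four characters assumes the extension is always exactly '.csv'. A slightly more general sketch, assuming the filenames follow the pattern from the question, uses pathlib to strip whatever extension is present. The function name parse_filename is only an illustrative choice. Note that the file itself never has to be opened or loaded, since its name is already a string.

```python
from pathlib import Path

def parse_filename(filename):
    """Split 'name_date_sample_environment_run#.csv' into its fields."""
    # Path(...).stem drops any leading directory and the final extension
    return Path(filename).stem.split('_')

print(parse_filename('name_date_sample_environment_run#.csv'))
# ['name', 'date', 'sample', 'environment', 'run#']
```

This only works as long as none of the individual fields contains an underscore itself.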
CodePudding user response:
If all your files follow this naming scheme and are in the same folder, you can do it this way:
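import os

files = os.listdir('.')  # insert folder path
structured_files = []  # a list, so that append() works
for file in files:
    if not file.endswith('.csv'):
        continue  # skip anything that is not a data file
    # drop the '.csv' extension, then unpack the five underscore-separated fields
    name, date, sample, environment, run = file[:-4].split('_')
    structured_files.append({'name': name, 'date': date, 'sample': sample, 'env': environment, 'run': run})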
Then you'll have a list of dicts with your file info. If you want to, you can load it into pandas and save it to an Excel sheet:
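import os
import pandas as pd

files = os.listdir('.')  # insert folder path
structured_files = []  # a list, so that append() works
for file in files:
    if not file.endswith('.csv'):
        continue  # skip anything that is not a data file
    # drop the '.csv' extension, then unpack the five underscore-separated fields
    name, date, sample, environment, run = file[:-4].split('_')
    structured_files.append({'name': name, 'date': date, 'sample': sample, 'env': environment, 'run': run})
# build a DataFrame from the list of dicts and write it out
# (writing .xlsx needs an Excel engine such as openpyxl installed)
pd.DataFrame(structured_files).to_excel('files.xlsx')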
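For the follow-up in the question (seeing how many experiments were run under each condition, or by each person), a minimal sketch using collections.Counter from the standard library might look like this. It assumes records shaped like the dicts built in the loop above. The sample values and the helper name count_by are made up for illustration.

```python
from collections import Counter

def count_by(records, key):
    """Count how many records share each value of the given field."""
    return Counter(record[key] for record in records)

# Hypothetical records, in the shape produced by the loop above
records = [
    {'name': 'alice', 'date': '2021-03-01', 'sample': 's1', 'env': 'vacuum'},
    {'name': 'bob', 'date': '2021-03-01', 'sample': 's1', 'env': 'air'},
    {'name': 'alice', 'date': '2021-03-02', 'sample': 's2', 'env': 'vacuum'},
]

print(count_by(records, 'env'))   # experiments per environment
print(count_by(records, 'name'))  # experiments per person
```

If the data is already in a pandas DataFrame, calling value_counts() on the relevant column gives the same kind of tally.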