How to extract data from a filename in Python? - convert file name to string?

I am trying to extract the metadata for some experiments I'm helping conduct at school. We are naming our data files something like this:

name_date_sample_environment_run#.csv

What I need to do is write a function that separates each piece to a list that'll be output like this:

['name', 'date', 'sample', 'environment', 'run#']

Though I haven't quite figured it out. I think I need to figure out how to load the file, convert the name to a string, then use a delimiter for each underscore to separate each into the given list. I don't know how to load the file so that I can convert it to a string. Any help will be appreciated!

P.S. I will eventually need to figure out a way to save this data into a spreadsheet so we can see how many experiments we do with certain conditions, who performed them, etc., but I can figure that out later. Thanks!

CodePudding user response:

If you're just asking how to break the string into the components separated by underscores, the easiest way is the string's split method. The file name is already a string, so you don't need to load the file itself.

x = 'name_date_sample_environment_run#.csv'
y = x.split('_')

# y = ['name', 'date', 'sample', 'environment', 'run#.csv']

The split method breaks the string at every underscore. If you want to remove the .csv part from 'run#.csv', slice the last 4 characters (the length of '.csv') off the original string before splitting.

x = 'name_date_sample_environment_run#.csv'
x = x[:-4]
y = x.split('_')

# y = ['name', 'date', 'sample', 'environment', 'run#']
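
Slicing off four characters only works while the extension is exactly '.csv'. A slightly more general sketch, assuming the same underscore naming scheme: pathlib's Path.stem drops the last suffix whatever its length, and also drops any folder part in front of the file name. The function name parse_filename is just an illustrative choice, not something from the question.

```python
from pathlib import Path

def parse_filename(path):
    """Split a file name like name_date_sample_environment_run#.csv into its parts."""
    # Path(...).stem is the last path component without its final suffix
    return Path(path).stem.split('_')

print(parse_filename('data/name_date_sample_environment_run#.csv'))
# ['name', 'date', 'sample', 'environment', 'run#']
```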

CodePudding user response:

If all your files follow this naming scheme and sit in the same folder, you can do it this way:

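import os

folder = '.'  # insert folder path
structured_files = []  # a list, since we append one dict per file
for file in os.listdir(folder):
    stem, ext = os.path.splitext(file)  # separate the '.csv' extension
    parts = stem.split('_')
    if ext != '.csv' or len(parts) != 5:
        continue  # skip anything that doesn't match the naming scheme
    name, date, sample, environment, run = parts
    structured_files.append({'name': name, 'date': date, 'sample': sample, 'env': environment, 'run': run})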

Then you'll have a list of dicts with your file info. If you want to, you can load it into pandas and save it to an Excel sheet:

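import os
import pandas as pd

folder = '.'  # insert folder path
structured_files = []  # a list, since we append one dict per file
for file in os.listdir(folder):
    stem, ext = os.path.splitext(file)  # separate the '.csv' extension
    parts = stem.split('_')
    if ext != '.csv' or len(parts) != 5:
        continue  # skip anything that doesn't match the naming scheme
    name, date, sample, environment, run = parts
    structured_files.append({'name': name, 'date': date, 'sample': sample, 'env': environment, 'run': run})

# writing .xlsx files requires the openpyxl package to be installed
pd.DataFrame(structured_files).to_excel('files.xlsx', index=False)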
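
If installing pandas is not an option, the standard library's csv module can write a file that any spreadsheet program will open. This is a minimal sketch, assuming the same five-part naming scheme. The function names collect_metadata and write_metadata, the output name files.csv, and the sample file names are all made up for illustration.

```python
import csv
import os

FIELDS = ['name', 'date', 'sample', 'env', 'run']

def collect_metadata(filenames):
    """Return one dict per file name matching name_date_sample_environment_run#.csv."""
    rows = []
    for filename in filenames:
        stem, ext = os.path.splitext(filename)
        parts = stem.split('_')
        if ext.lower() != '.csv' or len(parts) != len(FIELDS):
            continue  # ignore files that do not follow the naming scheme
        rows.append(dict(zip(FIELDS, parts)))
    return rows

def write_metadata(rows, out_path):
    """Write the rows to a CSV file with a header line."""
    with open(out_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

# Made-up file names, only to show the shape of the result
rows = collect_metadata(['alice_2021-03-01_s1_wet_run1.csv', 'notes.txt'])
print(rows)
# [{'name': 'alice', 'date': '2021-03-01', 'sample': 's1', 'env': 'wet', 'run': 'run1'}]
```

For a real folder, pass os.listdir(folder) to collect_metadata and then call write_metadata(rows, 'files.csv').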