How to extract data from a filename in python? - convert file name to string?

Time:02-22

I am trying to extract the metadata for some experiments I'm helping conduct at school. We are naming our data files something like this:

name_date_sample_environment_run#.csv

What I need to do is write a function that separates each piece into a list, with output like this:

['name', 'date', 'sample', 'environment', 'run#']

Though I haven't quite figured it out. I think I need to figure out how to load the file, convert the name to a string, then use a delimiter for each underscore to separate each into the given list. I don't know how to load the file so that I can convert it to a string. Any help will be appreciated!

P.S - I will eventually need to figure out a way to save this data into a spreadsheet so we can see how many experiments we do with certain conditions, who performed them, etc. but I can figure that out later. Thanks!

CodePudding user response:

If you're just asking how to break down the string into all the components separated by an underscore, then the easiest way would be using the split function.

x = 'name_date_sample_environment_run#.csv'
y = x.split('_')

# y = ['name', 'date', 'sample', 'environment', 'run#.csv']

The split method simply breaks the string apart every time it sees an underscore. If you want to remove the .csv part from 'run#.csv', you can strip the last 4 characters from the original string before splitting.

x = 'name_date_sample_environment_run#.csv'
x = x[:-4]
y = x.split('_')

# y = ['name', 'date', 'sample', 'environment', 'run#']
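
A variation on the same idea, as a minimal sketch: the file never has to be opened, because its name is already a string. Here parse_filename is a hypothetical helper name and the 'data/' folder is only an example path. Path.stem from the standard pathlib module drops the extension whatever its length, so no character count is hard-coded.

```python
from pathlib import Path

def parse_filename(path):
    """Split 'name_date_sample_environment_run#.csv' into its parts."""
    # Path(...).stem is the last path component without its final suffix
    stem = Path(path).stem
    return stem.split('_')

parts = parse_filename('data/name_date_sample_environment_run#.csv')
print(parts)  # ['name', 'date', 'sample', 'environment', 'run#']
```

Any leading folder is ignored, so the same function works on a bare file name or a full path.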

CodePudding user response:

If all your files follow this naming scheme and sit in the same folder, you can do it this way:

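import os
files = os.listdir('.')  # insert folder path

structured_files = []  # one dict per data file
for file in files:
    if not file.endswith('.csv'):
        continue  # skip anything that isn't a data file
    # drop the '.csv' extension, then split on the underscores
    name, date, sample, environment, run = os.path.splitext(file)[0].split('_')
    structured_files.append({'name': name, 'date': date, 'sample': sample,
                             'env': environment, 'run': run})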

Then structured_files will hold the info for each file. If you want to, you can load it into pandas and save it to an Excel sheet:

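import os
import pandas as pd

files = os.listdir('.')  # insert folder path

structured_files = []  # one dict per data file
for file in files:
    if not file.endswith('.csv'):
        continue  # skip anything that isn't a data file
    # drop the '.csv' extension, then split on the underscores
    name, date, sample, environment, run = os.path.splitext(file)[0].split('_')
    structured_files.append({'name': name, 'date': date, 'sample': sample,
                             'env': environment, 'run': run})

# a list of dicts becomes one row per file; writing .xlsx needs an engine such as openpyxl
pd.DataFrame(structured_files).to_excel('files.xlsx', index=False)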
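
If pandas isn't available, a similar table can be written with the standard library alone. This is a minimal sketch, assuming the five-field naming scheme from the question; the helper names collect_metadata and write_metadata, the column names, and the output file name in the usage note are made up for illustration. Files whose names don't split into exactly five parts are skipped instead of raising an error.

```python
import csv
from pathlib import Path

# Column names are an assumption based on the naming scheme in the question
FIELDS = ['name', 'date', 'sample', 'environment', 'run']

def collect_metadata(folder):
    """Return one dict per .csv file in folder whose name has five fields."""
    rows = []
    for path in sorted(Path(folder).glob('*.csv')):
        parts = path.stem.split('_')
        if len(parts) != len(FIELDS):
            continue  # name doesn't follow the scheme, so skip it
        rows.append(dict(zip(FIELDS, parts)))
    return rows

def write_metadata(rows, out_path):
    """Write the rows to a CSV file that a spreadsheet program can open."""
    with open(out_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling write_metadata(collect_metadata('.'), 'summary.csv') would then produce one row per experiment file in the current folder. The summary file itself is ignored on later runs because its name doesn't have five fields.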