Home > other >  Append filename to column header when reading multiple csv files
Append filename to column header when reading multiple csv files

Time:10-07

I want to read multiple .csv files and append the datetime part of their filename to the column header. Each csv file contains data acquired at a specific datetime. Each filename has the following format:

yyyy-mm-dd_hh-mm-ss_someothertext

Each file contains only one column of data.

I successfully import multiple files as a list of dataframes as follows:

import pandas as pd
import glob

path = r'C:\Users\...' #path
all_files = glob.glob(path   "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

I then concatenate the files into one dataframe such that each column contains the data from one of the files:

frame = pd.concat(li, axis=1, ignore_index=True)

However, this is where I lose the filename information. The column headers are now just a series of numbers. My question is: how can I append the datetime portion of each filename to its respective column header in frame?

The closest I have got is being able to append the whole filename, not just the datetime part, in a roundabout way by transposing frame, adding the whole filename as a new column, transposing back, then setting the filename row as the header row...

import os
frame=pd.DataFrame.transpose(frame)
frame['filename'] = os.path.basename(filename)
frame=pd.DataFrame.transpose(frame)
frame.reset_index(drop=True)
frame.columns = frame.iloc[6628] #row 6628 is where the row with the filenames ends up after transposing

This seems terribly inefficient though and ends up with the whole filename as the header rather than just the datetime part.

CodePudding user response:

This would be my suggested approach, squeezing the DataFrame and using Regex:

import re
import os
import glob
import pandas as pd

path = 'C:\Users\....'
files = glob.glob(f'{path}\*.csv')

li = []

for file in files:
     name = os.path.basename(file)  # get filename
     date = re.search(r'\d{4}-\d{2}-\d{2}', name).group(0)  # extract yyyy-mm-dd from filename
     # read file, squeeze to Series, rename to date
     li.append(pd.read_csv(file, index_col=None, header=0, squeeze=True).rename(date))  

frame = pd.concat(li, axis=1, ignore_index=False)
  • Related