Home > OS >  How would I set the name of a csv as column in dataframe
How would I set the name of a csv as column in dataframe

Time:05-27

I have 300 time series CSV's that do not contain IDs within them contained in a file called all_dataframes, and I am trying to go about adding a the name of the CSV as a column 'name'

For example,

df1.csv

date        value
2020-01-01  35
2020-01-02  40
2020-01-03  45

df2.csv

date        value
2022-03-01  15
2022-03-02  25
2022-03-03  20

The expected output would be:

df1.csv

name      date        value
df1       2020-01-01  35
df1       2020-01-02  40
df1       2020-01-03  45

df2.csv

name      date        value
df2       2022-03-01  15
df2       2022-03-02  25
df2       2022-03-03  20

I have tried using the below code as a way to do it, but it keeps throwing errors at me of EmptyDataError: No columns to parse from file even though I am in the correct working directory.

import pandas as pd
import glob
import os.path

# Create a list of all CSV files in folder
files = glob.glob("*.csv")

# Creats list of filenames for appending to df
filenames = []

# Doing the thing
for csv in files:
    df = pd.read_csv(csv)
    df['name'] = os.path.basename(csv)
    filenames.append(df)

Is there a more simple way to do this or should I try to fix the working directory error?

I found a recommended question very similar to this, however, it is in R and that's not a language I am comfortable with at this point. R: Set column name from dataframe name in list of dataframes

CodePudding user response:

You have probably an empty file? From the documentation:

pandas.errors.EmptyDataError

Exception that is thrown in pd.read_csv (by both the C and Python engines) when empty data or header is encountered

Tip: replace glob by pathlib:

import pandas as pd
import pathlib

for csvfile in pathlib.Path('.').glob('*.csv'):
    df = pd.read_csv(csvfile)
    df.insert(0, 'name', csvfile.stem)
    df.to_csv(csvfile, index=False)

Output:

>>>            
  • Related