I have 300 time series CSV's that do not contain IDs within them contained in a file called all_dataframes
, and I am trying to go about adding a the name of the CSV as a column 'name'
For example,
df1.csv
date value
2020-01-01 35
2020-01-02 40
2020-01-03 45
df2.csv
date value
2022-03-01 15
2022-03-02 25
2022-03-03 20
The expected output would be:
df1.csv
name date value
df1 2020-01-01 35
df1 2020-01-02 40
df1 2020-01-03 45
df2.csv
name date value
df2 2022-03-01 15
df2 2022-03-02 25
df2 2022-03-03 20
I have tried using the below code as a way to do it, but it keeps throwing errors at me of EmptyDataError: No columns to parse from file
even though I am in the correct working directory.
import pandas as pd
import glob
import os.path
# Create a list of all CSV files in folder
files = glob.glob("*.csv")
# Creats list of filenames for appending to df
filenames = []
# Doing the thing
for csv in files:
df = pd.read_csv(csv)
df['name'] = os.path.basename(csv)
filenames.append(df)
Is there a more simple way to do this or should I try to fix the working directory error?
I found a recommended question very similar to this, however, it is in R and that's not a language I am comfortable with at this point. R: Set column name from dataframe name in list of dataframes
CodePudding user response:
You have probably an empty file? From the documentation:
Exception that is thrown in pd.read_csv (by both the C and Python engines) when empty data or header is encountered
Tip: replace glob
by pathlib
:
import pandas as pd
import pathlib
for csvfile in pathlib.Path('.').glob('*.csv'):
df = pd.read_csv(csvfile)
df.insert(0, 'name', csvfile.stem)
df.to_csv(csvfile, index=False)
Output:
>>>