Python Pandas - import all CSV files in folder, only picking up 1 file

Hello StackOverflow community!

Struggling new Python person here. I have code that did work until I added more to it, and I'm trying to figure out what I did wrong to screw it up. For each file, I'm trying to import it, read the file name, remove columns, reset the index, fill a column with the filename (I need that info later on), and then move on to the next file.

For some reason, it's only importing the LAST file in the folder. I know I've done something wrong.

Any help would be very much appreciated

csvPath = "blahblah"

dfData = pd.DataFrame(['NTLogin', 'Date', '', 'FileName'])

for f in glob.glob(csvPath + "\\*.csv"):
        df = pd.read_csv(f)
        filename = (os.path.basename(f))
        df.drop(df.columns[[0,1,3]], axis=1, inplace=True)
        df['ID'] = df['ID'].str.upper()
        df = df.set_index('ID').stack().reset_index()
        df['Filename'] = filename
        dfData = pd.concat([df, dfData], ignore_index=True)

CodePudding user response:

It is processing all the CSVs. The problem is in the concatenation: you are not including your base dataframe (dfData), only the new dataframe (df), so each iteration replaces the accumulated result and only the last file survives.

Also, regarding the Filename column: it will be overwritten every time if it is set on the combined dataframe. Set it on df instead to avoid this:

df['Filename'] = filename
dfData = pd.concat([dfData, df], ignore_index=True)
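
To see the difference, here is a minimal sketch that uses two small in-memory frames in place of the CSV files. The frame contents, the file names, and the variable names last_only and accumulated are made up for illustration. Concatenating only df keeps just the last frame, while including the running result keeps the rows from every file.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from two CSV files.
frames = {
    "a.csv": pd.DataFrame({"ID": ["X1"], "value": [1]}),
    "b.csv": pd.DataFrame({"ID": ["Y2"], "value": [2]}),
}

last_only = pd.DataFrame()
accumulated = pd.DataFrame()

for filename, df in frames.items():
    df = df.copy()
    df["Filename"] = filename  # set on the per-file frame, not the combined one

    # Only the new frame is concatenated: the previous result is discarded.
    last_only = pd.concat([df], ignore_index=True)

    # The running result is included: rows from every file are kept.
    accumulated = pd.concat([accumulated, df], ignore_index=True)

print(len(last_only), "row(s) when only df is concatenated")
print(len(accumulated), "row(s) when dfData is included")
```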

List method

As suggested by pyaj in the comments, you can also collect the dataframes in a list and concatenate them once at the end to achieve the same thing.

It will look like this:

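import glob
import os

import pandas as pd

csvPath = "blahblah"

df_list = []

# os.path.join builds the pattern without a hand-written path separator
for f in glob.glob(os.path.join(csvPath, "*.csv")):
    df = pd.read_csv(f)
    filename = os.path.basename(f)
    df.drop(df.columns[[0, 1, 3]], axis=1, inplace=True)
    df['ID'] = df['ID'].str.upper()
    df = df.set_index('ID').stack().reset_index()
    df['Filename'] = filename  # tag every row with its source file

    df_list.append(df)

# Concatenate once, after the loop
dfData = pd.concat(df_list, ignore_index=True)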

You can also check the list to see if each individual dataframe is correct.
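
As a minimal sketch of that check, the example below writes two small made-up CSV files to a temporary folder, runs the list method over them, and prints the source file and shape of each frame before concatenating. The file names, the column names, and the helper load_csvs are assumptions for illustration, and the column-dropping step is left out because it depends on the real file layout.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csvs(folder):
    """Hypothetical helper: read every CSV in `folder` into a list of long-format frames."""
    df_list = []
    for f in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(f)
        df["ID"] = df["ID"].str.upper()
        # stack() turns the remaining columns into rows, one per (ID, column) pair
        df = df.set_index("ID").stack().reset_index()
        df["Filename"] = os.path.basename(f)
        df_list.append(df)
    return df_list


with tempfile.TemporaryDirectory() as folder:
    # Made-up sample data standing in for the real files.
    pd.DataFrame({"ID": ["ab1"], "Mon": [1], "Tue": [2]}).to_csv(
        os.path.join(folder, "week1.csv"), index=False)
    pd.DataFrame({"ID": ["cd2"], "Mon": [3], "Tue": [4]}).to_csv(
        os.path.join(folder, "week2.csv"), index=False)

    df_list = load_csvs(folder)

# Check each individual frame before combining.
for df in df_list:
    print(df["Filename"].iloc[0], df.shape)

dfData = pd.concat(df_list, ignore_index=True)
print(dfData)
```

If one file has an unexpected shape or a missing ID column, it shows up here per file rather than being hidden inside the combined dataframe.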