Hello StackOverflow community!
Struggling new Python person here. I have code that did work until I added more to it, and I'm trying to figure out what I did wrong to screw it up. For each file, I'm trying to import it, read the file name, remove columns, reset the index, fill a column with the filename (I need that info later on), and then move on to the next file.
For some reason, it's only importing the LAST file in the folder. I know I've done something wrong.
Any help would be very much appreciated.
csvPath = "blahblah"
dfData = pd.DataFrame(['NTLogin', 'Date', '', 'FileName'])
for f in glob.glob(csvPath + "\*.csv"):
    df = pd.read_csv(f)
    filename = (os.path.basename(f))
    df.drop(df.columns[[0,1,3]], axis=1, inplace=True)
    df['ID'] = df['ID'].str.upper()
    df = df.set_index('ID').stack().reset_index()
    df['Filename'] = filename
    dfData = pd.concat([df, dfData], ignore_index=True)
Answer:
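It is processing all the CSVs, but when concatenating you are not including your base dataframe (dfData), only the new dataframe (df). That means dfData is replaced on every pass through the loop and ends up holding only the last file.
Also, regarding the Filename column: if it is set on dfData, it will be overwritten every time. Set it on df instead to avoid this: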
df['Filename'] = filename
dfData = pd.concat([dfData, df], ignore_index=True)
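To see why the base dataframe has to be part of the concat, here is a small sketch that uses made-up in-memory frames in place of your CSVs. The frames dict, the file names, and the last_only / all_rows variables are placeholders for illustration, not something from your data:

```python
import pandas as pd

# Hypothetical stand-ins for the CSV files; names and values are made up.
frames = {
    "a.csv": pd.DataFrame({"ID": ["x1"], "Date": ["2022-01-01"]}),
    "b.csv": pd.DataFrame({"ID": ["y2"], "Date": ["2022-01-02"]}),
}

# Pattern that loses data: the accumulator is not passed to concat,
# so it is replaced by the current file on every iteration.
last_only = pd.DataFrame()
for name, df in frames.items():
    df = df.assign(Filename=name)
    last_only = pd.concat([df], ignore_index=True)

# Pattern that keeps everything: the accumulator is part of the list.
all_rows = pd.DataFrame()
for name, df in frames.items():
    df = df.assign(Filename=name)
    all_rows = pd.concat([all_rows, df], ignore_index=True)

print(last_only["Filename"].tolist())
print(all_rows["Filename"].tolist())
```

The first print shows only the last file name, while the second shows both, which matches the symptom described in the question.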
List method
As suggested by pyaj in the comments, you can also collect the dataframes in a list and concatenate them once after the loop.
It will look like this:
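import glob
import os
import pandas as pd

csvPath = "blahblah"
df_list = []
# os.path.join builds the search pattern without relying on "\*.csv"
for f in glob.glob(os.path.join(csvPath, "*.csv")):
    df = pd.read_csv(f)
    filename = os.path.basename(f)
    df.drop(df.columns[[0, 1, 3]], axis=1, inplace=True)
    df['ID'] = df['ID'].str.upper()
    df = df.set_index('ID').stack().reset_index()
    # tag every row with the file it came from
    df['Filename'] = filename
    df_list.append(df)
# concatenate once, after the loop has finished
dfData = pd.concat(df_list, ignore_index=True)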
You can also check the list to see if each individual dataframe is correct.
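One possible way to do that check is sketched below. The df_list here is a made-up stand-in (in your script it is filled inside the glob loop), and the summary variable is just an illustrative name:

```python
import pandas as pd

# Hypothetical df_list; in the real script it is built inside the loop.
df_list = [
    pd.DataFrame({"ID": ["X1"], "Filename": ["a.csv"]}),
    pd.DataFrame({"ID": ["Y2", "Z3"], "Filename": ["b.csv", "b.csv"]}),
]

# One entry per file: position in the list, source file name, and shape.
summary = [(i, df["Filename"].iloc[0], df.shape) for i, df in enumerate(df_list)]
for row in summary:
    print(row)

# The combined row count should equal the sum of the individual counts.
dfData = pd.concat(df_list, ignore_index=True)
print(len(dfData) == sum(len(df) for df in df_list))
```

If a file is missing from the summary or its shape looks off, you can inspect that one dataframe directly, for example with df_list[1].head().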