Some background on the task: over the last week I have appended a series of csv files to each other based on their date, which has given me the file newData.csv. I need to append this to the previous data stored in oldData.xlsx so that it sits beneath the older data.
Code for generating newData:
import datetime
import pandas as pd

df1 = pd.read_csv(fName0)  # fName0 is defined elsewhere
#subtracted_date = pd.to_datetime(openDate) - timedelta(days=8)
#subtracted_date = subtracted_date.strftime("%d/%m/%Y")
Previous_Date = datetime.datetime.today() - datetime.timedelta(days=7)
# format the date as d/m/yyyy without zero padding (the '#' flag is Windows-only)
Previous_Date_Formatted = Previous_Date.strftime('%#d/%#m/%Y')
print(Previous_Date_Formatted)
# insert the Date column as the first column
df1.insert(0, 'Date', Previous_Date_Formatted)
# keep only the rows whose Portfolio contains "Kwai"
df_Kwai = df1[df1['Portfolio'].str.contains("Kwai") == True]
# append to newData.csv without an index column or a header row
df_Kwai.to_csv('newData.csv', mode='a', index=False, header=False)
The csv files that were downloaded do not natively come with a Date column so I have added one based on the timedelta function. The complete csv of the last 7 days has the same exact columns as the oldData file that it needs to join.
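For illustration, the stamping-and-append step described above can be sketched on made-up data. This is only a sketch under assumptions: `stamp_and_append` is a hypothetical helper name, the sample frame and the temporary output path are invented, and the date string is built by hand so it does not rely on the Windows-only `%#d` flag. It also writes the header row only when the file is first created, which is one way to make sure a later `pd.read_csv` sees real column names.

```python
import datetime
import os
import tempfile

import pandas as pd


def stamp_and_append(df, out_path, days_back=7):
    """Add a Date column, keep the Kwai rows, and append them to out_path."""
    date = datetime.datetime.today() - datetime.timedelta(days=days_back)
    # Build d/m/yyyy without zero padding in a platform-independent way.
    date_str = f"{date.day}/{date.month}/{date.year}"

    stamped = df.copy()
    stamped.insert(0, "Date", date_str)
    # na=False treats missing Portfolio values as "no match".
    kwai = stamped[stamped["Portfolio"].str.contains("Kwai", na=False)]

    # Write the header only when the file does not exist yet.
    write_header = not os.path.exists(out_path)
    kwai.to_csv(out_path, mode="a", index=False, header=write_header)
    return kwai


# Made-up sample standing in for one downloaded csv file.
sample = pd.DataFrame({"Portfolio": ["Kwai A", "Other", None], "col1": [1, 2, 3]})

out_path = os.path.join(tempfile.mkdtemp(), "newData.csv")
stamp_and_append(sample, out_path, days_back=7)
stamp_and_append(sample, out_path, days_back=6)

result = pd.read_csv(out_path)
print(result.columns.tolist())  # column names as read back from the file
print(len(result))              # number of appended rows
```

With the header written once, the file read back has the same column names as the frame that was written, regardless of how many times rows are appended.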
The oldData file is then read into a new dataframe and I have attempted to append the two together.
newData = pd.read_csv(r'newData.csv')
oldData = pd.read_excel(r"oldData.xlsx")
combinedData = pd.concat([oldData, newData], ignore_index=True)
combinedData.to_excel(r'Kwai-All Data.xlsx', index=False, header=True)
print("Kwai excel file created successfully")
This does append the data; however, the new data has been pushed far over into new columns before being appended, so there is a block of empty cells before the newData rows. I have created a simplified representation of the end result below.
Date        col1  col2  col3  col4  col5  02/01/2022  0  0  0  vvv
01/01/2022  0     0     0     abc   def
01/01/2022  1     1     1     ggg   fff
01/01/2022  2     2     4     fff   ooo
01/01/2022  3     3     5     hhh   uuu
                                          02/01/2022  0  0  0  rrr
                                          03/01/2022  0  0  0  sss
I have tried the same code with some placeholder files, and it behaves normally with those. My guess is that the error is in the creation of the newData.csv file, but I cannot find where the error is stemming from.
Any help would be greatly appreciated as I'm still fairly new to pandas.
CodePudding user response:
I am not quite sure, but from the overview of the result it seems the dataframes have different column names (pandas concatenates based on column names). If the columns appear in the same order in both dataframes (and assuming they also have the same number of columns, which does not seem to be the case here), you could rename the columns of the second one like this:
newData.columns = oldData.columns
and then append. Hope it helps!
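To make the column-name point concrete, here is a minimal sketch on made-up data. It assumes that newData.csv has no header row (the question's code writes it with `header=False`), in which case a default `pd.read_csv` would treat the first data row as the column names. The frames, the in-memory csv text and the variable names below are invented for illustration; passing `header=None` with `names=` is one possible alternative to renaming the columns afterwards.

```python
import io

import pandas as pd

# Made-up stand-in for oldData.xlsx.
old_data = pd.DataFrame(
    {"Date": ["01/01/2022", "01/01/2022"], "col1": [0, 1], "col2": ["abc", "ggg"]}
)

# Made-up stand-in for a csv that was written with header=False.
headerless_csv = "02/01/2022,0,rrr\n03/01/2022,0,sss\n"

# Default read: the first data row is taken as the header, so the
# column names do not match old_data and concat adds extra columns.
misread = pd.read_csv(io.StringIO(headerless_csv))
shifted = pd.concat([old_data, misread], ignore_index=True)
print(shifted.shape)

# Reading with header=None and reusing the old column names keeps
# every row as data and lets concat stack the frames vertically.
new_data = pd.read_csv(
    io.StringIO(headerless_csv), header=None, names=list(old_data.columns)
)
combined = pd.concat([old_data, new_data], ignore_index=True)
print(combined.shape)
```

In the first case the result gains three extra columns and loses one row to the header; in the second the new rows sit directly beneath the old ones.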