What is causing this pandas.concat to behave strangely?

Time:06-01

A bit of background on the task: I have appended a series of CSV files to each other based on their date over the last week, which has given me the file newData.csv. I need to append this to the previous data stored in oldData.xlsx so that it sits beneath the older data.

Code for generating newData.csv:

import datetime
import pandas as pd

df1 = pd.read_csv(fName0)

#subtracted_date = pd.to_datetime(openDate) - timedelta(days=8)
#subtracted_date = subtracted_date.strftime("%d/%m/%Y")
Previous_Date = datetime.datetime.today() - datetime.timedelta(days=7)
# d/m/yyyy without leading zeros (the '#' flag is Windows-specific)
Previous_Date_Formatted = Previous_Date.strftime('%#d/%#m/%Y')
print(Previous_Date_Formatted)
df1.insert(0, 'Date', Previous_Date_Formatted)

# keep only the Kwai portfolio rows; na=False treats missing values as non-matches
df_Kwai = df1[df1['Portfolio'].str.contains("Kwai", na=False)]
# header=False: no header line is written, so newData.csv holds data rows only
df_Kwai.to_csv('newData.csv', mode='a', index=False, header=False)

The downloaded CSV files do not come with a Date column, so I add one based on timedelta. The complete CSV of the last 7 days has exactly the same columns as the oldData file it needs to join.
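The Date-column step can be sketched in a self-contained way. This is a minimal sketch, assuming a small hypothetical DataFrame in place of the downloaded file (the real data comes from fName0). It builds the d/m/yyyy string from the date's parts, so it does not depend on the platform-specific strftime flags ('%#d' on Windows, '%-d' on Linux/macOS), and it uses na=False so rows with a missing Portfolio value are simply excluded.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for one downloaded daily CSV
df1 = pd.DataFrame({"Portfolio": ["Kwai A", "Other", None], "Value": [1, 2, 3]})

previous_date = datetime.date.today() - datetime.timedelta(days=7)
# d/m/yyyy without leading zeros, built without platform-specific strftime flags
previous_date_formatted = f"{previous_date.day}/{previous_date.month}/{previous_date.year}"

# Insert the Date column at position 0, filled with the same value on every row
df1.insert(0, "Date", previous_date_formatted)

# na=False makes missing Portfolio values count as non-matches instead of NaN
df_kwai = df1[df1["Portfolio"].str.contains("Kwai", na=False)]
print(df_kwai)
```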

The oldData file is then read into a new DataFrame, and I attempt to append the two together:

import pandas as pd

# by default read_csv uses the first line of the file as the column header
newData = pd.read_csv(r'newData.csv')
oldData = pd.read_excel(r'oldData.xlsx')
combinedData = pd.concat([oldData, newData], ignore_index=True)
combinedData.to_excel(r'Kwai-All Data.xlsx', index=False, header=True)
print("Kwai excel file created successfully")

This does append the data, but the new rows are pushed far to the right into new columns, leaving a block of empty cells before the newData values. I have created a simplified representation of the end result below.

Date        col1  col2  col3  col4  col5  02/01/2022  0  0  0  vvv
01/01/2022  0     0     0     abc   def
01/01/2022  1     1     1     ggg   fff
01/01/2022  2     2     4     fff   ooo
01/01/2022  3     3     5     hhh   uuu
                                          02/01/2022  0  0  0  rrr
                                          03/01/2022  0  0  0  sss
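The shape of this result matches how pd.concat aligns frames: it matches columns by label, not by position. The sketch below uses two tiny hypothetical frames whose labels differ; every label found in only one frame becomes an extra column, and the other frame's rows are filled with NaN there, which is the "empty cells" pattern shown above.

```python
import pandas as pd

# Hypothetical frames with the same number of columns but different labels
old = pd.DataFrame({"Date": ["01/01/2022"], "col1": [0]})
new = pd.DataFrame({"02/01/2022": ["03/01/2022"], "0": [0]})

# concat aligns on column labels: labels present in only one frame become
# extra columns, and the other frame's rows are NaN in those columns
combined = pd.concat([old, new], ignore_index=True)
print(combined)
```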

I have tried the same code with some placeholder files, and it behaves normally. My guess is that the problem lies in how the newData.csv file is created, but I cannot find where it stems from.
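One plausible cause, consistent with the code in the question, is the combination of header=False when writing newData.csv and the default header handling of pd.read_csv when reading it back. The sketch below round-trips a few hypothetical rows through an in-memory buffer (io.StringIO stands in for the file) to show that the first data row is consumed as the column labels, so the labels no longer match oldData.

```python
import io

import pandas as pd

# Hypothetical rows shaped like the Kwai data (Date plus a few columns)
rows = pd.DataFrame(
    {"Date": ["2/1/2022", "3/1/2022"], "col1": [0, 0], "col5": ["rrr", "sss"]}
)

# Written the same way as newData.csv: no header line
buffer = io.StringIO()
rows.to_csv(buffer, index=False, header=False)

# Read back with the default header handling: the FIRST DATA ROW becomes
# the column labels, so one row disappears and the labels are data values
buffer.seek(0)
misread = pd.read_csv(buffer)
print(misread.columns.tolist())
print(len(misread))
```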

Any help would be greatly appreciated as I'm still fairly new to pandas.

Answer:

I am not quite sure, but from the result it seems the DataFrames have different column names (pandas concatenates by aligning on column names, not positions). If both DataFrames have the same number of columns in the same order (which does not seem to be the case from your example), you could rename the columns of the second one like this:

newData.columns = oldData.columns

and then append. Hope it helps!
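A self-contained sketch of this suggestion, assuming small hypothetical frames in place of the real files. Because newData.csv is written without a header line, it is read here with header=None first, so the first data row is kept as data rather than used as labels; then oldData's labels are assigned (this only works if the column count and order match).

```python
import io

import pandas as pd

# Hypothetical old data carrying the real column labels
oldData = pd.DataFrame({"Date": ["1/1/2022"], "col1": [0], "col5": ["def"]})

# Hypothetical stand-in for newData.csv, written without a header line
csv_text = "2/1/2022,0,rrr\n3/1/2022,0,sss\n"

# header=None stops pandas from turning the first data row into column labels
newData = pd.read_csv(io.StringIO(csv_text), header=None)

# Reuse oldData's labels (requires the same number and order of columns)
newData.columns = oldData.columns

combinedData = pd.concat([oldData, newData], ignore_index=True)
print(combinedData)
```

Passing names=oldData.columns together with header=None to pd.read_csv does the read and the relabelling in one step.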
