Home > Blockchain >  How to concat multiple dataframes dynamically, with different columns?
How to concat multiple dataframes dynamically, with different columns?

Time:12-30

I have one main DF, with all the columns. Often I need to concat a new DF, which sometimes has the same number of columns, but sometimes has less. The idea is to have the following logic:

  • standard DF1 already exists, we should append new data to it
  • if headers are the same --> concat in a standard way
  • if new DF2 doesn't have col X --> create new col X in DF2 with NaN values, then append it to main DF1
  • if new DF3 has new column Y, that doesn't exist yet in DF1, then create this column for old DF1 and add NaN values to it

Short version: I would like to concat multiple dataframes, and in case that there is a new column, expand existing DF with NaN values for that column.

Example of 3 different DFs:

separated DFs

Example of wanted outcome:

Concatenated DFs

My question is: is it possible to do that dynamically, without specifying columns every time.

CodePudding user response:

pd.concat will cope with that without any adjustment, consider following simple example

import pandas as pd
df1 = pd.DataFrame({"City":["Berlin","Paris"],"Country":["DE","FR"],"Phone":[111,122]})
df2 = pd.DataFrame({"City":["Amstera","Copenhag"],"Country":["NL","DK"]})
df3 = pd.DataFrame({"City":["Vienna","Madrid"],"Country":["AT","ES"],"Phone":[222,343],"Street":["Leoplstr","Avenia"]})
df = pd.concat([df1,df2,df3])
print(df)

gives output

       City Country  Phone    Street
0    Berlin      DE  111.0       NaN
1     Paris      FR  122.0       NaN
0   Amstera      NL    NaN       NaN
1  Copenhag      DK    NaN       NaN
0    Vienna      AT  222.0  Leoplstr
1    Madrid      ES  343.0    Avenia

Note: I altered slightly sample data for brevity sake.

(tested in pandas 1.5.2)

  • Related