I have one main DF, with all the columns. Often I need to concat a new DF, which sometimes has the same number of columns, but sometimes has less. The idea is to have the following logic:
- standard DF1 already exists, we should append new data to it
- if headers are the same --> concat in a standard way
- if new DF2 doesn't have col X --> create new col X in DF2 with NaN values, then append it to main DF1
- if new DF3 has new column Y, that doesn't exist yet in DF1, then create this column for old DF1 and add NaN values to it
Short version: I would like to concat multiple dataframes, and in case that there is a new column, expand existing DF with NaN values for that column.
Example of 3 different DFs:
Example of wanted outcome:
My question is: is it possible to do that dynamically, without specifying columns every time.
CodePudding user response:
pd.concat
will cope with that without any adjustment, consider following simple example
import pandas as pd
df1 = pd.DataFrame({"City":["Berlin","Paris"],"Country":["DE","FR"],"Phone":[111,122]})
df2 = pd.DataFrame({"City":["Amstera","Copenhag"],"Country":["NL","DK"]})
df3 = pd.DataFrame({"City":["Vienna","Madrid"],"Country":["AT","ES"],"Phone":[222,343],"Street":["Leoplstr","Avenia"]})
df = pd.concat([df1,df2,df3])
print(df)
gives output
City Country Phone Street
0 Berlin DE 111.0 NaN
1 Paris FR 122.0 NaN
0 Amstera NL NaN NaN
1 Copenhag DK NaN NaN
0 Vienna AT 222.0 Leoplstr
1 Madrid ES 343.0 Avenia
Note: I altered slightly sample data for brevity sake.
(tested in pandas 1.5.2)