Home > Blockchain >  Union of two DataFrames with different column and row indexes updating existing values - Pandas
Union of two DataFrames with different column and row indexes updating existing values - Pandas

Time:03-22

First of all I'll make an example of the result I want to obtain. I initially have two DataFrames with, in general different column indexes and row indexes and eventually with different rows and columns number (even if in the example below are both 3x3):

    Dataframe1 |     Dataframe2
    A   B   C  |      B   D   F
A   x   x   x  |  A   y   y   y
D   x   x   x  |  B   y   y   y
E   x   x   x  |  E   y   y   y

And I want the following result:

    Result
   A   B   C   D   F
A  x   y   x   y   y
B  -   y   -   y   y
D  x   x   x   -   -
E  x   y   x   y   y

Note the solution has the following characteristics:

  • the resulting data frame contains all the rows and columns of both dataframe1 and dataframe2
  • where values overlap, dataframe2 updates dataframe1 (e.g. in positions [BA] and [BE] of result data frame, where in the same poistions in dataframe1 there was x, now there's y)
  • Where missing values occur (here replaced by a dash -) a default value is inserted (like NaN)
  • The indexes names are preserved in the result table (but the alphabetical sorting is not necessary)

My questions are:

  • Is it possible to do that with pandas? If yes, how? I've tried many different things but none of them worked 100%, but since I'm very new to pandas I might not know the proper way to do it.
  • If pandas is not the right or easiest way to do it, is there another way that you would recommend for doing that (maybe using matrices, dictionaries, ...)

Thank you.

CodePudding user response:

try this:

data1 = {'A': {'A': 'x', 'D': 'x', 'E': 'x'},
         'B': {'A': 'x', 'D': 'x', 'E': 'x'},
         'C': {'A': 'x', 'D': 'x', 'E': 'x'}}
df1 = pd.DataFrame(data1)
print(df1)
>>>
    A   B   C
A   x   x   x
D   x   x   x
E   x   x   x

data2 = {'B': {'A': 'y', 'B': 'y', 'E': 'y'},
         'D': {'A': 'y', 'B': 'y', 'E': 'y'},
         'F': {'A': 'y', 'B': 'y', 'E': 'y'}}
df2 = pd.DataFrame(data2)
print(df2)
>>>
    B   D   F
A   y   y   y
B   y   y   y
E   y   y   y

res = df1.combine_first(df2)
print(res)
>>>
    A   B   C   D   F
A   x   y   x   y   y
B   NaN y   NaN y   y
D   x   x   x   NaN NaN
E   x   y   x   y   y

CodePudding user response:

try another:

cols = df1.columns.append(df2.columns).unique().sort_values()
idx = df1.index.append(df2.index).unique().sort_values()
res = df1.reindex(index=idx, columns=cols)
res.update(df2)
print(res)
>>>
    A   B   C   D   F
A   x   y   x   y   y
B   NaN y   NaN y   y
D   x   x   x   NaN NaN
E   x   y   x   y   y
  • Related