Merging df in python-CodePudding

Say I have two DataFrames

df1 = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index = [0,1])
df2 = pd.DataFrame({'B':[8,9], 'C':[10,11]}, index = [1,2])

I want to merge so that any values in df1 are overwritten in there is a value in df2 at that location and any new values in df2 are added including the new rows and columns.

The result should be:

   A  B  C
0  1  3  nan
1  2  8  10
2 nan 9  11

I've tried combine_first but that causes only nan values to be overwritten updated has the issue where new rows are created rather than overwritten merge has many issues.

I've tried writing my own function

def take_right(df1, df2, j, i):
    print (df1)
    print (df2)
    try:
        s1 = df1[j][i]
    except:
        s1 = np.NaN
    try:
        s2 = df2[j][i]
    except:
        s2 = np.NaN
    
    if math.isnan(s2):
        #print(s1)
        return s1
    else:
       # print(s2)
        return s2
    
def combine_df(df1, df2):
    
    rows = (set(df1.index.values.tolist()) | set(df2.index.values.tolist()))
    #print(rows)
    columns = (set(df1.columns.values.tolist()) | set(df2.columns.values.tolist()))
    #print(columns)
    df = pd.DataFrame()
    #df.columns = columns
    for i in rows:
        #df[:][i]=[]
        for j in columns:
                
                df = df.insert(int(i), j, take_right(df1,df2,j,i), allow_duplicates=False)
   # print(df)
                
    return df

This won't add new columns or rows to an empty DataFrame.

Thank you!!

CodePudding user response：

One approach is to create an empty output dataframe with the union of columns and indices from df1 and df2 and then use the df.update method to assign their values into the out_df

import pandas as pd

df1 = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index = [0,1])
df2 = pd.DataFrame({'B':[8,9], 'C':[10,11]}, index = [1,2])


out_df = pd.DataFrame(
    columns = df1.columns.union(df2.columns),
    index = df1.index.union(df2.index),
)
out_df.update(df1)
out_df.update(df2)
out_df

CodePudding user response：

Why does combine_first not work?

df = df2.combine_first(df1)
print(df)

Output:

     A  B     C
0  1.0  3   NaN
1  2.0  8  10.0
2  NaN  9  11.0