Home > Net >  How to identify unique elements in two dataframes and append with a new row
How to identify unique elements in two dataframes and append with a new row

Time:11-23

I am trying to write a function that takes in two dataframes with a different number of rows, finds the elements that are unique to each dataframe in the first column, and then appends a new row that only contains the unique element to the dataframe where it does not exist. For example:

>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
    col1  col2
0     1     3
1     2     4
2     5     6

>>> d2 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
   col1  col2
0     1     3
1     2     4
2     6     7

>>> standarized_unique_elems(df1, df2)    
>>> df1
   col1  col2
0     1     3
1     2     4
2     5     6
3     6    NaN

>>> df2
   col1  col2
0     1     3
1     2     4
2     6     7
3     5    NaN

Before posting this question, I gave it my best shot, but cant figure out a good way to append a new row at the bottom of each dataframe with the unique element. Here is what I have so far:

def standardize_shape(df1, df2):
    unique_elements = list(set(df1.iloc[:, 0]).symmetric_difference(set(df2.iloc[:, 0])))
    for elem in unique_elements:
        if elem not in df1.iloc[:, 0].tolist():
            # append a new row with the unique element with rest of values NaN
        if elem not in df2.iloc[:, 0].tolist():
            # append a new row with the unique element with rest of values NaN
    return (df1, df2)

I am still new to Pandas, so any help would be greatly appreciated!

CodePudding user response:

We can do

out1 = pd.concat([df1,pd.DataFrame({'col1':df2.loc[~df2.col1.isin(df1.col1),'col1']})])
Out[269]: 
   col1  col2
0     1   3.0
1     2   4.0
2     5   6.0
2     6   NaN

#out2 = pd.concat([df2,pd.DataFrame({'col1':df1.loc[~df1.col1.isin(df2.col1),'col1']})])
  • Related