I am trying to write a function that takes in two dataframes with a different number of rows, finds the elements that are unique to each dataframe in the first column, and then appends a new row that only contains the unique element to the dataframe where it does not exist. For example:
>>> d1 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df1 = pd.DataFrame(data=d1)
>>> df1
col1 col2
0 1 3
1 2 4
2 5 6
>>> d2 = {'col1': [1, 2], 'col2': [3, 4]}
>>> df2 = pd.DataFrame(data=d2)
>>> df2
col1 col2
0 1 3
1 2 4
2 6 7
>>> standarized_unique_elems(df1, df2)
>>> df1
col1 col2
0 1 3
1 2 4
2 5 6
3 6 NaN
>>> df2
col1 col2
0 1 3
1 2 4
2 6 7
3 5 NaN
Before posting this question, I gave it my best shot, but cant figure out a good way to append a new row at the bottom of each dataframe with the unique element. Here is what I have so far:
def standardize_shape(df1, df2):
unique_elements = list(set(df1.iloc[:, 0]).symmetric_difference(set(df2.iloc[:, 0])))
for elem in unique_elements:
if elem not in df1.iloc[:, 0].tolist():
# append a new row with the unique element with rest of values NaN
if elem not in df2.iloc[:, 0].tolist():
# append a new row with the unique element with rest of values NaN
return (df1, df2)
I am still new to Pandas, so any help would be greatly appreciated!
CodePudding user response:
We can do
out1 = pd.concat([df1,pd.DataFrame({'col1':df2.loc[~df2.col1.isin(df1.col1),'col1']})])
Out[269]:
col1 col2
0 1 3.0
1 2 4.0
2 5 6.0
2 6 NaN
#out2 = pd.concat([df2,pd.DataFrame({'col1':df1.loc[~df1.col1.isin(df2.col1),'col1']})])