pandas: append rows to another dataframe under the similar row based on column condition-CodePudding

I have two dataframes as follows,

import pandas as pd
d1 ={'col1': ['I ate dinner','I ate dinner', 'the play was inetresting','the play was inetresting'],
 'col2': ['min', 'max', 'mid','min'],
 'col3': ['min', 'max', 'max','max']}

d2 ={'col1': ['I ate dinner',' the glass is shattered', 'the play was inetresting'],
     'col2': ['min', 'max', 'max'],
     'col3': ['max', 'min', 'mid']}

df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

I have created a column in df2 called 'exist' and added values (true, false) based on whether the sentences in df2.col1 exist in df1.col1:

common = df1.merge(df2,on=['col1'])
index_list = df2[(~df2.col1.isin(common.col1))].index.to_list()
df2['exist'] = ' '
df2.loc[index_list, 'exist'] = 'false'
df2.loc[df2["exist"] == " ",'exist'] = 'true'

what I would like to do now, is that if the value in the exist column == true, I would like to add that row under the similar row in df1. so the desired output should be:

output:
                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid

I guess I have to use np.where, but I am not sure how to formulate the append to get the desired output

CodePudding user response：

First idea is filter df2 values by df1.col1 and append to df1 by concat and then sorting by DataFrame.sort_values:

df = pd.concat([df1, df2[(df2.col1.isin(df1.col1))]]).sort_values('col1', ignore_index=True)
print (df)
                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid

If need only common values in both DataFrames is possible filter by numpy.intersect1d:

common = np.intersect1d(df1['col1'], df2['col1'])

df = (pd.concat([df1[df1.col1.isin(common)],
                 df2[df2.col1.isin(common)]])
                     .sort_values('col1', ignore_index=True))
print (df)

CodePudding user response：

IIUC, you want to add the matching row(s) and not necessarily rely on sorting.

df2b = df2.set_index('col1')

(df1
 .groupby('col1', as_index=False, group_keys=False)
 .apply(lambda d: pd.concat([d, df2b.loc[[d.name]].reset_index()]))
 .reset_index(drop=True)
)

output:

                       col1 col2 col3
0              I ate dinner  min  min
1              I ate dinner  max  max
2              I ate dinner  min  max
3  the play was inetresting  mid  max
4  the play was inetresting  min  max
5  the play was inetresting  max  mid