Home > Blockchain >  How to get rows from a dataframe that are not joined when merging with other dataframe in pandas
How to get rows from a dataframe that are not joined when merging with other dataframe in pandas

Time:03-27

I am trying to make 2 new dataframes by using 2 given dataframe objects:

DF1 = id  feature_text    length
       1  "example text"  12
       2  "example text2" 13
       ....
       ....

DF2 = id  case_num
       3         0
       ....
       ....

As you could see, both df1 and df2 have column called "id". However, the df1 has all id values, where df2 only has some of them. I mean, df1 has 3200 rows, where each row has a unique id value (1~3200), however, df2 has only some of them (i.e. id=[3,7,20,...]).

What I want to do is 1) get a merged dataframe which contains all rows that have the id values which are included in both df1 and df2, and 2) get a dataframe, which contains the rows in the df1, which have id values that are not included in the df2.

I was able to find a solution for 1), however, have no idea how to do 2).

Thanks.

CodePudding user response:

For the first case, you could use inner merge:

out = df1.merge(df2, on='id')

For the second case, you could use isin, with negation operator, so that we filter out the rows in df1 that have ids that also exist in df2:

out = df1[~df1['id'].isin(df2['id'])]
  • Related