How would you tell which rows were dropped from the original dataframe and the current one?

Time:11-05

I have two dataframes that are "exactly" the same, except that DF1 has 1000 rows and DF2 has 950 rows. 50 rows were dropped, and I want to know which ones. Essentially DF2 is a subset of DF1, but I need to know what was dropped by another service elsewhere.

It would be easiest to get a third dataframe (DF3) that shows the 50 rows that were dropped.

DF3(50 rows x 4 columns) = DF1 (1000 rows x 4 columns) - DF2 (950 rows x 4 columns)

The index is the UniqueID.

Thank you!!

CodePudding user response:

Use isin on the index:

df3 = df1[~df1.index.isin(df2.index)]
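
For illustration, here is a minimal runnable sketch of the same idea. The data, the column name A, and the labels id1..id5 are made up for the example; only the isin call on the index comes from the answer above.

```python
import pandas as pd

# Hypothetical data: df1 is the original frame, indexed by UniqueID.
df1 = pd.DataFrame(
    {"A": [10, 20, 30, 40, 50]},
    index=pd.Index(["id1", "id2", "id3", "id4", "id5"], name="UniqueID"),
)

# Simulate the other service dropping two rows.
df2 = df1.drop(index=["id3", "id4"])

# Boolean mask: True where a df1 index label also appears in df2's index.
kept = df1.index.isin(df2.index)

# Invert the mask to select only the rows missing from df2.
df3 = df1[~kept]
print(df3)
```

Because this is a boolean mask over df1, the dropped rows come back in the same order they have in df1.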

CodePudding user response:

Essentially DF2 is a subset of DF1

You're right, so you can use the set-like difference method on the index:

>>> df1.loc[df1.index.difference(df2.index)]

Example:

>>> df1
          A
0  0.712755
1  0.400005
2  0.958937
3  0.112367
4  0.230177

>>> df2
          A
0  0.712755
1  0.400005
4  0.230177

>>> df1.loc[df1.index.difference(df2.index)]
          A
2  0.958937
3  0.112367
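
One thing to be aware of: by default Index.difference returns its result sorted, while the isin mask keeps df1's original row order. The sketch below shows the difference on made-up data (the index values and column A are assumptions for the example) and also checks that df2 really is a subset of df1.

```python
import pandas as pd

# Hypothetical data with an index that is deliberately not sorted.
df1 = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index([40, 30, 10, 20], name="UniqueID"),
)
df2 = df1.loc[[40, 20]]

# Sanity check: every label in df2 should also exist in df1.
unexpected = df2.index.difference(df1.index)
assert unexpected.empty, f"df2 has labels not in df1: {list(unexpected)}"

# Set-style difference: result is sorted by index label (default behaviour).
dropped_sorted = df1.loc[df1.index.difference(df2.index)]

# Boolean mask: result keeps the row order of df1.
dropped_original_order = df1[~df1.index.isin(df2.index)]

print(dropped_sorted)
print(dropped_original_order)
```

Both return the same rows; which one to pick mostly depends on whether the original row order matters.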