Dropping rows by index that doesn't exist in other dataframe

Time:11-05

If I have two dataframes, say dfA like this:

    hour    distance    short_summary
1   5       2.02        Overcast
2   7       1.16        Overcast
3   3       1.35        Partly Cloudy
4   12      1.17        Overcast
5   22      1.80        Overcast
6   9       1.72        Partly Cloudy
7   18      1.09        Partly Cloudy

and dfB like this:

          price
1         22.5
3         8.5
5         14.0
6         7.0
7         9.5

How do I remove the rows in dfA whose index labels don't exist in dfB? The final dfA should look like this:

    hour    distance    short_summary
1   5       2.02        Overcast
3   3       1.35        Partly Cloudy
5   22      1.80        Overcast
6   9       1.72        Partly Cloudy
7   18      1.09        Partly Cloudy

CodePudding user response:

dfA = dfA.loc[dfB.index, :] does what you want, provided every index label in dfB also exists in dfA. Here's a demonstration.

import pandas as pd

dfA = pd.DataFrame(
    {'hour': {1: 5, 2: 7, 3: 3, 4: 12, 5: 22, 6: 9, 7: 18},
     'distance': {1: 2.02, 2: 1.16, 3: 1.35, 4: 1.17, 5: 1.8, 6: 1.72, 7: 1.09},
     'short_summary': {1: 'Overcast',
      2: 'Overcast',
      3: 'Partly_Cloudy',
      4: 'Overcast',
      5: 'Overcast',
      6: 'Partly_Cloudy',
      7: 'Partly_Cloudy'}})
dfB = pd.DataFrame(
    {'price': {1: 22.5, 3: 8.5, 5: 14.0, 6: 7.0, 7: 9.5}})

dfA = dfA.loc[dfB.index,:]

The result:

   hour  distance  short_summary
1     5      2.02       Overcast
3     3      1.35  Partly_Cloudy
5    22      1.80       Overcast
6     9      1.72  Partly_Cloudy
7    18      1.09  Partly_Cloudy
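
One caveat, as a minimal sketch rather than part of the answer above: in recent pandas versions, .loc with a list of labels raises a KeyError if any of those labels is missing from dfA. The small frames below are made up for illustration (label 9 exists only in dfB). Restricting the lookup to the labels both frames share, via Index.intersection, is one way to avoid the error.

```python
import pandas as pd

# Hypothetical data: label 9 is in dfB but not in dfA.
dfA = pd.DataFrame({"hour": [5, 7, 3]}, index=[1, 2, 3])
dfB = pd.DataFrame({"price": [22.5, 8.5, 4.0]}, index=[1, 3, 9])

# dfA.loc[dfB.index, :] would fail here in recent pandas because 9 is not in dfA.
# Keep only the labels present in both indexes before selecting.
common = dfA.index.intersection(dfB.index)
safe = dfA.loc[common, :]

print(safe)
```

If dfB's index is guaranteed to be a subset of dfA's, as in the question, the plain .loc call is enough.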

CodePudding user response:

Assuming the dataframes are named dfA and dfB, there are several ways to do this.

Four options follow.


Option 1

Using pandas.Index.isin and pandas.Index

dfA = dfA[dfA.index.isin(dfB.index)]

[Out]:

   hour  distance  short_summary
1     5      2.02       Overcast
3     3      1.35  Partly Cloudy
5    22      1.80       Overcast
6     9      1.72  Partly Cloudy
7    18      1.09  Partly Cloudy

Option 2

Using pandas.DataFrame.loc and pandas.Index

dfA = dfA.loc[dfB.index]

[Out]:

   hour  distance  short_summary
1     5      2.02       Overcast
3     3      1.35  Partly Cloudy
5    22      1.80       Overcast
6     9      1.72  Partly Cloudy
7    18      1.09  Partly Cloudy
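
A short sketch of how Option 2 differs from the mask-based options: .loc returns rows in the order of the labels it is given, while a boolean mask keeps dfA's own row order. The frames below are made-up examples, with dfB's index deliberately unsorted.

```python
import pandas as pd

# Hypothetical data: dfB's index is in a different order than dfA's.
dfA = pd.DataFrame({"hour": [5, 7, 3]}, index=[1, 2, 3])
dfB = pd.DataFrame({"price": [8.5, 22.5]}, index=[3, 1])

# Label-based selection follows the order of dfB.index.
by_loc = dfA.loc[dfB.index]

# Boolean-mask selection keeps the order of dfA.index.
by_isin = dfA[dfA.index.isin(dfB.index)]

print(list(by_loc.index))
print(list(by_isin.index))
```

With the data from the question both indexes are already sorted, so the two approaches give the same result.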

Option 3

Using numpy.isin and pandas.Index

import numpy as np

dfA = dfA[np.isin(dfA.index, dfB.index)]

[Out]:

   hour  distance  short_summary
1     5      2.02       Overcast
3     3      1.35  Partly Cloudy
5    22      1.80       Overcast
6     9      1.72  Partly Cloudy
7    18      1.09  Partly Cloudy

Option 4

Using pandas.Index in a list comprehension

dfA = dfA[[i in dfB.index for i in dfA.index]]

[Out]:

   hour  distance  short_summary
1     5      2.02       Overcast
3     3      1.35  Partly Cloudy
5    22      1.80       Overcast
6     9      1.72  Partly Cloudy
7    18      1.09  Partly Cloudy
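
To check that the four options agree, here is a minimal self-contained sketch that rebuilds the frames from the question and compares the results with DataFrame.equals. The names opt1 to opt4 and all_equal are only for illustration.

```python
import numpy as np
import pandas as pd

# Data from the question, with integer index labels 1..7.
dfA = pd.DataFrame(
    {
        "hour": [5, 7, 3, 12, 22, 9, 18],
        "distance": [2.02, 1.16, 1.35, 1.17, 1.80, 1.72, 1.09],
        "short_summary": [
            "Overcast", "Overcast", "Partly Cloudy", "Overcast",
            "Overcast", "Partly Cloudy", "Partly Cloudy",
        ],
    },
    index=[1, 2, 3, 4, 5, 6, 7],
)
dfB = pd.DataFrame({"price": [22.5, 8.5, 14.0, 7.0, 9.5]}, index=[1, 3, 5, 6, 7])

opt1 = dfA[dfA.index.isin(dfB.index)]               # Option 1: Index.isin mask
opt2 = dfA.loc[dfB.index]                           # Option 2: label lookup
opt3 = dfA[np.isin(dfA.index, dfB.index)]           # Option 3: numpy.isin mask
opt4 = dfA[[i in dfB.index for i in dfA.index]]     # Option 4: list comprehension

# True when every option returns the same rows, index, and values.
all_equal = all(opt1.equals(other) for other in (opt2, opt3, opt4))
print(all_equal)
print(opt1)
```

Each option here is computed from the original dfA without reassigning it, so the four results can be compared side by side.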