If I have two DataFrames, say dfA like this:
hour distance short_summary
1 5 2.02 Overcast
2 7 1.16 Overcast
3 3 1.35 Partly Cloudy
4 12 1.17 Overcast
5 22 1.80 Overcast
6 9 1.72 Partly Cloudy
7 18 1.09 Partly Cloudy
and dfB like this:
price
1 22.5
3 8.5
5 14.0
6 7.0
7 9.5
How do I remove the rows in dfA whose index labels don't exist in dfB? The final dfA should look like this:
hour distance short_summary
1 5 2.02 Overcast
3 3 1.35 Partly Cloudy
5 22 1.80 Overcast
6 9 1.72 Partly Cloudy
7 18 1.09 Partly Cloudy
CodePudding user response:
dfA = dfA.loc[dfB.index,:]
does what you want. Here's a demonstration.
import pandas as pd
dfA = pd.DataFrame(
{'hour': {1: 5, 2: 7, 3: 3, 4: 12, 5: 22, 6: 9, 7: 18},
'distance': {1: 2.02, 2: 1.16, 3: 1.35, 4: 1.17, 5: 1.8, 6: 1.72, 7: 1.09},
'short_summary': {1: 'Overcast',
2: 'Overcast',
3: 'Partly_Cloudy',
4: 'Overcast',
5: 'Overcast',
6: 'Partly_Cloudy',
7: 'Partly_Cloudy'}})
dfB = pd.DataFrame(
{'price': {1: 22.5, 3: 8.5, 5: 14.0, 6: 7.0, 7: 9.5}})
dfA = dfA.loc[dfB.index,:]
The result:
hour distance short_summary
1 5 2.02 Overcast
3 3 1.35 Partly_Cloudy
5 22 1.80 Overcast
6 9 1.72 Partly_Cloudy
7 18 1.09 Partly_Cloudy
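One caveat worth checking: as far as I know, in recent pandas versions (1.0 and later) .loc raises a KeyError if dfB.index contains a label that is missing from dfA. A minimal sketch of that case, using small made-up frames (the label 9 is hypothetical, not from the question), with Index.intersection as one way around it:

```python
import pandas as pd

dfA = pd.DataFrame({"hour": [5, 7, 3]}, index=[1, 2, 3])
# Hypothetical dfB: label 9 does not exist in dfA.
dfB = pd.DataFrame({"price": [22.5, 8.5, 1.0]}, index=[1, 3, 9])

try:
    dfA.loc[dfB.index, :]
    raised = False
except KeyError:
    # Recent pandas refuses list-like lookups with missing labels.
    raised = True

# Restrict the lookup to labels present in both indexes.
common = dfA.index.intersection(dfB.index)
safe = dfA.loc[common, :]
```

In the question's data every label of dfB exists in dfA, so the plain .loc call is enough there.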
CodePudding user response:
Assuming the dataframes are dfA and dfB, there are several ways to do that.
Four options are given below.
Option 1
Using pandas.Index.isin and pandas.Index
dfA = dfA[dfA.index.isin(dfB.index)]
[Out]:
hour distance short_summary
1 5 2.02 Overcast
3 3 1.35 Partly Cloudy
5 22 1.80 Overcast
6 9 1.72 Partly Cloudy
7 18 1.09 Partly Cloudy
Option 2
Using pandas.DataFrame.loc and pandas.Index
dfA = dfA.loc[dfB.index]
[Out]:
hour distance short_summary
1 5 2.02 Overcast
3 3 1.35 Partly Cloudy
5 22 1.80 Overcast
6 9 1.72 Partly Cloudy
7 18 1.09 Partly Cloudy
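A small difference between the boolean-mask approach and .loc that may matter: the mask keeps dfA's row order, while .loc returns rows in the order of the labels it is given. A rough sketch with made-up frames (the reversed dfB index is an assumption for illustration, not the question's data):

```python
import pandas as pd

dfA = pd.DataFrame({"hour": [5, 7, 3]}, index=[1, 2, 3])
# Hypothetical dfB whose index is not in the same order as dfA's.
dfB = pd.DataFrame({"price": [8.5, 22.5]}, index=[3, 1])

by_mask = dfA[dfA.index.isin(dfB.index)]  # rows stay in dfA's order
by_loc = dfA.loc[dfB.index]               # rows follow dfB's label order

print(by_mask.index.tolist())
print(by_loc.index.tolist())
```

With the question's data both indexes are sorted the same way, so the two approaches give the same frame.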
Option 3
Using numpy.isin and pandas.Index
import numpy as np
dfA = dfA[np.isin(dfA.index, dfB.index)]
[Out]:
hour distance short_summary
1 5 2.02 Overcast
3 3 1.35 Partly Cloudy
5 22 1.80 Overcast
6 9 1.72 Partly Cloudy
7 18 1.09 Partly Cloudy
Option 4
Using pandas.Index in a list comprehension
dfA = dfA[[i in dfB.index for i in dfA.index]]
[Out]:
hour distance short_summary
1 5 2.02 Overcast
3 3 1.35 Partly Cloudy
5 22 1.80 Overcast
6 9 1.72 Partly Cloudy
7 18 1.09 Partly Cloudy
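To check that the four options agree on the question's data, something like the following should work. It is only a sketch: the frames are rebuilt by hand from the tables in the question, and the names opt1 to opt4 are just labels chosen here.

```python
import numpy as np
import pandas as pd

# dfA and dfB rebuilt from the question's tables.
dfA = pd.DataFrame(
    {"hour": [5, 7, 3, 12, 22, 9, 18],
     "distance": [2.02, 1.16, 1.35, 1.17, 1.80, 1.72, 1.09],
     "short_summary": ["Overcast", "Overcast", "Partly Cloudy", "Overcast",
                       "Overcast", "Partly Cloudy", "Partly Cloudy"]},
    index=range(1, 8))
dfB = pd.DataFrame({"price": [22.5, 8.5, 14.0, 7.0, 9.5]},
                   index=[1, 3, 5, 6, 7])

opt1 = dfA[dfA.index.isin(dfB.index)]            # Option 1
opt2 = dfA.loc[dfB.index]                        # Option 2
opt3 = dfA[np.isin(dfA.index, dfB.index)]        # Option 3
opt4 = dfA[[i in dfB.index for i in dfA.index]]  # Option 4

# True when all four results hold the same rows and values.
same = opt1.equals(opt2) and opt1.equals(opt3) and opt1.equals(opt4)
print(same)
print(opt1.index.tolist())
```

Options 1 and 3 are vectorized, so they are probably the better choice on large frames; the list comprehension in Option 4 checks membership one label at a time.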