Why doesn't .loc reverse slice correctly?

Time:12-25

From my understanding, there are two ways to subset a dataframe in pandas:

a) df['columns']['rows']
b) df.loc['rows', 'columns']

I was following a guided case study, where the instruction was to select the first and last n rows of a column in a dataframe. The solution used Method A, whereas I tried Method B.

My method wasn't working and I couldn't for the life of me figure out why.

I've created a simplified version of the dataframe...

import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]

df = pd.DataFrame({'Male': male,
                    'Female': female}, 
                    index = np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis = 1).round(1)
df

Selecting the first two rows works fine with both Method A and Method B:

print('Method A: \n', df['Mean'][:2])
print('Method B: \n', df.loc[:2, 'Mean'])
Method A: 
1     7.5
2    12.5

Method B: 
1     7.5
2    12.5

Selecting the last two rows does not work the same way. Method A returns the last two rows as it should, but Method B (.loc) returns the whole column instead. Why is this, and how do I fix it?

print('Method A: \n', df['Mean'][-2:])
print('Method B: \n', df.loc[-2:, 'Mean'])
Method A: 
9     11.5
10    14.5

Method B: 
1      7.5
2     12.5
3      9.0
4     11.5
5     16.0
6     13.5
7     13.0
8     12.5
9     11.5
10    14.5

CodePudding user response:

You could use df.index[-2:] to get the index labels of the last two rows, which are 9 and 10, and pass those to .loc instead of the bare slice -2:. Here is some reproducible code:

import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]

df = pd.DataFrame({'Male': male,
                    'Female': female}, 
                    index = np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis = 1).round(1)

print('Method B: \n', df.loc[df.index[-2:], 'Mean'])

Output:

Method B: 
9     11.5
10    14.5
Name: Mean, dtype: float64

As you can see, it returns the last two rows of your dataframe.
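As for why df.loc[-2:, 'Mean'] returned every row: .loc slices by index label, not by position, and it includes both endpoints. With the index 1..10, the slice -2: means "all labels from -2 upward", which matches every row, while :2 means "all labels up to and including 2", which only happens to be the first two rows because the index starts at 1. A small sketch of this, reusing the dataframe from the question (the variable names first_two, everything, from_nine and df0 are just for illustration):

```python
import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]
df = pd.DataFrame({'Male': male, 'Female': female}, index=np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis=1).round(1)

# .loc slices by label and includes both endpoints.
first_two = df.loc[:2, 'Mean']     # labels <= 2, i.e. labels 1 and 2
everything = df.loc[-2:, 'Mean']   # labels >= -2, i.e. every label in 1..10
from_nine = df.loc[9:, 'Mean']     # labels >= 9, i.e. labels 9 and 10

print(len(first_two), len(everything), len(from_nine))  # 2 10 2

# With a default 0-based index, the same :2 label slice covers three rows.
df0 = df.reset_index(drop=True)
print(len(df0.loc[:2, 'Mean']))  # 3
```

So df.loc[9:, 'Mean'] would also work here, but only because you know the labels; df.index[-2:] looks them up for you.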

CodePudding user response:

You can also get them with iloc or the tail method, like this:

df['Mean'][-2:]
df['Mean'].iloc[-2:]
df['Mean'].tail(2)

We don't usually use loc for this; iloc and the other methods above are easier to use. But if you want to use loc, it could look like this:

df.loc[df.index[-2:],'Mean']
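
If you prefer the single df[rows, columns] style of indexing from the question but want positions rather than labels, one possible sketch is to use iloc on the dataframe itself and look up the column position with df.columns.get_loc. The names n, col, first_n and last_n below are only illustrative:

```python
import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]
df = pd.DataFrame({'Male': male, 'Female': female}, index=np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis=1).round(1)

n = 2
col = df.columns.get_loc('Mean')  # integer position of the 'Mean' column

# .iloc is purely positional, so negative slices count from the end.
first_n = df.iloc[:n, col]
last_n = df.iloc[-n:, col]

print(first_n.tolist())  # [7.5, 12.5]
print(last_n.tolist())   # [11.5, 14.5]

# Same result as the tail() approach above.
print(last_n.equals(df['Mean'].tail(n)))  # True
```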