From my understanding, there are two ways to subset a dataframe in pandas:
a) df['columns']['rows']
b) df.loc['rows', 'columns']
I was following a guided case study, where the instruction was to select the first and last n rows of a column in a dataframe. The solution used Method A, whereas I tried Method B.
My method wasn't working and I couldn't for the life of me figure out why.
I've created a simplified version of the dataframe...
import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]
df = pd.DataFrame({'Male': male,
                   'Female': female},
                  index = np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis = 1).round(1)
df
Selecting the first two rows works fine with both Method A and Method B:
print('Method A: \n', df['Mean'][:2])
print('Method B: \n', df.loc[:2, 'Mean'])
Method A:
1 7.5
2 12.5
Method B:
1 7.5
2 12.5
Selecting the last two rows, however, doesn't work the same way. Method A returns the last two rows as it should. Method B (.loc) doesn't; it returns every row of the column. Why is this, and how do I fix it?
print('Method A: \n', df['Mean'][-2:])
print('Method B: \n', df.loc[-2:, 'Mean'])
Method A:
9 11.5
10 14.5
Method B:
1 7.5
2 12.5
3 9.0
4 11.5
5 16.0
6 13.5
7 13.0
8 12.5
9 11.5
10 14.5
CodePudding user response:
You could use .index[-2:] to get the index labels of the last two rows, which are 9 and 10, and pass those to .loc instead of only -2:. Here is some reproducible code:
import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]
df = pd.DataFrame({'Male': male,
                   'Female': female},
                  index = np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis = 1).round(1)
print('Method B: \n', df.loc[df.index[-2:], 'Mean'])
Output:
Method B:
9 11.5
10 14.5
Name: Mean, dtype: float64
As you can see, it returns the last two rows of your dataframe.
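As for the "why" part of the question, here is a small sketch that reuses the dataframe from the question. The point it tries to show is that .loc slices by label while .iloc slices by position: there is no label -2, but because the index is sorted, .loc starts the slice at the first label that is not smaller than -2, which is label 1. The variable names (by_label, by_position, first_by_label) are only for illustration.

```python
import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]
df = pd.DataFrame({'Male': male, 'Female': female},
                  index=np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis=1).round(1)

# .loc reads -2 as a LABEL. Every label (1..10) is >= -2,
# so the slice covers the whole column.
by_label = df.loc[-2:, 'Mean']

# .iloc reads -2 as a POSITION counted from the end.
by_position = df.iloc[-2:]['Mean']

# .loc[:2] only looked right because labels 1 and 2 happen to be
# the first two rows; label slices include the end label.
first_by_label = df.loc[:2, 'Mean']

print(len(by_label))                 # 10
print(by_position.index.tolist())    # [9, 10]
print(first_by_label.index.tolist()) # [1, 2]
```

If the index had started at 0 instead of 1, df.loc[:2, 'Mean'] would have returned three rows (labels 0, 1 and 2), so the two methods would have disagreed on the first rows as well.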
CodePudding user response:
You can also get them with iloc or the tail method, like this:
df['Mean'][-2:]
df['Mean'].iloc[-2:]
df['Mean'].tail(2)
We don't usually use loc for this; iloc or other methods are easier to use. But if you want to use loc, it could look like this:
df.loc[df.index[-2:],'Mean']
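To check that these variants really agree, here is a minimal sketch on the dataframe from the question. The last variant is an extra one not mentioned above: it stays purely positional by looking up the column's integer position with df.columns.get_loc. The variable names are made up for the example.

```python
import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]
df = pd.DataFrame({'Male': male, 'Female': female},
                  index=np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis=1).round(1)

via_tail = df['Mean'].tail(2)                 # last two rows of the column
via_iloc = df['Mean'].iloc[-2:]               # positional slice on the Series
via_loc = df.loc[df.index[-2:], 'Mean']       # labels of the last two rows
# Positional on both axes: rows by position, column by its integer position.
via_iloc_2d = df.iloc[-2:, df.columns.get_loc('Mean')]

# All four select the rows labelled 9 and 10.
same = all(via_tail.equals(other)
           for other in (via_iloc, via_loc, via_iloc_2d))
print(same)  # True
```

Which one to pick is mostly a matter of taste; tail(2) reads most directly, while the iloc forms generalise to arbitrary positional slices.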