I have this dataframe:
     x    y      z parameter
0   26   24     25       Age
1   35   37     36       Age
2   57   52   54.5       Age
3  160  164    162       Hgt
4  182  163  172.5       Hgt
5  175  167    171       Hgt
6   95   71     83       Wgt
7  110   68     89       Wgt
8   89   65     77       Wgt
I'm using pandas to get this final result:
     x    y parameter
0  160  164       Hgt
1  182  163       Hgt
2  175  167       Hgt
I'm using groupby() to extract and isolate the rows that share the parameter value Hgt from the original dataframe.
First, I added a column to set it as an index:
df = df.insert(0,'index', [count for count in range(df.shape[0])], True)
And the dataframe came out like this:
   index    x    y      z parameter
0      0   26   24     25       Age
1      1   35   37     36       Age
2      2   57   52   54.5       Age
3      3  160  164    162       Hgt
4      4  182  163  172.5       Hgt
5      5  175  167    171       Hgt
6      6   95   71     83       Wgt
7      7  110   68     89       Wgt
8      8   89   65     77       Wgt
Then, I used the following code to group based on index and extract the columns I need:
df1 = df.groupby('index')[['x', 'y','parameter']]
And the output was:
     x    y parameter
0   26   24       Age
1   35   37       Age
2   57   52       Age
3  160  164       Hgt
4  182  163       Hgt
5  175  167       Hgt
6   95   71       Wgt
7  110   68       Wgt
8   89   65       Wgt
After that, I used the following code to isolate only Hgt values:
df2 = df1[df1['parameter'] == 'Hgt']
When I ran df2, I got an error saying:
IndexError: Column(s) ['x', 'y', 'parameter'] already selected
Am I missing something here? What should I do to get the final result?
CodePudding user response:
Do you really need groupby?
>>> df.loc[df['parameter'] == 'Hgt', ['x', 'y', 'parameter']].reset_index(drop=True)
     x    y parameter
0  160  164       Hgt
1  182  163       Hgt
2  175  167       Hgt
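For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the values shown in the question; the names `mask` and `result` are only illustrative and do not come from the original post.

```python
import pandas as pd

# Rebuild the example data from the question.
df = pd.DataFrame({
    "x": [26, 35, 57, 160, 182, 175, 95, 110, 89],
    "y": [24, 37, 52, 164, 163, 167, 71, 68, 65],
    "z": [25, 36, 54.5, 162, 172.5, 171, 83, 89, 77],
    "parameter": ["Age"] * 3 + ["Hgt"] * 3 + ["Wgt"] * 3,
})

# Boolean mask: True for the rows whose parameter is "Hgt".
mask = df["parameter"] == "Hgt"

# Keep those rows and only the wanted columns, then renumber from 0.
result = df.loc[mask, ["x", "y", "parameter"]].reset_index(drop=True)
print(result)
```

The mask does the row selection and the column list does the column selection in a single `.loc` call, so no grouping step is involved.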
CodePudding user response:
Because you asked what you did wrong, let me point out the unnecessary or incorrect code.
Without any judgement (this is just to help you improve future code), almost everything is incorrect. It feels like a succession of complicated ways to do useless things. Let me give some details:
df = df.insert(0,'index', [count for count in range(df.shape[0])], True)
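This seems a very convoluted way to do df.reset_index(). Even [count for count in range(df.shape[0])] could have been simplified by using range(df.shape[0]) directly. Note also that insert modifies the DataFrame in place and returns None, so assigning its result back to df leaves df set to None.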
But this step is not even needed for a groupby, as you can group by index level:
df.groupby(level=0)
But... the groupby is useless anyway, as you only have single-membered groups.
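A small sketch of what "single-membered groups" means here. It uses a shortened, made-up sample in the same shape as the question's data (an assumption for brevity), and the variable names are illustrative.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (illustrative only).
df = pd.DataFrame({
    "x": [26, 160, 182, 95],
    "parameter": ["Age", "Hgt", "Hgt", "Wgt"],
})

# Grouping by the index level: every row label is unique,
# so each group contains exactly one row.
sizes_by_row = df.groupby(level=0).size()
print(sizes_by_row)

# Grouping by a column with repeated values produces groups
# that actually contain several rows.
sizes_by_param = df.groupby("parameter").size()
print(sizes_by_param)
```

Grouping only pays off when the key repeats, as `parameter` does; a unique row counter gives one group per row.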
Also, when you do:
df1 = df.groupby('index')[['x', 'y','parameter']]
df1 is not a DataFrame but a DataFrameGroupBy object. Storing one in a variable is very useful when you know what you're doing, but here it causes the error because you assumed it was a DataFrame. You need to apply an aggregation or transformation method of the DataFrameGroupBy object to get back a DataFrame, which you didn't (likely because, as seen above, there isn't much of interest to do with single-membered groups).
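To make the distinction concrete, here is a rough sketch with a shortened sample (an assumption for brevity). It uses `first()` as one possible aggregation; that choice is mine for illustration, not something the original post prescribes.

```python
import pandas as pd

# Shortened sample; reset_index() adds the "index" column used in the question.
df = pd.DataFrame({
    "x": [26, 160, 182, 95],
    "y": [24, 164, 163, 71],
    "parameter": ["Age", "Hgt", "Hgt", "Wgt"],
}).reset_index()

# Selecting columns on a groupby gives a lazy groupby object, not a table.
df1 = df.groupby("index")[["x", "y", "parameter"]]
print(type(df1).__name__)

# Applying an aggregation turns it back into a regular DataFrame...
as_frame = df1.first()
print(type(as_frame).__name__)

# ...which can then be filtered with a boolean mask as usual.
hgt = as_frame[as_frame["parameter"] == "Hgt"]
print(hgt)
```

With one row per group the aggregation just returns the original rows, which is another sign that the grouping step adds nothing in this case.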
So when you run:
df1[df1['parameter'] == 'Hgt']
again, all is wrong, as df1['parameter'] is equivalent to df.groupby('index')[['x', 'y','parameter']]['parameter'] (the cause of the error, as you select columns twice on the same groupby object). Even if you removed this error, the equality comparison would give a single True/False, as you still have a groupby object and not a DataFrame, and this would incorrectly try to subselect a nonexistent column of the DataFrameGroupBy.
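The two failure modes described above can be reproduced with a short sketch. The sample data is shortened and the variable names are illustrative; the exception type is the one reported in the question, and I am assuming your pandas version behaves the same way.

```python
import pandas as pd

# Shortened sample; reset_index() adds the "index" column used in the question.
df = pd.DataFrame({
    "x": [26, 160, 182, 95],
    "y": [24, 164, 163, 71],
    "parameter": ["Age", "Hgt", "Hgt", "Wgt"],
}).reset_index()

df1 = df.groupby("index")[["x", "y", "parameter"]]

# 1) Selecting columns a second time on the same groupby object fails.
try:
    df1["parameter"]
    raised = False
except IndexError as exc:
    raised = True
    print(exc)

# 2) Even with a single selection, comparing a groupby object to a string
#    yields one plain boolean, not a row-by-row mask.
series_groupby = df.groupby("index")["parameter"]
comparison = series_groupby == "Hgt"
print(comparison)
```

A usable row mask only comes from comparing an actual column, for example `df['parameter'] == 'Hgt'`.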
I hope it helped!