In a Panda's dataframe: I want to count how many of value 1
there is, in the stroke
coulmn, for each value in the Residence_type
column. In order to count how much 1
there is, I convert the stroke
column to a list, easier I think.
So for example, the value Rural
in Residence_type
has 300 times 1
in the stroke
column.. and so on.
The data is something like this:
Residence_type Stroke
0 Rural 1
1 Urban 1
2 Urban 0
3 Rural 1
4 Rural 0
5 Urban 0
6 Urban 0
7 Urban 1
8 Rural 0
9 Rural 1
The code:
grpby_variable = data.groupby('stroke')
grpby_variable['Residence_type'].tolist().count(1)
the final goal is to find the difference between the number of times the value 1
appears, for each value in the Residence_type column (rural or urban).
Am I doing it right? what is this error ?
CodePudding user response:
Not sure I got what you need done. Please try filter stroke==1, groupby and count;
df.query("Stroke==1").groupby('Residence_type')['Stroke'].agg('count').to_frame('Stroke_Count')
Stroke_Count
Residence_type
Rural 3
Urban 2
You could try the following if you need the differences between categories
df1 =df.query("Stroke==1").groupby('Residence_type')['Stroke'].agg('count').to_frame('Stroke_Count')
df1.loc['Diff'] = abs(df1.loc['Rural']-df1.loc['Urban'])
print(df1)
Stroke_Count
Residence_type
Rural 3
Urban 2
Diff 1
CodePudding user response:
Assuming that Stroke
only contains 1 or 0, you can do:
result_df = df.groupby('Residence_type').sum()
>>> result_df
Stroke
Residence_type
Rural 3
Urban 2
>>> result_df.Stroke['Rural'] - result_df.Stroke['Urban']
1