How to index by value, group by

I am trying to compare the rates of high blood pressure across the different diabetes diagnoses with the rate in healthy individuals. The output I am getting is this:

0     0.371132
8     0.752674
64    0.629022

The output I need is this:

Diabetes_012    average HBP occurrence
0               0.371132
2               0.752674
1               0.629022

Here the output index is the diabetes type and the value is the average occurrence of high blood pressure.

Here is the full code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

pd.set_option('display.max_columns', None)

df = pd.read_csv('diabetes_012_health_indicators_BRFSS2015.csv')
df2 = df.copy()  # copy is a method, so it needs parentheses

# Mean HighBP per diabetes class (this is the line that yields the output above)
grouped = df.groupby(['Diabetes_012'])['HighBP'].transform('mean').drop_duplicates()
print(grouped)

Here is the link to the dataset: https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset

CodePudding user response:

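Don't use .transform. It returns one value per original row, so after .drop_duplicates() the result keeps the original row labels (0, 8, 64) instead of the group keys. Aggregate directly instead by selecting the column (or columns) you want the mean of: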

In [3]: df.groupby("Diabetes_012")[["HighBP"]].mean()
Out[3]:
                HighBP
Diabetes_012
0.0           0.371132
1.0           0.629022
2.0           0.752674
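
To see why the original attempt printed row labels rather than diabetes classes, here is a minimal sketch on a made-up six-row frame. The `toy` DataFrame and its values are assumptions for illustration only; they are not taken from the BRFSS dataset.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are invented for illustration).
toy = pd.DataFrame({
    "Diabetes_012": [0.0, 0.0, 2.0, 1.0, 2.0, 1.0],
    "HighBP":       [0.0, 0.0, 1.0, 0.0, 1.0, 1.0],
})

# transform broadcasts each group's mean back onto every original row,
# so the result has the same length and the same index as `toy`.
per_row = toy.groupby("Diabetes_012")["HighBP"].transform("mean")

# drop_duplicates keeps the first row of each distinct value together with
# that row's original label, not the group key.
deduped = per_row.drop_duplicates()

# An aggregation collapses each group to one row indexed by the group key.
per_group = toy.groupby("Diabetes_012")["HighBP"].mean()

print(deduped)    # indexed by row labels 0, 2, 3
print(per_group)  # indexed by Diabetes_012 values 0.0, 1.0, 2.0
```

Note that the transform-then-deduplicate approach would also silently merge two groups that happen to share the same mean, which is another reason to aggregate directly.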

Example with multiple columns:

In [4]: df.groupby("Diabetes_012")[["HighBP", "BMI"]].mean()
Out[4]:
                HighBP        BMI
Diabetes_012
0.0           0.371132  27.742521
1.0           0.629022  30.724466
2.0           0.752674  31.944011
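
If the exact layout from the question is wanted (integer class labels, first-appearance order, and a descriptive column name), something along these lines should work. This is a sketch on a made-up `toy` frame, not the real dataset; `sort=False`, the integer cast, and the name "average HBP occurrence" are choices made here to mirror the requested output.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are invented for illustration).
toy = pd.DataFrame({
    "Diabetes_012": [0.0, 0.0, 2.0, 1.0, 2.0, 1.0],
    "HighBP":       [0.0, 0.0, 1.0, 0.0, 1.0, 1.0],
})

result = (
    # sort=False keeps groups in order of first appearance (0, 2, 1 here)
    toy.groupby("Diabetes_012", sort=False)["HighBP"]
    .mean()
    .rename("average HBP occurrence")
)

# The class labels are stored as floats; cast them to int for display.
result.index = result.index.astype(int)

print(result)
```

Dropping `sort=False` gives the classes in ascending order instead, as in the outputs above.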