I am trying to compare the rate of high blood pressure across the different diabetes diagnoses with the rate in healthy individuals. The output I am getting is this:
0 0.371132
8 0.752674
64 0.629022
The output I need is this:
Diabetes_012 average HBP occurrence
0 0.371132
2 0.752674
1 0.629022
Here the index is the diabetes type and the value is the average occurrence of high blood pressure for that type.
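Here is the full code:
import csv
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('diabetes_012_health_indicators_BRFSS2015.csv')
# copy is a method: without the parentheses, df2 is the bound method rather than a DataFrame
df2 = df.copy()
pd.set_option('display.max_columns', None)
df
# This is the line that produces the unexpected index (0, 8, 64)
grouped = df.groupby(['Diabetes_012'])['HighBP'].transform('mean').drop_duplicates()
print(grouped)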
Here is the link to the dataset: https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset
Answer:
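Don't use .transform here. transform returns a result aligned to the original rows, so after drop_duplicates the index holds the row label where each value first appears (0, 8, 64), not the diabetes type. Instead, select the column (or columns) you want to average and call .mean() on the groupby directly: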
In [3]: df.groupby("Diabetes_012")[["HighBP"]].mean()
Out[3]:
                HighBP
Diabetes_012
0.0           0.371132
1.0           0.629022
2.0           0.752674
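If you also want a descriptive column label like the one in the desired output, named aggregation is one way to get it. This is a minimal sketch on a small made-up frame standing in for the real CSV. The column name avg_high_bp and the sample values are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

# Tiny made-up frame standing in for the BRFSS data (values are illustrative only).
df = pd.DataFrame({
    "Diabetes_012": [0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 1.0, 0.0],
    "HighBP":       [0,   1,   1,   1,   0,   1,   0,   0],
})

# Named aggregation: output_column=(input_column, aggregation).
# "avg_high_bp" is a hypothetical label chosen for this example.
result = (
    df.groupby("Diabetes_012")
      .agg(avg_high_bp=("HighBP", "mean"))
      .reset_index()
)

# The group keys are stored as floats (0.0, 1.0, 2.0); cast them if you prefer 0, 1, 2.
result["Diabetes_012"] = result["Diabetes_012"].astype(int)
print(result)
```

reset_index turns the group key back into an ordinary column, which is convenient if the result is passed on to seaborn for plotting.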
Example with multiple columns:
In [4]: df.groupby("Diabetes_012")[["HighBP", "BMI"]].mean()
Out[4]:
                HighBP        BMI
Diabetes_012
0.0           0.371132  27.742521
1.0           0.629022  30.724466
2.0           0.752674  31.944011
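To see where the 0, 8, 64 index in the question comes from, it may help to compare the two approaches side by side. The sketch below uses a small made-up frame (the values are assumptions for illustration, not taken from the dataset), so the row labels differ from the real ones, but the mechanism is the same.

```python
import pandas as pd

# Tiny made-up frame standing in for the BRFSS data (values are illustrative only).
df = pd.DataFrame({
    "Diabetes_012": [0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 1.0, 0.0],
    "HighBP":       [0,   1,   1,   1,   0,   1,   0,   0],
})

# transform broadcasts each group's mean back onto every original row,
# so the result has one value per row and keeps the original row index.
per_row = df.groupby("Diabetes_012")["HighBP"].transform("mean")

# drop_duplicates keeps the first row for each distinct value, so the
# surviving labels are row positions, not diabetes types.
first_seen = per_row.drop_duplicates()
print(first_seen.index.tolist())  # [0, 2, 3]

# Aggregating with mean() returns one value per group, indexed by the group key.
by_group = df.groupby("Diabetes_012")["HighBP"].mean()
print(by_group.index.tolist())    # [0.0, 1.0, 2.0]
```

There is a second reason to avoid the transform plus drop_duplicates pattern: if two groups happened to have exactly the same mean, drop_duplicates would collapse them into a single row, and one group would silently disappear from the result.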