I am working on a dataset that records whether a patient missed a doctor's appointment, along with several conditions that might explain why. We want to know which condition has the most effect.
The dependent variable was originally coded as "Yes" and "No", so I recoded it as 1 and 0:
df.loc[df['No_Show'] == 'Yes', 'No_Show'] = '1'
df.loc[df['No_Show'] == 'No', 'No_Show'] = '0'
df['No_Show'] = pd.to_numeric(df['No_Show'])
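As a minimal sketch of the same recoding, assuming the column holds only the strings 'Yes' and 'No', Series.map can do the replacement and the numeric conversion in one step. The small frame below is made-up illustration data, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the appointments data.
df = pd.DataFrame({"No_Show": ["Yes", "No", "No", "Yes"], "Age": [30, 30, 45, 45]})

# Map each label to an integer; any label missing from the dict becomes NaN.
df["No_Show"] = df["No_Show"].map({"Yes": 1, "No": 0})

print(df["No_Show"].tolist())
```

Assigning the whole column at once also avoids the chained-assignment pattern, which pandas warns about.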
I then defined two boolean masks:
showed = df.No_Show == 1
No_show = df.No_Show == 0
When I tried to get the mean, by age, for those who went to their appointment, using
df.groupby('Age')[showed].mean()
I got an error.
CodePudding user response:
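Indexing a groupby object with [...] selects columns, so it cannot take the boolean mask showed. Filter the rows first, then group:
df[showed].groupby('Age').mean(numeric_only=True)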
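A small sketch of this filter-then-group pattern on made-up data (the values, the Scholarship column, and the mask name are for illustration only). The mask picks the rows, and the column to average is then chosen by name after the groupby.

```python
import pandas as pd

# Hypothetical data; "Scholarship" is only an example of a numeric column.
df = pd.DataFrame({
    "Age": [20, 20, 35, 35, 35],
    "No_Show": [1, 0, 1, 1, 0],
    "Scholarship": [0, 1, 1, 0, 1],
})

mask = df["No_Show"] == 1  # boolean Series, one entry per row

# Filter rows first, then group the remaining rows by age.
result = df[mask].groupby("Age")["Scholarship"].mean()
print(result)
```

Selecting a single column after the groupby keeps the result a Series indexed by Age.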
CodePudding user response:
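import numpy as np
import pandas as pd

# Random example data standing in for the real dataset.
df = pd.DataFrame(
    data={
        "No_Show": np.random.choice([0, 1], 100),
        "Age": np.random.randint(1, 100, 100),
    }
)

# No_Show is coded 0/1, so its mean within each age is the share of 1s at that age.
df.groupby("Age")["No_Show"].mean()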
Age
1 0.666667
2 1.000000
3 0.000000
4 1.000000
6 1.000000
...
92 1.000000
94 0.500000
95 1.000000
96 0.000000
98 0.500000
Name: No_Show, Length: 63, dtype: float64
df.groupby("No_Show")["Age"].mean()
No_Show
0 53.833333
1 47.346154
Name: Age, dtype: float64