Morning All,
import pandas as pd
import numpy as np

data = {
    'inc_priority2': ['Critical', 'Critical', 'Critical', 'High', 'High', 'High', 'Medium', 'Medium', 'Low'],
    'inc_open_minutes': ['No Data', 10, 6, 'No Data', 4, 2, 'No Data', 5, 3],
}
df = pd.DataFrame(data)
print(df)
  inc_priority2 inc_open_minutes
0      Critical          No Data
1      Critical               10
2      Critical                6
3          High          No Data
4          High                4
5          High                2
6        Medium          No Data
7        Medium                5
8           Low                3
Desired output, i.e. the average of inc_open_minutes for each priority, counting only the rows that actually have data:
  inc_priority2 Avg inc_open_minutes
0      Critical                    8
1          High                    3
2        Medium                    5
3           Low                    3
The following runs if the 'No Data' values are converted to 0, but as it stands I get an error because it is trying to average strings:
dfTemp = df.groupby('inc_priority2').inc_open_minutes.apply(np.mean).reset_index()
display(dfTemp)
TypeError: can only concatenate str (not "int") to str
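The error comes from the column's dtype: because 'No Data' is mixed in with the integers, pandas stores inc_open_minutes with the generic object dtype, and np.mean ends up adding a string to numbers. A quick check, rebuilding the same example frame, shows the dtype and how pd.to_numeric(..., errors='coerce') turns the placeholder text into NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'inc_priority2': ['Critical', 'Critical', 'Critical', 'High', 'High', 'High',
                      'Medium', 'Medium', 'Low'],
    'inc_open_minutes': ['No Data', 10, 6, 'No Data', 4, 2, 'No Data', 5, 3],
})

# The mixed str/int column is stored as object dtype
print(df['inc_open_minutes'].dtype)

# errors='coerce' replaces anything unparseable ('No Data') with NaN
minutes = pd.to_numeric(df['inc_open_minutes'], errors='coerce')
print(minutes.dtype)         # float64, because NaN needs a float column
print(minutes.isna().sum())  # number of 'No Data' rows
```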
If I did replace 'No Data' with 0, the question would be much the same: how do I average only the non-zero values within each group?
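For the variant where 'No Data' has already been replaced with 0, one option (a sketch, not taken from the original thread) is to turn the zeros back into NaN before grouping, since mean() skips NaN by default. This assumes a genuine 0-minute incident should also be excluded from the average, which may not hold for real data:

```python
import numpy as np
import pandas as pd

# Same example, but with 'No Data' already replaced by 0
df = pd.DataFrame({
    'inc_priority2': ['Critical', 'Critical', 'Critical', 'High', 'High', 'High',
                      'Medium', 'Medium', 'Low'],
    'inc_open_minutes': [0, 10, 6, 0, 4, 2, 0, 5, 3],
})

# Treat 0 as missing; groupby().mean() ignores NaN values
avg = (df['inc_open_minutes']
       .replace(0, np.nan)
       .groupby(df['inc_priority2'])
       .mean())
print(avg)
```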
Answer:
Try something like this:
pd.to_numeric(df['inc_open_minutes'], errors='coerce').groupby(df['inc_priority2']).mean()
Output:
inc_priority2
Critical    8.0
High        3.0
Low         3.0
Medium      5.0
Name: inc_open_minutes, dtype: float64
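To match the layout of the desired output more closely (a flat frame with an 'Avg inc_open_minutes' column and the priorities in their original order rather than alphabetical), the same Series can be reshaped. In this sketch, sort=False keeps the groups in order of first appearance and Series.reset_index(name=...) names the value column; the column label is copied from the question and is just a choice. The averages come out as floats (8.0 rather than 8):

```python
import pandas as pd

df = pd.DataFrame({
    'inc_priority2': ['Critical', 'Critical', 'Critical', 'High', 'High', 'High',
                      'Medium', 'Medium', 'Low'],
    'inc_open_minutes': ['No Data', 10, 6, 'No Data', 4, 2, 'No Data', 5, 3],
})

minutes = pd.to_numeric(df['inc_open_minutes'], errors='coerce')
result = (minutes
          .groupby(df['inc_priority2'], sort=False)  # keep Critical, High, Medium, Low order
          .mean()                                    # NaN ('No Data') rows are skipped
          .reset_index(name='Avg inc_open_minutes'))
print(result)
```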
Or, drop the rows that are not numeric before grouping:
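mask = pd.to_numeric(df['inc_open_minutes'], errors='coerce').notna()
# the filtered column is still object dtype, so cast it to float before averaging
df[mask].astype({'inc_open_minutes': float}).groupby('inc_priority2').mean()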
Output:
              inc_open_minutes
inc_priority2
Critical                   8.0
High                       3.0
Low                        3.0
Medium                     5.0