Pandas GroupBy Two columns, where the second is the average containing both numeric and string data

Time:07-17

Morning All,

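import numpy as np
import pandas as pd

# Sample data: 'No Data' strings are mixed in with numeric minutes
df = pd.DataFrame({
    'inc_priority2': ['Critical', 'Critical', 'Critical', 'High', 'High', 'High', 'Medium', 'Medium', 'Low'],
    'inc_open_minutes': ['No Data', 10, 6,
                         'No Data', 4, 2,
                         'No Data', 5,
                         3],
})
print(df)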

 inc_priority2 inc_open_minutes
0      Critical          No Data
1      Critical               10
2      Critical                6
3          High          No Data
4          High                4
5          High                2
6        Medium          No Data
7        Medium                5
8           Low                3

Desired output, i.e. the conditional average of open minutes per priority, counting only the rows that have data:

  inc_priority2  Avg inc_open_minutes
0      Critical                 8
1          High                 3
2        Medium                 5
3           Low                 3

The following works if the No Data values are converted to 0, but as it stands I get an error because it tries to average strings:

dfTemp = df.groupby('inc_priority2').inc_open_minutes.apply(np.mean).reset_index()
display(dfTemp)
TypeError: can only concatenate str (not "int") to str

If I did replace No Data with 0, the question would be similar, i.e. how to conditionally average only the non-zero values in the groupby.

CodePudding user response:

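Try something like this. pd.to_numeric with errors='coerce' turns non-numeric values such as 'No Data' into NaN, and mean() skips NaN by default: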

pd.to_numeric(df['inc_open_minutes'], errors='coerce').groupby(df['inc_priority2']).mean()

Output:

inc_priority2
Critical    8.0
High        3.0
Low         3.0
Medium      5.0
Name: inc_open_minutes, dtype: float64
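
The result above is sorted alphabetically, while the desired output in the question keeps the priorities in their original order and uses a regular column layout. A minimal sketch of one way to get that shape, assuming the same sample data as the question; sort=False, the rename to 'Avg inc_open_minutes' and the variable names minutes and result are additions for illustration, not part of the original answer:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "inc_priority2": ["Critical", "Critical", "Critical",
                      "High", "High", "High",
                      "Medium", "Medium", "Low"],
    "inc_open_minutes": ["No Data", 10, 6, "No Data", 4, 2, "No Data", 5, 3],
})

# Non-numeric entries such as "No Data" become NaN, which mean() skips by default
minutes = pd.to_numeric(df["inc_open_minutes"], errors="coerce")

# sort=False keeps the groups in order of first appearance
# instead of sorting the group keys alphabetically
result = (
    minutes.groupby(df["inc_priority2"], sort=False)
    .mean()
    .rename("Avg inc_open_minutes")
    .reset_index()
)
print(result)
```

The averages come back as floats (8.0 rather than 8) because NaN forces a float dtype.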

Or, filter out the non-numeric rows first and then group:

mask = pd.to_numeric(df['inc_open_minutes'], errors='coerce').notna()
df[mask].groupby('inc_priority2').mean()

Output:

               inc_open_minutes
inc_priority2                  
Critical                    8.0
High                        3.0
Low                         3.0
Medium                      5.0
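
For the follow-up in the question (No Data already replaced with 0, averaging only the non-zero values), the same idea should carry over: turn the placeholder into NaN before grouping. A rough sketch, assuming 0 is only ever a placeholder and never a genuine measurement; the names nonzero and avg_nonzero are made up for this example:

```python
import pandas as pd

# Hypothetical variant of the question's data where "No Data" was replaced with 0
df = pd.DataFrame({
    "inc_priority2": ["Critical", "Critical", "Critical",
                      "High", "High", "High",
                      "Medium", "Medium", "Low"],
    "inc_open_minutes": [0, 10, 6, 0, 4, 2, 0, 5, 3],
})

# where() keeps values that satisfy the condition and puts NaN everywhere else,
# so the zero placeholders no longer count towards the average
nonzero = df["inc_open_minutes"].where(df["inc_open_minutes"] != 0)

avg_nonzero = nonzero.groupby(df["inc_priority2"]).mean()
print(avg_nonzero)
```

If a real value of 0 minutes is possible, this would silently drop it, so keeping a distinct marker such as NaN for missing data is probably the safer choice.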