I want to use a "groupby" on my pandas dataframe with a different custom function for each column. For example, if I have this as input:
annotator event interval_presence duration
3 birds [0,5] 5
3 birds [7,9] 10
3 voices [1,2] 10
3 traffic [1,7] 7
5 voices [4,7] 4
5 voices [5,10] 6
5 traffic [0,1] 4
Each item in "interval_presence" is a pandas Interval. When merging, I want to take the mean of column "duration", and I want to apply "pd.arrays.IntervalArray" and then "piso.union" to the intervals in "interval_presence". So this would be the output:
annotator event interval_presence duration
3 birds [[0,5],[7,9]] 7.5
3 voices [1,2] 10
3 traffic [1,7] 7
5 voices [4,10] 5
5 traffic [0,1] 4
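To make the expected merge concrete, here is a minimal sketch of the union step in plain pandas. It shows why the two voices intervals for annotator 5 collapse into [4,10] while the two birds intervals for annotator 3 stay separate. union_intervals is a hypothetical helper for illustration, not piso's implementation; it assumes every interval shares the same closed side (closed="both" here, to match the [a,b] notation) and that touching intervals should merge.

```python
import pandas as pd


def union_intervals(intervals):
    """Return the union of `intervals` as a sorted list of pd.Interval.

    Hypothetical stand-in for piso.union: assumes every interval has the
    same `closed` side and merges intervals that overlap or touch.
    """
    merged = []
    for iv in sorted(intervals, key=lambda i: (i.left, i.right)):
        if merged and iv.left <= merged[-1].right:
            # Overlaps (or touches) the last merged interval: extend it.
            last = merged[-1]
            merged[-1] = pd.Interval(
                last.left, max(last.right, iv.right), closed=last.closed
            )
        else:
            # Disjoint from everything merged so far: start a new interval.
            merged.append(iv)
    return merged


# Two groups from the example input.
voices_5 = [pd.Interval(4, 7, closed="both"), pd.Interval(5, 10, closed="both")]
birds_3 = [pd.Interval(0, 5, closed="both"), pd.Interval(7, 9, closed="both")]

print(union_intervals(voices_5))  # a single interval covering 4 to 10
print(union_intervals(birds_3))   # the two original intervals, unchanged
```

Sorting by the left endpoint first means each interval only has to be compared with the last merged one.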
Right now, I know how to merge my intervals thanks to the answer in the post "Pandas: how to merge rows by union of intervals". So the solution would be:
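import pandas as pd
import piso

# Merge the intervals of each (annotator, event) group into their union
data = data.groupby(['annotator', 'event'])['interval_presence'] \
    .apply(pd.arrays.IntervalArray) \
    .apply(piso.union) \
    .reset_index()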
But how can I simultaneously apply a "mean" function to "duration"?
Answer:
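You used the wrong agg syntax. Pass agg a dict that maps each column to its own aggregation, so both operations run in a single groupby. Try this: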
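import pandas as pd
import piso

df.groupby(["annotator", "event"]).agg({
    "interval_presence": lambda s: piso.union(pd.arrays.IntervalArray(s)),  # union per group
    "duration": "mean"  # mean duration per group
}).reset_index()  # turn the group keys back into columns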
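Within the lambda, s is a Series of pd.Interval objects (the "interval_presence" values of one group). pd.arrays.IntervalArray converts it into an IntervalArray, which piso.union accepts.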
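To check the agg approach end to end, here is a minimal sketch on the question's sample data. It uses pandas named aggregation (output_column=(input_column, function)), which expresses the same per-column mapping as the dict and names the output columns explicitly. union_intervals is a hypothetical stand-in for piso.union(pd.arrays.IntervalArray(s)) so the sketch runs without piso, and it returns a plain list rather than an IntervalArray; with piso installed you could pass the answer's lambda instead. The intervals use closed="both" as an assumption to match the [a,b] notation.

```python
import pandas as pd


def union_intervals(intervals):
    """Hypothetical stand-in for piso.union(pd.arrays.IntervalArray(s)).

    Assumes all intervals share one `closed` side; touching intervals merge.
    """
    merged = []
    for iv in sorted(intervals, key=lambda i: (i.left, i.right)):
        if merged and iv.left <= merged[-1].right:
            last = merged[-1]  # extend the last merged interval
            merged[-1] = pd.Interval(
                last.left, max(last.right, iv.right), closed=last.closed
            )
        else:
            merged.append(iv)  # disjoint: start a new interval
    return merged


# The example input from the question.
df = pd.DataFrame({
    "annotator": [3, 3, 3, 3, 5, 5, 5],
    "event": ["birds", "birds", "voices", "traffic", "voices", "voices", "traffic"],
    "interval_presence": [
        pd.Interval(0, 5, closed="both"),
        pd.Interval(7, 9, closed="both"),
        pd.Interval(1, 2, closed="both"),
        pd.Interval(1, 7, closed="both"),
        pd.Interval(4, 7, closed="both"),
        pd.Interval(5, 10, closed="both"),
        pd.Interval(0, 1, closed="both"),
    ],
    "duration": [5, 10, 10, 7, 4, 6, 4],
})

# One function per column, applied in a single groupby pass.
result = (
    df.groupby(["annotator", "event"])
    .agg(
        interval_presence=("interval_presence", union_intervals),
        duration=("duration", "mean"),
    )
    .reset_index()
)
print(result)
```

Each interval_presence cell holds the merged intervals for that group, so the birds row for annotator 3 keeps two intervals while the voices row for annotator 5 collapses to one, and duration becomes 7.5 and 5.0 respectively.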