I have a pandas DataFrame with no datetime index. An example of my DataFrame is the following:
x1 x2 x3 y1 y2
0 0.83 0.83 1.16 0.0 1.0
1 0.83 0.83 1.25 0.0 1.0
2 0.64 0.85 1.09 0.0 1.0
3 0.72 0.94 1.06 0.0 1.0
4 0.73 0.85 4.04 0.0 1.0
5 0.83 0.94 4.84 0.0 1.0
6 0.56 0.43 1.07 0.0 1.0
7 0.59 0.56 1.05 0.0 1.0
8 0.59 0.91 2.05 0.0 1.0
9 0.59 0.96 4.99 0.0 1.0
10 0.83 0.56 7.99 0.0 1.0
I would like to resample this DataFrame by the values of x1, x2, x3 that lie in specific intervals and sum over y1 and y2. Let's say that these intervals are [0,1] for x1, x2 and [1,8] for x3. I divide these intervals into bins of width 0.01. Then, I want to sum y1 and y2 over all rows whose x1, x2, x3 values fall in the same bin of these intervals.
How can I do this?
CodePudding user response:
We could assign each of the x column values to a bin of width 0.1 (rather than the 0.01 from the question), labelling each bin by its midpoint (0.05, 0.15, ...), and then use groupby + sum:
cols = ['x1','x2','x3']
# Truncate each x to one decimal, then add 0.05 to label the bin by its midpoint
binned = df[cols].apply(lambda col: col.mul(10).astype(int).div(10).add(0.05))
out = binned.combine_first(df).groupby(cols, as_index=False).sum()
Output:
x1 x2 x3 y1 y2
0 0.55 0.45 1.05 0.0 1.0
1 0.55 0.55 1.05 0.0 1.0
2 0.55 0.95 2.05 0.0 1.0
3 0.55 0.95 4.95 0.0 1.0
4 0.65 0.85 1.05 0.0 1.0
5 0.75 0.85 4.05 0.0 1.0
6 0.75 0.95 1.05 0.0 1.0
7 0.85 0.55 7.95 0.0 1.0
8 0.85 0.85 1.15 0.0 1.0
9 0.85 0.85 1.25 0.0 1.0
10 0.85 0.95 4.85 0.0 1.0
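For the bin width of 0.01 asked about in the question, the same idea could be sketched as below. This is only an illustration under some assumptions: the helper names bin_left_edge and binned_sum are made up here, bins are labelled by their left edge rather than their midpoint, rows outside the intervals are dropped, and the quotient is rounded before flooring because 0.83 / 0.01 evaluates to 82.99999999999999 in floating point and would otherwise land one bin too low.

```python
import numpy as np
import pandas as pd

# The example data from the question.
df = pd.DataFrame({
    "x1": [0.83, 0.83, 0.64, 0.72, 0.73, 0.83, 0.56, 0.59, 0.59, 0.59, 0.83],
    "x2": [0.83, 0.83, 0.85, 0.94, 0.85, 0.94, 0.43, 0.56, 0.91, 0.96, 0.56],
    "x3": [1.16, 1.25, 1.09, 1.06, 4.04, 4.84, 1.07, 1.05, 2.05, 4.99, 7.99],
    "y1": 0.0,
    "y2": 1.0,
})

# Intervals from the question: column -> (low, high).
ranges = {"x1": (0, 1), "x2": (0, 1), "x3": (1, 8)}


def bin_left_edge(col, low, width):
    """Hypothetical helper: map each value to the left edge of its bin."""
    # Round the quotient first so that floating-point noise does not push
    # a value that sits on a bin edge into the bin below.
    idx = np.floor(((col - low) / width).round(9))
    return (low + idx * width).round(9)


def binned_sum(df, ranges, width, value_cols=("y1", "y2")):
    """Hypothetical helper: sum value_cols per (x1, x2, x3) bin."""
    inside = pd.Series(True, index=df.index)
    for col, (low, high) in ranges.items():
        inside &= df[col].between(low, high)  # inclusive on both ends
    out = df[inside].copy()
    for col, (low, _) in ranges.items():
        out[col] = bin_left_edge(out[col], low, width)
    return out.groupby(list(ranges), as_index=False)[list(value_cols)].sum()


out = binned_sum(df, ranges, width=0.01)
print(out)
```

Since the sample values have two decimals and no two rows share the same (x1, x2, x3) triple, a width of 0.01 leaves every row in its own bin; a larger width such as binned_sum(df, ranges, width=1.0) makes the summing visible. A value exactly equal to the upper bound of an interval ends up in a bin of its own in this sketch.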