Resampling pandas dataframe without datetime index in higher-dimensional bins

Time:02-22

I have a pandas DataFrame with no datetime index. An example of my DataFrame is the following:

    x1      x2      x3      y1      y2
0   0.83    0.83    1.16    0.0     1.0
1   0.83    0.83    1.25    0.0     1.0
2   0.64    0.85    1.09    0.0     1.0
3   0.72    0.94    1.06    0.0     1.0
4   0.73    0.85    4.04    0.0     1.0
5   0.83    0.94    4.84    0.0     1.0
6   0.56    0.43    1.07    0.0     1.0
7   0.59    0.56    1.05    0.0     1.0
8   0.59    0.91    2.05    0.0     1.0
9   0.59    0.96    4.99    0.0     1.0
10  0.83    0.56    7.99    0.0     1.0

I would like to resample this DataFrame by binning the values of x1, x2, x3 over specific intervals and summing y1 and y2 within each bin. Let's say that these intervals are [0,1] for x1 and x2 and [1,8] for x3, and that I divide each interval into bins of width 0.01. Then, for every combination of x1, x2, x3 bins, I want the sums of y1 and y2 over the rows that fall into it.

How can I do this?

CodePudding user response:

We could floor each of the x column values to the lower edge of a bin of width 0.1, shift it by 0.05 so that each bin is labelled by its midpoint (0.83 becomes 0.85), and use groupby sum:

cols = ['x1', 'x2', 'x3']
# Truncate each x value to a multiple of 0.1, then add 0.05 to label the bin by its midpoint
binned = df[cols].apply(lambda col: col.mul(10).astype(int).div(10).add(0.05))
out = binned.combine_first(df).groupby(cols, as_index=False).sum()

Output:

      x1    x2    x3   y1   y2
0   0.55  0.45  1.05  0.0  1.0
1   0.55  0.55  1.05  0.0  1.0
2   0.55  0.95  2.05  0.0  1.0
3   0.55  0.95  4.95  0.0  1.0
4   0.65  0.85  1.05  0.0  1.0
5   0.75  0.85  4.05  0.0  1.0
6   0.75  0.95  1.05  0.0  1.0
7   0.85  0.55  7.95  0.0  1.0
8   0.85  0.85  1.15  0.0  1.0
9   0.85  0.85  1.25  0.0  1.0
10  0.85  0.95  4.85  0.0  1.0
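
If you want explicit control over the intervals from the question, a possible alternative is `pd.cut` with one set of bin edges per column. The following is only a sketch under some assumptions: the helper name `bin_sum`, the `ranges` dict and the `width` variable are made up for illustration, the bins are right-closed (`pd.cut`'s default) rather than left-closed as in the flooring approach above, and rows falling outside the given intervals are silently dropped. It uses width 0.1 so the grouping matches the output above; for a width of 0.01, values that sit exactly on a bin edge may land in either neighbouring bin because of floating-point rounding, so check the boundaries for your data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "x1": [0.83, 0.83, 0.64, 0.72, 0.73, 0.83, 0.56, 0.59, 0.59, 0.59, 0.83],
    "x2": [0.83, 0.83, 0.85, 0.94, 0.85, 0.94, 0.43, 0.56, 0.91, 0.96, 0.56],
    "x3": [1.16, 1.25, 1.09, 1.06, 4.04, 4.84, 1.07, 1.05, 2.05, 4.99, 7.99],
    "y1": [0.0] * 11,
    "y2": [1.0] * 11,
})

# Interval (low, high) for each x column, as stated in the question
ranges = {"x1": (0, 1), "x2": (0, 1), "x3": (1, 8)}
width = 0.1  # bin width (illustrative; the question mentions 0.01)


def bin_sum(frame, ranges, width, value_cols=("y1", "y2")):
    """Hypothetical helper: bin each column in `ranges`, then sum `value_cols` per bin."""
    binned = frame.copy()
    for col, (lo, hi) in ranges.items():
        n_bins = int(round((hi - lo) / width))
        edges = np.linspace(lo, hi, n_bins + 1)
        # Replace the raw values by the interval (bin) they fall into
        binned[col] = pd.cut(frame[col], bins=edges, include_lowest=True)
    # observed=True keeps only bin combinations that actually occur in the data
    return binned.groupby(list(ranges), observed=True)[list(value_cols)].sum()


out = bin_sum(df, ranges, width)
print(out)
```

The result is indexed by the interval of each x column instead of a midpoint label. Keeping `observed=True` matters here: without it, the group-by may try to materialise every combination of bins across the three columns, which grows quickly as the bin width shrinks.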