Creating non-overlapping bins in R - CodePudding

I have a set of 10,000 (x, y) data points. I need to partition them along the x-axis into non-overlapping bins of 10 points each. From this I need a new dataset in which, for each bin, x is the mean of the 10 x values and y is the maximum of the 10 y values. The final dataset should therefore contain 1,000 (x, y) pairs.

I made a sample in Excel, but I want to perform this task in R.

CodePudding user response:

In tidyverse:

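library(tidyverse)

# df is assumed to be a data frame with numeric columns x and y
df %>%
  arrange(x) %>%                          # sort along the x-axis
  group_by(grp = gl(n(), 10, n())) %>%    # each consecutive run of 10 rows forms one group
  summarise(x = mean(x), y = max(y))      # one row per bin: mean of x, maximum of y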

In base R:

n <- nrow(df)
# Sort by x, split into consecutive groups of 10 rows, then summarise each group
do.call(rbind.data.frame, by(df[order(df$x),], gl(n, 10, n),
     function(d) cbind(x = mean(d$x), y = max(d$y))))
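Both snippets rely on gl(n, 10, n) to label consecutive runs of 10 rows. The following is a minimal base R sketch of that grouping step, assuming hypothetical sample data (the names df, grp_int and binned, the seed, and the 30-point size are illustrative and not from the question). It also shows an equivalent integer grouping with ceiling(), which avoids the unused factor levels that gl(n, 10, n) creates.

```r
# Hypothetical sample data: 30 points, so 3 bins of 10 are expected
set.seed(1)
df <- data.frame(x = runif(30), y = rnorm(30))

n  <- nrow(df)
df <- df[order(df$x), ]              # sort along the x-axis first

# gl(n, 10, n): levels 1..n, each repeated 10 times, truncated to length n
grp_gl  <- gl(n, 10, n)
# Equivalent integer bin ids: rows 1-10 -> 1, rows 11-20 -> 2, ...
grp_int <- ceiling(seq_len(n) / 10)

# One row per bin: mean of x and maximum of y
binned <- data.frame(
  x = as.vector(tapply(df$x, grp_int, mean)),
  y = as.vector(tapply(df$y, grp_int, max))
)
```

Because the rows are sorted by x before grouping, the bin means in binned$x come out in increasing order.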

CodePudding user response:

I created some sample data since you did not provide any. I use the data.table package, but you could do something similar in dplyr or base R.

library(data.table)

dt <- data.table(
  x = sample(40:50, 50, replace = TRUE),
  y = sample(1000:3000, 50)
)

# Rows are grouped in their current order; to bin along the x-axis, run setorder(dt, x) first
dt[, grp := gl(.N, 10, .N)] # edit based on Onyambu's solution
dt[, .(x_avg = mean(x), y_max = max(y)), by = grp]

# Example output (the data are random, so values differ between runs):
#    grp x_avg y_max
# 1:   1  44.7  2765
# 2:   2  45.3  2861
# 3:   3  44.7  2831
# 4:   4  46.2  2947
# 5:   5  46.7  2684
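
The question asks for bins along the x-axis, which means the rows have to be sorted by x before they are grouped. Below is a minimal data.table sketch at the size described in the question, assuming hypothetical random data (the seed, the distributions and the name binned are illustrative only). It uses integer division for the bin ids instead of gl().

```r
library(data.table)

set.seed(42)                                   # arbitrary seed, for reproducibility only
dt <- data.table(x = runif(10000), y = rnorm(10000))  # hypothetical data

setorder(dt, x)                                # sort by x so bins follow the x-axis
dt[, grp := (seq_len(.N) - 1L) %/% 10L + 1L]   # rows 1-10 -> 1, rows 11-20 -> 2, ...

# One row per bin: mean of x and maximum of y
binned <- dt[, .(x = mean(x), y = max(y)), by = grp]
```

With 10,000 input rows this should give 1,000 rows in binned, one per bin of 10 points.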