In R, I want to divide n = 10000 iid observations into 100 blocks, each of size n/100 = 100. Then, for each block, I want to take its largest value, which gives a new dataset of size 100. How can I achieve this in R?
For example,
# sample data
n <- 10000
exp_data <- rexp(n, 1)
CodePudding user response:
First you need a column that provides the grouping. In this example, assume the groups are sequential (i.e. the first 100 values belong to the first group, the next 100 to the second group, and so on):
df <- data.frame(values = exp_data,
                 group  = ceiling(seq_along(exp_data) / 100))  # 100 groups of 100 each
Now, just use tapply() to get the maximum of each group:
with(df, tapply(X = values,
                INDEX = group,
                FUN = max))
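Because the blocks are equal-sized and consecutive, the same maxima can also be obtained with a compact base-R alternative (just a sketch, assuming exp_data has exactly 100 * 100 elements): reshape the vector into a 100 x 100 matrix so that each column holds one block, then take column maxima.
# each column of the matrix is one block of 100 consecutive observations
block_max <- apply(matrix(exp_data, nrow = 100), 2, max)
length(block_max)  # 100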
CodePudding user response:
One tidyverse way could be:
- First transform the vector to a tibble with as_tibble() from the tibble package.
- Generate 100 sequential groups with the gl() function.
- Split our tibble of 10000 rows into a list of 100 tibbles (one per group) with group_split().
- Apply map() from the purrr package with the slice_max() function (dplyr package) to get the max value from each of the 100 new tibbles.
- Finally use bind_rows() to collect them all into your new tibble with 100 rows.
Note: dplyr, tibble and purrr are all part of the tidyverse.
library(tidyverse)

exp_data %>%
  as_tibble() %>%
  mutate(group = as.integer(gl(n(), 100, n()))) %>%
  group_split(group) %>%
  map(., ~ slice_max(., order_by = value)) %>%
  bind_rows()
# A tibble: 100 × 2
   value group
   <dbl> <int>
 1  5.81     1
 2  6.42     2
 3  4.46     3
 4  4.07     4
 5  5.35     5
 6  5.85     6
 7  4.03     7
 8  5.13     8
 9  4.71     9
10  4.71    10
# … with 90 more rows
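If the intermediate list of tibbles is not needed, a shorter dplyr-only variant (a sketch under the same sequential-grouping assumption) summarises within groups directly instead of splitting and mapping:
library(dplyr)
library(tibble)

exp_data %>%
  as_tibble() %>%                                    # single column named "value"
  mutate(group = as.integer(gl(n(), 100, n()))) %>%  # 100 sequential groups of 100
  group_by(group) %>%
  summarise(value = max(value))                      # one maximum per group -> 100 rows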