I have this data set:
var_1 = rnorm(1027,1000,1000)
var_2 = rnorm(1027,1000,1000)
var_3 = rnorm(1027,1000,1000)
sample_data = data.frame(var_1, var_2, var_3)
I want to split this data into sections of 100:
list_of_dfs <- split(
sample_data, (seq(nrow(sample_data))-1) %/% 100
)
However, since the number of rows in this data set is not cleanly divisible by 100, I get 10 sections instead of 11 sections (i.e. 10 full sections and 1 non-full section):
summary(list_of_dfs)
Length Class Mode
0 3 data.frame list
1 3 data.frame list
2 3 data.frame list
3 3 data.frame list
4 3 data.frame list
5 3 data.frame list
6 3 data.frame list
7 3 data.frame list
8 3 data.frame list
9 3 data.frame list
10 3 data.frame list
- Is it possible to adjust the R code so that 11 sections are created instead of 10 sections?
Thank you!
CodePudding user response:
grp_size <- 100
n <- nrow(sample_data)
split(sample_data, gl(ceiling(n/grp_size), grp_size, length = n))
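To see what this produces, here is a minimal sketch that rebuilds data of the same shape as in the question and counts the rows in each piece. The `set.seed()` call and the `groups` and `chunk_sizes` names are assumptions added for illustration; they are not part of the answer above.

```r
# Stand-in data with the same shape as in the question (1027 rows, 3 columns).
set.seed(42)  # assumption: seed added only to make the sketch reproducible
sample_data <- data.frame(
  var_1 = rnorm(1027, 1000, 1000),
  var_2 = rnorm(1027, 1000, 1000),
  var_3 = rnorm(1027, 1000, 1000)
)

grp_size <- 100
n <- nrow(sample_data)

# gl() builds a factor with ceiling(n / grp_size) levels, each repeated
# grp_size times in turn, and cuts the result off after n values.
groups <- gl(ceiling(n / grp_size), grp_size, length = n)
list_of_dfs <- split(sample_data, groups)

# Number of rows in each piece, named by group label.
chunk_sizes <- sapply(list_of_dfs, nrow)
print(chunk_sizes)
```

With 1027 rows this should give 11 pieces labelled 1 to 11: ten with 100 rows and a last one holding the remaining 27.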
CodePudding user response:
Here's another option:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
var_1 = rnorm(1027,1000,1000)
var_2 = rnorm(1027,1000,1000)
var_3 = rnorm(1027,1000,1000)
sample_data = data.frame(var_1, var_2, var_3)
sample_data <- sample_data %>%
mutate(obs = 0:(n()-1),
group = floor(obs/100) + 1)
list_of_dfs <- split(
sample_data,
sample_data$group
)
summary(list_of_dfs)
#> Length Class Mode
#> 1 5 data.frame list
#> 2 5 data.frame list
#> 3 5 data.frame list
#> 4 5 data.frame list
#> 5 5 data.frame list
#> 6 5 data.frame list
#> 7 5 data.frame list
#> 8 5 data.frame list
#> 9 5 data.frame list
#> 10 5 data.frame list
#> 11 5 data.frame list
Created on 2022-03-10 by the reprex package (v2.0.1)
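The same 1-based grouping can probably be had in base R without dplyr, and without adding the `obs` and `group` columns to the data (which is why the summary above reports a Length of 5 rather than 3). This is only a sketch under that assumption; the `set.seed()` call and the standalone `group` vector are illustrative and not from the answer above.

```r
# Stand-in data with the same shape as in the question (1027 rows, 3 columns).
set.seed(42)  # assumption: seed added only to make the sketch reproducible
sample_data <- data.frame(
  var_1 = rnorm(1027, 1000, 1000),
  var_2 = rnorm(1027, 1000, 1000),
  var_3 = rnorm(1027, 1000, 1000)
)

# Integer division on a 0-based row index, shifted by 1 so that
# rows 1-100 fall in group 1, rows 101-200 in group 2, and so on.
group <- (seq_len(nrow(sample_data)) - 1) %/% 100 + 1

# Split on the external vector; the data frame keeps its 3 original columns.
list_of_dfs <- split(sample_data, group)

# Rows per group.
chunk_sizes <- table(group)
print(chunk_sizes)
```

Because the grouping vector lives outside the data frame, each piece keeps only `var_1`, `var_2` and `var_3`.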