Get 20 consecutive samples from a dataframe

I have a dataframe, say data <- iris. Now I want to get 20 dataframe samples from the data dataframe, each containing 10 consecutive rows. For example, the samples I need would be df1 <- data[1:10,], df2 <- data[2:11,], and so on. Is there an easy way to do this? I tried looping over data, but I can't figure out how to create a new name for each of the new data frames.

Answer:

If you want to do it with a loop, you can use assign() to add each of the 'sampled' dfs to your environment, and/or you can add the results of each 'sampling' to a list, then combine the list into a single large dataframe (if that's your 'final goal').

data <- iris

list_of_iris_dfs <- list()
for (i in 1:20) {
  df <- data[i:(i + 9), ]  # 10 consecutive rows starting at row i
  assign(paste0("df_sample_", i), df, envir = .GlobalEnv)  # creates df_sample_1, df_sample_2, ...
  list_of_iris_dfs[[i]] <- df
}
# df_sample_1 to df_sample_20 are now in your environment
head(df_sample_1)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

# And you also have a list of dataframes (although you may not need it)
head(list_of_iris_dfs[[1]])
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

# If you want to use the list to create a 'large' dataframe with all
# of your small 'sampled' dataframes you can use bind_rows from the
# dplyr package:
library(dplyr)
df <- bind_rows(list_of_iris_dfs, .id = "sample_number")
head(df, 15)
#>    sample_number Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1              1          5.1         3.5          1.4         0.2  setosa
#> 2              1          4.9         3.0          1.4         0.2  setosa
#> 3              1          4.7         3.2          1.3         0.2  setosa
#> 4              1          4.6         3.1          1.5         0.2  setosa
#> 5              1          5.0         3.6          1.4         0.2  setosa
#> 6              1          5.4         3.9          1.7         0.4  setosa
#> 7              1          4.6         3.4          1.4         0.3  setosa
#> 8              1          5.0         3.4          1.5         0.2  setosa
#> 9              1          4.4         2.9          1.4         0.2  setosa
#> 10             1          4.9         3.1          1.5         0.1  setosa
#> 11             2          4.9         3.0          1.4         0.2  setosa
#> 12             2          4.7         3.2          1.3         0.2  setosa
#> 13             2          4.6         3.1          1.5         0.2  setosa
#> 14             2          5.0         3.6          1.4         0.2  setosa
#> 15             2          5.4         3.9          1.7         0.4  setosa

Created on 2022-08-22 by the reprex package (v2.0.1)
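If you would rather not load dplyr just for the last step, here is a base R sketch that stacks the windows with `do.call(rbind, ...)`. This is an alternative I'm suggesting, not part of the reprex above. The object names `tagged` and `stacked` are only illustrative, and the sketch rebuilds the 10-row windows with `lapply()` so that it runs on its own:

```r
data <- iris

# Build 20 windows of 10 consecutive rows: rows 1-10, 2-11, ..., 20-29
list_of_iris_dfs <- lapply(1:20, function(i) data[i:(i + 9), ])

# Prepend a sample_number column to each window
tagged <- Map(function(df, i) cbind(sample_number = i, df),
              list_of_iris_dfs, seq_along(list_of_iris_dfs))

# Stack all windows into one data frame
stacked <- do.call(rbind, tagged)

# Reset row names to 1..n (otherwise rbind keeps de-duplicated iris row indices)
rownames(stacked) <- NULL

head(stacked, 12)
```

The result has one row per (sample, row) pair, so with 20 samples of 10 rows each you get 200 rows in total.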

Answer:

So you want to go from data[1:10,] to data[20:29,]? Then first store the starting row of each sample:

first <- 1:20
data <- iris

To create just one sample (the first one), you could do:

mylist <- list()
mylist[[first[1]]] <- subset(x = data, select = colnames(data), subset = row.names(data) %in% first[1]:(first[1] + 9))

I usually try to avoid loops in R, but to build all 20 samples you could do:

mylist <- list()
for (i in 1:20) {
  # keep rows first[i] to first[i] + 9, i.e. 10 consecutive rows
  mylist[[first[i]]] <- subset(x = data, select = colnames(data),
                               subset = row.names(data) %in% first[i]:(first[i] + 9))
}

There are certainly other ways to do this.
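One loop-free alternative, sketched in base R and not taken from either answer above, builds all the windows with `lapply()` and names them with `setNames()`, so each sample is reachable as `samples$df1`, `samples$df2`, and so on without creating 20 separate objects. The helper `consecutive_samples()` and its `n_samples` / `size` arguments are hypothetical names chosen for this example:

```r
# Hypothetical helper: n_samples windows of `size` consecutive rows,
# the k-th window starting at row k (rows 1:size, 2:(size + 1), ...)
consecutive_samples <- function(data, n_samples = 20, size = 10) {
  # Make sure the last window does not run past the end of the data
  stopifnot(n_samples + size - 1 <= nrow(data))
  starts <- seq_len(n_samples)
  windows <- lapply(starts, function(k) data[k:(k + size - 1), ])
  setNames(windows, paste0("df", starts))
}

samples <- consecutive_samples(iris)

# Access a sample by name (mirrors df1, df2, ... in the question)
samples$df2  # rows 2 to 11 of iris
```

If you really do need separate objects called df1 to df20, `list2env(samples, envir = .GlobalEnv)` creates them in one step, but keeping them in a named list is usually easier to work with afterwards.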