Select every other nth row of data frame and add to a list of data frames in R

Time:03-10

I have the data frame below and want to build a list of 5 data frames, each containing every 5th row of the original df, starting from a different row. Is there a way to select every 5th row and add it to a new data frame in a list, using either a for loop or lapply?

df
X1 X2 X3       X4 X5
 1  0  0 1.501990  0
 2  0  0 1.883904  0
 3  0  0 1.333195  0
 4  0  0 0.000000  0
 5  0  0 2.136760  0
 6  0  0 2.186790  0
 7  0  0 1.269592  0
 8  0  0 1.458405  0
 9  0  0 1.816493  0
10  0  0 0.000000  0
11  0  0 2.190029  0
12  0  0 0.000000  0
13  0  0 1.460534  0
14  0  0 1.470776  0
15  0  0 1.675406  0
16  0  0 1.842470  0
17  0  0 1.937999  0
18  0  0 0.000000  0
19  0  0 1.649926  0
20  0  0 2.067902  0

For example, the first data frame would consist of the 1st, 6th, 11th, and 16th rows, while the next would start with the 2nd row and continue down the rows of df in the same way.

CodePudding user response:

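Use split with 1:5 as the grouping vector. The vector is recycled along the rows (1, 2, 3, 4, 5, 1, 2, ...), so each of the 5 resulting data frames holds every 5th row, starting from a different row.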

split(df, 1:5)

output

$`1`
   X1 X2 X3       X4 X5
1   1  0  0 1.501990  0
6   6  0  0 2.186790  0
11 11  0  0 2.190029  0
16 16  0  0 1.842470  0

$`2`
   X1 X2 X3       X4 X5
2   2  0  0 1.883904  0
7   7  0  0 1.269592  0
12 12  0  0 0.000000  0
17 17  0  0 1.937999  0

$`3`
   X1 X2 X3       X4 X5
3   3  0  0 1.333195  0
8   8  0  0 1.458405  0
13 13  0  0 1.460534  0
18 18  0  0 0.000000  0

$`4`
   X1 X2 X3       X4 X5
4   4  0  0 0.000000  0
9   9  0  0 1.816493  0
14 14  0  0 1.470776  0
19 19  0  0 1.649926  0

$`5`
   X1 X2 X3       X4 X5
5   5  0  0 2.136760  0
10 10  0  0 0.000000  0
15 15  0  0 1.675406  0
20 20  0  0 2.067902  0
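
Since the question asks about lapply, here is a minimal base R sketch of the same result built by hand. It uses a cut-down copy of the data (only X1 and X4), and it assumes the data frame has at least n rows; with fewer, seq() would fail for the later start rows.

```r
# Cut-down copy of the question's data: X1 and X4 only.
df <- data.frame(
  X1 = 1:20,
  X4 = c(1.50199, 1.883904, 1.333195, 0, 2.13676,
         2.18679, 1.269592, 1.458405, 1.816493, 0, 2.190029, 0, 1.460534,
         1.470776, 1.675406, 1.84247, 1.937999, 0, 1.649926, 2.067902)
)

n <- 5  # number of interleaved data frames wanted

# For each start row i, take rows i, i + n, i + 2n, ...
parts <- lapply(seq_len(n), function(i) {
  df[seq(i, nrow(df), by = n), , drop = FALSE]
})

parts[[1]]$X1  # rows 1, 6, 11, 16
```

The list elements are unnamed here, whereas split names them "1" to "5"; otherwise the rows in each piece are the same.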

An alternative with dplyr::group_split is:

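library(dplyr)

# length.out keeps the grouping vector the same length as df, even if nrow(df) is not a multiple of 5
group_split(df, rep(1:5, length.out = nrow(df)), .keep = FALSE)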
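
If the number of rows is not a multiple of 5, the grouping vector has to be cut to the right length. A small sketch of one way to wrap that up in base R follows; split_every_nth and the 22-row toy data frame are made up for illustration and are not part of the answers here.

```r
# Hypothetical helper: split `df` into `n` data frames, where the k-th one
# holds rows k, k + n, k + 2n, ...
split_every_nth <- function(df, n) {
  grp <- rep_len(seq_len(n), nrow(df))  # 1, 2, ..., n, 1, 2, ... cut to nrow(df)
  split(df, grp)
}

# 22 rows do not divide evenly by 5, so the first two pieces get one extra row.
toy <- data.frame(id = 1:22)
pieces <- split_every_nth(toy, 5)
sapply(pieces, nrow)
```

Building the grouping vector with rep_len means its length always matches nrow(df), so the result does not depend on split recycling a shorter vector.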

data

df <- structure(list(X1 = 1:20, X2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), X3 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), X4 = c(1.50199, 1.883904, 1.333195, 0, 2.13676, 
2.18679, 1.269592, 1.458405, 1.816493, 0, 2.190029, 0, 1.460534, 
1.470776, 1.675406, 1.84247, 1.937999, 0, 1.649926, 2.067902), 
    X5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-20L))

CodePudding user response:

A possible solution, based on dplyr::group_split:

library(dplyr)

df %>% 
  mutate(id = (row_number() - 1) %% 5) %>%  # 0, 1, 2, 3, 4, 0, 1, ... gives the 5 groups the question asks for
  group_by(id) %>% 
  group_split(.keep = FALSE) %>% 
  as.list

#> [[1]]
#> # A tibble: 4 × 5
#>      X1    X2    X3    X4    X5
#>   <int> <int> <int> <dbl> <int>
#> 1     1     0     0  1.50     0
#> 2     6     0     0  2.19     0
#> 3    11     0     0  2.19     0
#> 4    16     0     0  1.84     0
#> 
#> [[2]]
#> # A tibble: 4 × 5
#>      X1    X2    X3    X4    X5
#>   <int> <int> <int> <dbl> <int>
#> 1     2     0     0  1.88     0
#> 2     7     0     0  1.27     0
#> 3    12     0     0  0        0
#> 4    17     0     0  1.94     0
#> 
#> [[3]]
#> # A tibble: 4 × 5
#>      X1    X2    X3    X4    X5
#>   <int> <int> <int> <dbl> <int>
#> 1     3     0     0  1.33     0
#> 2     8     0     0  1.46     0
#> 3    13     0     0  1.46     0
#> 4    18     0     0  0        0
#> 
#> [[4]]
#> # A tibble: 4 × 5
#>      X1    X2    X3    X4    X5
#>   <int> <int> <int> <dbl> <int>
#> 1     4     0     0  0        0
#> 2     9     0     0  1.82     0
#> 3    14     0     0  1.47     0
#> 4    19     0     0  1.65     0
#> 
#> [[5]]
#> # A tibble: 4 × 5
#>      X1    X2    X3    X4    X5
#>   <int> <int> <int> <dbl> <int>
#> 1     5     0     0  2.14     0
#> 2    10     0     0  0        0
#> 3    15     0     0  1.68     0
#> 4    20     0     0  2.07     0