Split dataframe into multiple by every nth column

Time:06-30

I would prefer a tidyverse solution! The question is related to this post.

Example data

structure(list(A = c(79L, 42L, 74L, 49L, 82L, 22L, 88L, 13L, 
54L, 68L), B = c(41L, 22L, 1L, 40L, 96L, 48L, 40L, 56L, 19L, 
84L), C = c(20L, 10L, 1L, 27L, 34L, 27L, 35L, 3L, 78L, 36L), 
    D = c(40L, 92L, 76L, 81L, 73L, 30L, 10L, 57L, 19L, 18L), 
    G = c(50L, 74L, 37L, 60L, 23L, 42L, 22L, 94L, 28L, 68L), 
    H = c(54L, 62L, 92L, 61L, 91L, 76L, 51L, 60L, 89L, 36L), 
    J = c(64L, 59L, 1L, 99L, 36L, 26L, 15L, 16L, 83L, 39L), K = c(29L, 
    30L, 80L, 33L, 44L, 28L, 9L, 53L, 11L, 68L), L = c(42L, 29L, 
    10L, 75L, 24L, 68L, 56L, 77L, 23L, 92L), M = c(56L, 27L, 
    61L, 40L, 76L, 50L, 31L, 15L, 72L, 40L), N = c(45L, 33L, 
    37L, 32L, 5L, 20L, 45L, 38L, 25L, 32L), Z = c(52L, 88L, 74L, 
    91L, 86L, 43L, 4L, 6L, 61L, 69L), X = c(58L, 92L, 19L, 99L, 
    9L, 58L, 53L, 49L, 48L, 32L), Y = c(75L, 13L, 63L, 37L, 30L, 
    98L, 98L, 94L, 38L, 25L), S = c(99L, 64L, 27L, 30L, 100L, 
    40L, 76L, 2L, 10L, 57L), P = c(16L, 76L, 69L, 64L, 68L, 34L, 
    96L, 22L, 48L, 1L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))

# A tibble: 10 x 16
       A     B     C     D     G     H     J     K     L     M     N
   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1    79    41    20    40    50    54    64    29    42    56    45
 2    42    22    10    92    74    62    59    30    29    27    33
 3    74     1     1    76    37    92     1    80    10    61    37
 4    49    40    27    81    60    61    99    33    75    40    32
 5    82    96    34    73    23    91    36    44    24    76     5
 6    22    48    27    30    42    76    26    28    68    50    20
 7    88    40    35    10    22    51    15     9    56    31    45
 8    13    56     3    57    94    60    16    53    77    15    38
 9    54    19    78    19    28    89    83    11    23    72    25
10    68    84    36    18    68    36    39    68    92    40    32
# ... with 5 more variables: Z <int>, X <int>, Y <int>, S <int>,
#   P <int>

Split the data frame every 4 columns into a list of smaller data frames, so that one can loop or map over them and export each one individually. Desired output:

[[1]]
# A tibble: 10 x 4
       A     B     C     D
   <int> <int> <int> <int>
 1    79    41    20    40
 2    42    22    10    92
 3    74     1     1    76
 4    49    40    27    81
 5    82    96    34    73
 6    22    48    27    30
 7    88    40    35    10
 8    13    56     3    57
 9    54    19    78    19
10    68    84    36    18

[[2]]
# A tibble: 10 x 4
       G     H     J     K
   <int> <int> <int> <int>
 1    50    54    64    29
 2    74    62    59    30
 3    37    92     1    80
 4    60    61    99    33
 5    23    91    36    44
 6    42    76    26    28
 7    22    51    15     9
 8    94    60    16    53
 9    28    89    83    11
10    68    36    39    68

And so on...

The intention is to export each of them into CSV individually.

CodePudding user response:

This is a straightforward one-liner in base R:

lapply(seq(ncol(df)/4) - 1, function(x) df[4 * x + 1:4])
#> [[1]]
#> # A tibble: 10 x 4
#>        A     B     C     D
#>    <int> <int> <int> <int>
#>  1    79    41    20    40
#>  2    42    22    10    92
#>  3    74     1     1    76
#>  4    49    40    27    81
#>  5    82    96    34    73
#>  6    22    48    27    30
#>  7    88    40    35    10
#>  8    13    56     3    57
#>  9    54    19    78    19
#> 10    68    84    36    18
#> 
#> [[2]]
#> # A tibble: 10 x 4
#>        G     H     J     K
#>    <int> <int> <int> <int>
#>  1    50    54    64    29
#>  2    74    62    59    30
#>  3    37    92     1    80
#>  4    60    61    99    33
#>  5    23    91    36    44
#>  6    42    76    26    28
#>  7    22    51    15     9
#>  8    94    60    16    53
#>  9    28    89    83    11
#> 10    68    36    39    68
#> 
#> [[3]]
#> # A tibble: 10 x 4
#>        L     M     N     Z
#>    <int> <int> <int> <int>
#>  1    42    56    45    52
#>  2    29    27    33    88
#>  3    10    61    37    74
#>  4    75    40    32    91
#>  5    24    76     5    86
#>  6    68    50    20    43
#>  7    56    31    45     4
#>  8    77    15    38     6
#>  9    23    72    25    61
#> 10    92    40    32    69
#> 
#> [[4]]
#> # A tibble: 10 x 4
#>        X     Y     S     P
#>    <int> <int> <int> <int>
#>  1    58    75    99    16
#>  2    92    13    64    76
#>  3    19    63    27    69
#>  4    99    37    30    64
#>  5     9    30   100    68
#>  6    58    98    40    34
#>  7    53    98    76    96
#>  8    49    94     2    22
#>  9    48    38    10    48
#> 10    32    25    57     1

Though if for some reason you need a tidyverse solution, the equivalent would be:

purrr::map(seq(ncol(df)/4) - 1, ~ df[4 * .x + 1:4])

Created on 2022-06-29 by the reprex package (v2.0.1)
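
One caveat worth noting: the one-liners in this answer assume the number of columns is an exact multiple of 4, as it is in the example data (16 columns). As far as I can tell, with a column count such as 18 the trailing columns past the last full block would be left out. A minimal base R sketch that keeps a shorter final block is below. The helper name `split_every_n_cols` and the toy data frame are made up for illustration, and the sketch assumes the data frame has at least one column.

```r
# Hypothetical helper (not part of the answer above): split a data frame
# into consecutive blocks of n columns, keeping a shorter final block.
# Assumes ncol(df) >= 1.
split_every_n_cols <- function(df, n = 4) {
  # First column index of each block: 1, n + 1, 2n + 1, ...
  starts <- seq(1, ncol(df), by = n)
  lapply(starts, function(s) {
    # Clamp the block's last column to the last column of df
    df[, s:min(s + n - 1, ncol(df)), drop = FALSE]
  })
}

# Toy data: 3 rows, 10 columns named V1..V10 (10 is not a multiple of 4)
toy <- as.data.frame(matrix(1:30, nrow = 3, ncol = 10))
parts <- split_every_n_cols(toy, 4)

# Number of columns in each block
lengths(parts)
```

The `drop = FALSE` keeps a one-column final block as a data frame rather than a bare vector when the input is a plain data.frame.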

CodePudding user response:

Here is a base R option using split.default:

> split.default(df, ceiling(seq_along(df) / 4))
$`1`
# A tibble: 10 × 4
       A     B     C     D
   <int> <int> <int> <int>
 1    79    41    20    40
 2    42    22    10    92
 3    74     1     1    76
 4    49    40    27    81
 5    82    96    34    73
 6    22    48    27    30
 7    88    40    35    10
 8    13    56     3    57
 9    54    19    78    19
10    68    84    36    18

$`2`
# A tibble: 10 × 4
       G     H     J     K
   <int> <int> <int> <int>
 1    50    54    64    29
 2    74    62    59    30
 3    37    92     1    80
 4    60    61    99    33
 5    23    91    36    44
 6    42    76    26    28
 7    22    51    15     9
 8    94    60    16    53
 9    28    89    83    11
10    68    36    39    68

$`3`
# A tibble: 10 × 4
       L     M     N     Z
   <int> <int> <int> <int>
 1    42    56    45    52
 2    29    27    33    88
 3    10    61    37    74
 4    75    40    32    91
 5    24    76     5    86
 6    68    50    20    43
 7    56    31    45     4
 8    77    15    38     6
 9    23    72    25    61
10    92    40    32    69

$`4`
# A tibble: 10 × 4
       X     Y     S     P
   <int> <int> <int> <int>
 1    58    75    99    16
 2    92    13    64    76
 3    19    63    27    69
 4    99    37    30    64
 5     9    30   100    68
 6    58    98    40    34
 7    53    98    76    96
 8    49    94     2    22
 9    48    38    10    48
10    32    25    57     1
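
Since the stated goal is to export each piece to its own CSV, here is a rough sketch of that last step built on the `split.default` approach. It uses base R only; the toy data frame, the temporary output directory, and the `part_<n>.csv` file names are assumptions chosen for illustration, not something given in the question.

```r
# Toy stand-in for the question's data: 3 rows, 8 columns named V1..V8
df <- as.data.frame(matrix(1:24, nrow = 3, ncol = 8))

# Grouping vector: ceiling(1:8 / 4) is 1 1 1 1 2 2 2 2, so columns 1-4
# land in group "1" and columns 5-8 in group "2"
parts <- split.default(df, ceiling(seq_along(df) / 4))

# Hypothetical output location and file names: part_1.csv, part_2.csv
out_dir <- tempdir()
paths <- file.path(out_dir, paste0("part_", names(parts), ".csv"))

# Write each block to its own file; Map pairs each block with its path
invisible(Map(function(part, path) write.csv(part, path, row.names = FALSE),
              parts, paths))
```

For a tidyverse flavour, `purrr::iwalk()` together with `readr::write_csv()` could play the same role as `Map()` and `write.csv()` here.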