Repeating the rows of one column in a data frame to match the length of another column

Time:12-12

Suppose I have this data:

library(tibble)

year <- c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002)
H <- c(1.5, 2.5, 3, 4.7, 5.7, 6.5, 3.2, 2.1, 1.9)
a <- 11:19
b <- 21:29

# data_frame() is deprecated; tibble() is its replacement
df <- tibble(year, H, a, b)
df
# A tibble: 9 × 4
   year     H     a     b
  <dbl> <dbl> <int> <int>
1  2000   1.5    11    21
2  2000   2.5    12    22
3  2000   3      13    23
4  2001   4.7    14    24
5  2001   5.7    15    25
6  2001   6.5    16    26
7  2002   3.2    17    27
8  2002   2.1    18    28
9  2002   1.9    19    29

In R, how can I repeat each value of H within its year so that every H value is paired with the full set of a and b rows for that year? My expected output looks like this:

    year     H     a     b
   <dbl> <dbl> <dbl> <dbl>
 1  2000   1.5    11    21
 2  2000   1.5    12    22
 3  2000   1.5    13    23
 4  2000   2.5    11    21
 5  2000   2.5    12    22
 6  2000   2.5    13    23
 7  2000   3      11    21
 8  2000   3      12    22
 9  2000   3      13    23
10  2001   4.7    14    24
11  2001   4.7    15    25
12  2001   4.7    16    26
13  2001   5.7    14    24
14  2001   5.7    15    25
15  2001   5.7    16    26
16  2001   6.5    14    24
17  2001   6.5    15    25
18  2001   6.5    16    26
19  2002   3.2    17    27
20  2002   3.2    18    28
21  2002   3.2    19    29
22  2002   2.1    17    27
23  2002   2.1    18    28
24  2002   2.1    19    29
25  2002   1.9    17    27
26  2002   1.9    18    28
27  2002   1.9    19    29

CodePudding user response:

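You can use tidyr::expand_grid(), which accepts data frames as well as vectors. Group by year, then apply it to each group with group_modify(). Inside group_modify(), .x holds the rows of the current group without the grouping column, so .x[1] is H and .x[-1] is a and b.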

library(dplyr)
library(tidyr)

df %>%
  group_by(year) %>%
  group_modify(~ expand_grid(.x[1], .x[-1]))

# A tibble: 27 x 4
# Groups:   year [3]
    year     H     a     b
   <dbl> <dbl> <int> <int>
 1  2000   1.5    11    21
 2  2000   1.5    12    22
 3  2000   1.5    13    23
 4  2000   2.5    11    21
 5  2000   2.5    12    22
 6  2000   2.5    13    23
 7  2000   3      11    21
 8  2000   3      12    22
 9  2000   3      13    23
10  2001   4.7    14    24
# ... with 17 more rows
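
To see why this works, it may help to look at a single group in isolation. The sketch below rebuilds the year 2000 rows by hand (the names h_part, ab_part and one_year are made up for illustration) and crosses them with expand_grid(), which pairs every row of the first data frame with every row of the second and varies the first input slowest.

```r
library(tibble)
library(tidyr)

# One year's worth of data, mirroring the 2000 group from the question
# (h_part and ab_part are illustrative names, not part of the original code)
h_part  <- tibble(H = c(1.5, 2.5, 3))
ab_part <- tibble(a = 11:13, b = 21:23)

# Each data frame is treated as a unit: every row of h_part is paired
# with every row of ab_part, and the first input varies slowest
one_year <- expand_grid(h_part, ab_part)
one_year
```

With three rows on each side, one_year has 3 x 3 = 9 rows, which is the per-year block in the expected output. group_modify() simply repeats this step for every year and stacks the results.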

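Or the same idea without group_modify(), which is marked as experimental. Here each piece still contains the year column, so .x[1:2] is year and H and .x[3:4] is a and b. Note that the formula form split(~ year) needs R 4.1 or later: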

library(purrr)

df %>%
  split(~ year) %>%
  map_df(~ expand_grid(.x[1:2], .x[3:4]))

CodePudding user response:

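Another solution, using rep() inside summarise(): within each year, H is repeated with each = n() so that every value appears n() times in a row, while a and b are repeated as whole blocks n() times. In dplyr 1.1.0 and later, summarise() warns when a group returns more than one row, and reframe() is the intended replacement for that case.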

library(dplyr)

year<- c(2000,2000,2000,2001,2001,2001,2002,2002,2002)
H<- c(1.5,2.5,3,4.7,5.7,6.5,3.2,2.1,1.9)
a<- c(11:19)
b<- c(21:29)

df<- data.frame(year,H,a,b)

df %>% 
  group_by(year) %>% 
  summarise(H = rep(H, each = n()), across(-1, ~ rep(.x, n())), .groups = "drop")

#> # A tibble: 27 × 4
#>     year     H     a     b
#>    <dbl> <dbl> <int> <int>
#>  1  2000   1.5    11    21
#>  2  2000   1.5    12    22
#>  3  2000   1.5    13    23
#>  4  2000   2.5    11    21
#>  5  2000   2.5    12    22
#>  6  2000   2.5    13    23
#>  7  2000   3      11    21
#>  8  2000   3      12    22
#>  9  2000   3      13    23
#> 10  2001   4.7    14    24
#> # … with 17 more rows
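
The same each/times pattern can also be written without any packages. The following is only a minimal base R sketch, not part of the answer above; the helper name cross_within_group is hypothetical, and the column names are assumed to match the question's data.

```r
year <- c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002)
H <- c(1.5, 2.5, 3, 4.7, 5.7, 6.5, 3.2, 2.1, 1.9)
df <- data.frame(year, H, a = 11:19, b = 21:29)

# Hypothetical helper: build the n x n block for one year's rows
cross_within_group <- function(g) {
  n <- nrow(g)
  data.frame(
    year = g$year[1],          # constant within the group, recycled
    H = rep(g$H, each = n),    # 1.5 1.5 1.5 2.5 2.5 2.5 ...
    a = rep(g$a, times = n),   # 11 12 13 11 12 13 ...
    b = rep(g$b, times = n)
  )
}

# Split by year, expand each piece, and stack the pieces back together
result <- do.call(rbind, lapply(split(df, df$year), cross_within_group))
rownames(result) <- NULL
result
```

The key point is the asymmetry between the two rep() calls: each = n stretches H value by value, while times = n tiles a and b as whole blocks, so every H value meets every (a, b) pair exactly once.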