Home > other >  R mutate last row of group
R mutate last row of group

Time:08-12

I have a dataframe where the first column represents the date and it covers the span of a few months. I then duplicate the last row of each month using How to duplicate last row by group (ID)?

What I'm hoping to achieve now is to modify a certain column of that duplicated row. I'll explain it better with a snippet of my data (attached at the bottom):

# A tibble: 11 × 3
   date        file value
   <date>     <int> <dbl>
 1 2021-07-26     1   858
 2 2021-07-27     1   880
 3 2021-07-28     1   904
 4 2021-07-29     1   933
 5 2021-07-30     1   970
 6 2021-07-31     1  1010
 7 2021-07-31     1  1010
 8 2021-08-01     2  1048
 9 2021-08-02     2  1078
10 2021-08-03     2  1096
11 2021-08-04     2  1107

As you can see, rows 6 and 7 are the same. Row 7 is the duplicate of 6 and I would like to change the "file" column from "1" to "2" so I can then get that row as the first row of the group "file==2".

Expected output (for clarification):

# A tibble: 11 × 3
   date        file value
   <date>     <int> <dbl>
 1 2021-07-26     1   858
 2 2021-07-27     1   880
 3 2021-07-28     1   904
 4 2021-07-29     1   933
 5 2021-07-30     1   970
 6 2021-07-31     1  1010
 7 2021-07-31     2  1010  # file changed from 1 to 2 as desired
 8 2021-08-01     2  1048
 9 2021-08-02     2  1078
10 2021-08-03     2  1096
11 2021-08-04     2  1107

Data:

df <- structure(list(date = structure(c(18834, 18835, 18836, 18837, 
18838, 18839, 18839, 18840, 18841, 18842, 18843), class = "Date"), 
    file = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), value = c(858, 
    880, 904, 933, 970, 1010, 1010, 1048, 1078, 1096, 1107)), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

One possible way to solve your probelm:

df %>%
  mutate(file = file   duplicated(.))

# A tibble: 11 x 3
   date        file value
   <date>     <int> <dbl>
 1 2021-07-26     1   858
 2 2021-07-27     1   880
 3 2021-07-28     1   904
 4 2021-07-29     1   933
 5 2021-07-30     1   970
 6 2021-07-31     1  1010
 7 2021-07-31     2  1010
 8 2021-08-01     2  1048
 9 2021-08-02     2  1078
10 2021-08-03     2  1096
11 2021-08-04     2  1107

CodePudding user response:

A possible solution:

library(dplyr)

df %>% 
  mutate(file = file   data.table::rowid(.$date) - 1)

#> # A tibble: 11 × 3
#>    date        file value
#>    <date>     <dbl> <dbl>
#>  1 2021-07-26     1   858
#>  2 2021-07-27     1   880
#>  3 2021-07-28     1   904
#>  4 2021-07-29     1   933
#>  5 2021-07-30     1   970
#>  6 2021-07-31     1  1010
#>  7 2021-07-31     2  1010
#>  8 2021-08-01     2  1048
#>  9 2021-08-02     2  1078
#> 10 2021-08-03     2  1096
#> 11 2021-08-04     2  1107

Another possible solution:

library(dplyr)

df %>% 
  group_by(date) %>% 
  mutate(file = first(file):(first(file) n()-1)) %>% 
  ungroup

#> # A tibble: 11 × 3
#>    date        file value
#>    <date>     <int> <dbl>
#>  1 2021-07-26     1   858
#>  2 2021-07-27     1   880
#>  3 2021-07-28     1   904
#>  4 2021-07-29     1   933
#>  5 2021-07-30     1   970
#>  6 2021-07-31     1  1010
#>  7 2021-07-31     2  1010
#>  8 2021-08-01     2  1048
#>  9 2021-08-02     2  1078
#> 10 2021-08-03     2  1096
#> 11 2021-08-04     2  1107
  • Related