I have a dataframe where the first column represents the date and it covers the span of a few months. I then duplicate the last row of each month using How to duplicate last row by group (ID)?
What I'm hoping to achieve now is to modify a certain column of that duplicated row. I'll explain it better with a snippet of my data (attached at the bottom):
# A tibble: 11 × 3
date file value
<date> <int> <dbl>
1 2021-07-26 1 858
2 2021-07-27 1 880
3 2021-07-28 1 904
4 2021-07-29 1 933
5 2021-07-30 1 970
6 2021-07-31 1 1010
7 2021-07-31 1 1010
8 2021-08-01 2 1048
9 2021-08-02 2 1078
10 2021-08-03 2 1096
11 2021-08-04 2 1107
As you can see, rows 6 and 7 are the same. Row 7 is the duplicate of 6 and I would like to change the "file" column from "1" to "2" so I can then get that row as the first row of the group "file==2".
Expected output (for clarification):
# A tibble: 11 × 3
date file value
<date> <int> <dbl>
1 2021-07-26 1 858
2 2021-07-27 1 880
3 2021-07-28 1 904
4 2021-07-29 1 933
5 2021-07-30 1 970
6 2021-07-31 1 1010
7 2021-07-31 2 1010 # file changed from 1 to 2 as desired
8 2021-08-01 2 1048
9 2021-08-02 2 1078
10 2021-08-03 2 1096
11 2021-08-04 2 1107
Data:
df <- structure(list(date = structure(c(18834, 18835, 18836, 18837,
18838, 18839, 18839, 18840, 18841, 18842, 18843), class = "Date"),
file = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), value = c(858,
880, 904, 933, 970, 1010, 1010, 1048, 1078, 1096, 1107)), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
One possible way to solve your probelm:
df %>%
mutate(file = file duplicated(.))
# A tibble: 11 x 3
date file value
<date> <int> <dbl>
1 2021-07-26 1 858
2 2021-07-27 1 880
3 2021-07-28 1 904
4 2021-07-29 1 933
5 2021-07-30 1 970
6 2021-07-31 1 1010
7 2021-07-31 2 1010
8 2021-08-01 2 1048
9 2021-08-02 2 1078
10 2021-08-03 2 1096
11 2021-08-04 2 1107
CodePudding user response:
A possible solution:
library(dplyr)
df %>%
mutate(file = file data.table::rowid(.$date) - 1)
#> # A tibble: 11 × 3
#> date file value
#> <date> <dbl> <dbl>
#> 1 2021-07-26 1 858
#> 2 2021-07-27 1 880
#> 3 2021-07-28 1 904
#> 4 2021-07-29 1 933
#> 5 2021-07-30 1 970
#> 6 2021-07-31 1 1010
#> 7 2021-07-31 2 1010
#> 8 2021-08-01 2 1048
#> 9 2021-08-02 2 1078
#> 10 2021-08-03 2 1096
#> 11 2021-08-04 2 1107
Another possible solution:
library(dplyr)
df %>%
group_by(date) %>%
mutate(file = first(file):(first(file) n()-1)) %>%
ungroup
#> # A tibble: 11 × 3
#> date file value
#> <date> <int> <dbl>
#> 1 2021-07-26 1 858
#> 2 2021-07-27 1 880
#> 3 2021-07-28 1 904
#> 4 2021-07-29 1 933
#> 5 2021-07-30 1 970
#> 6 2021-07-31 1 1010
#> 7 2021-07-31 2 1010
#> 8 2021-08-01 2 1048
#> 9 2021-08-02 2 1078
#> 10 2021-08-03 2 1096
#> 11 2021-08-04 2 1107