The answer for this question "R Create a column containing value of another column based on min or max of reference column" does what I want but it's not working when the reference column has more than a ones digit or is a date.
library(dplyr)
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
Value = c(34, 54, 17, 2, 25, 89, 23, 93, 90),
Int1 = rep(1:3, 3),
Int2 = rep(11:13, 3),
Dbl1 = rep(c(1.1, 2.2, 3.3), 3),
Dbl2 = rep(c(11.1, 12.2, 13.3), 3),
Date = as.Date(c("2001-01-01", "2002-02-02", "2003-03-03", "2001-01-01", "2002-02-02", "2003-03-03", "2001-01-01", "2002-02-02", "2003-03-03")))
df
ID Value Int1 Int2 Dbl1 Dbl2 Date
1 1 34 1 11 1.1 11.1 2001-01-01
2 1 54 2 12 2.2 12.2 2002-02-02
3 1 17 3 13 3.3 13.3 2003-03-03
4 2 2 1 11 1.1 11.1 2001-01-01
5 2 25 2 12 2.2 12.2 2002-02-02
6 2 89 3 13 3.3 13.3 2003-03-03
7 3 23 1 11 1.1 11.1 2001-01-01
8 3 93 2 12 2.2 12.2 2002-02-02
9 3 90 3 13 3.3 13.3 2003-03-03
df %>%
group_by(ID) %>%
mutate(.Int1 = Value[min(Int1)],
.Int2 = Value[min(Int2)],
.Dbl1 = Value[min(Dbl1)],
.Dbl2 = Value[min(Dbl2)],
.Date = Value[min(Date)]) %>%
ungroup
# A tibble: 9 × 12
ID Value Int1 Int2 Dbl1 Dbl2 Date .Int1 .Int2 .Dbl1 .Dbl2 .Date
<dbl> <dbl> <int> <int> <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 34 1 11 1.1 11.1 2001-01-01 34 NA 34 NA NA
2 1 54 2 12 2.2 12.2 2002-02-02 34 NA 34 NA NA
3 1 17 3 13 3.3 13.3 2003-03-03 34 NA 34 NA NA
4 2 2 1 11 1.1 11.1 2001-01-01 2 NA 2 NA NA
5 2 25 2 12 2.2 12.2 2002-02-02 2 NA 2 NA NA
6 2 89 3 13 3.3 13.3 2003-03-03 2 NA 2 NA NA
7 3 23 1 11 1.1 11.1 2001-01-01 23 NA 23 NA NA
8 3 93 2 12 2.2 12.2 2002-02-02 23 NA 23 NA NA
9 3 90 3 13 3.3 13.3 2003-03-03 23 NA 23 NA NA
What do I need to do so the results in .Int2, .Dbl2, and .Date are not NA and the same as .Int1 and .Dbl1?
CodePudding user response:
It is not the min
imum we need for indexing, it would be the position index with which.min
, otherwise, min(Int2)
for ID 1 will be 11
, but there are only 3 rows for ID 1.
df %>%
group_by(ID) %>%
mutate(.Int1 = Value[which.min(Int1)],
.Int2 = Value[which.min(Int2)],
.Dbl1 = Value[which.min(Dbl1)],
.Dbl2 = Value[which.min(Dbl2)],
.Date = Value[which.min(Date)]) %>%
ungroup
-output
# A tibble: 9 × 12
ID Value Int1 Int2 Dbl1 Dbl2 Date .Int1 .Int2 .Dbl1 .Dbl2 .Date
<dbl> <dbl> <int> <int> <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 34 1 11 1.1 11.1 2001-01-01 34 34 34 34 34
2 1 54 2 12 2.2 12.2 2002-02-02 34 34 34 34 34
3 1 17 3 13 3.3 13.3 2003-03-03 34 34 34 34 34
4 2 2 1 11 1.1 11.1 2001-01-01 2 2 2 2 2
5 2 25 2 12 2.2 12.2 2002-02-02 2 2 2 2 2
6 2 89 3 13 3.3 13.3 2003-03-03 2 2 2 2 2
7 3 23 1 11 1.1 11.1 2001-01-01 23 23 23 23 23
8 3 93 2 12 2.2 12.2 2002-02-02 23 23 23 23 23
9 3 90 3 13 3.3 13.3 2003-03-03 23 23 23 23 23
The repetition of code can be simplified if we use across
df %>%
group_by(ID) %>%
mutate(across(-Value, ~ Value[which.min(.x)],
.names = ".{.col}")) %>%
ungroup
-output
# A tibble: 9 × 12
ID Value Int1 Int2 Dbl1 Dbl2 Date .Int1 .Int2 .Dbl1 .Dbl2 .Date
<dbl> <dbl> <int> <int> <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 34 1 11 1.1 11.1 2001-01-01 34 34 34 34 34
2 1 54 2 12 2.2 12.2 2002-02-02 34 34 34 34 34
3 1 17 3 13 3.3 13.3 2003-03-03 34 34 34 34 34
4 2 2 1 11 1.1 11.1 2001-01-01 2 2 2 2 2
5 2 25 2 12 2.2 12.2 2002-02-02 2 2 2 2 2
6 2 89 3 13 3.3 13.3 2003-03-03 2 2 2 2 2
7 3 23 1 11 1.1 11.1 2001-01-01 23 23 23 23 23
8 3 93 2 12 2.2 12.2 2002-02-02 23 23 23 23 23
9 3 90 3 13 3.3 13.3 2003-03-03 23 23 23 23 23