Home > other >  R New column with repeating value from another column based on min or max of reference column
R New column with repeating value from another column based on min or max of reference column

Time:12-20

The answer for this question "R Create a column containing value of another column based on min or max of reference column" does what I want but it's not working when the reference column has more than a ones digit or is a date.

library(dplyr)
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 Value = c(34, 54, 17, 2, 25, 89, 23, 93, 90),
                 Int1 = rep(1:3, 3),
                 Int2 = rep(11:13, 3),
                 Dbl1  = rep(c(1.1, 2.2, 3.3), 3),
                 Dbl2 = rep(c(11.1, 12.2, 13.3), 3),
                 Date = as.Date(c("2001-01-01", "2002-02-02", "2003-03-03", "2001-01-01", "2002-02-02", "2003-03-03", "2001-01-01", "2002-02-02", "2003-03-03")))

df
  ID Value Int1 Int2 Dbl1 Dbl2       Date
1  1    34    1   11  1.1 11.1 2001-01-01
2  1    54    2   12  2.2 12.2 2002-02-02
3  1    17    3   13  3.3 13.3 2003-03-03
4  2     2    1   11  1.1 11.1 2001-01-01
5  2    25    2   12  2.2 12.2 2002-02-02
6  2    89    3   13  3.3 13.3 2003-03-03
7  3    23    1   11  1.1 11.1 2001-01-01
8  3    93    2   12  2.2 12.2 2002-02-02
9  3    90    3   13  3.3 13.3 2003-03-03


df %>%
  group_by(ID) %>%
  mutate(.Int1 = Value[min(Int1)],
         .Int2 = Value[min(Int2)],
         .Dbl1 = Value[min(Dbl1)],
         .Dbl2 = Value[min(Dbl2)],
         .Date = Value[min(Date)]) %>% 
  ungroup

    # A tibble: 9 × 12
     ID Value  Int1  Int2  Dbl1  Dbl2 Date       .Int1 .Int2 .Dbl1 .Dbl2 .Date
  <dbl> <dbl> <int> <int> <dbl> <dbl> <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    34     1    11   1.1  11.1 2001-01-01    34    NA    34    NA    NA
2     1    54     2    12   2.2  12.2 2002-02-02    34    NA    34    NA    NA
3     1    17     3    13   3.3  13.3 2003-03-03    34    NA    34    NA    NA
4     2     2     1    11   1.1  11.1 2001-01-01     2    NA     2    NA    NA
5     2    25     2    12   2.2  12.2 2002-02-02     2    NA     2    NA    NA
6     2    89     3    13   3.3  13.3 2003-03-03     2    NA     2    NA    NA
7     3    23     1    11   1.1  11.1 2001-01-01    23    NA    23    NA    NA
8     3    93     2    12   2.2  12.2 2002-02-02    23    NA    23    NA    NA
9     3    90     3    13   3.3  13.3 2003-03-03    23    NA    23    NA    NA

What do I need to do so the results in .Int2, .Dbl2, and .Date are not NA and the same as .Int1 and .Dbl1?

CodePudding user response:

It is not the minimum we need for indexing, it would be the position index with which.min, otherwise, min(Int2) for ID 1 will be 11, but there are only 3 rows for ID 1.

df %>%
  group_by(ID) %>%
  mutate(.Int1 = Value[which.min(Int1)],
         .Int2 = Value[which.min(Int2)],
         .Dbl1 = Value[which.min(Dbl1)],
         .Dbl2 = Value[which.min(Dbl2)],
         .Date = Value[which.min(Date)]) %>% 
  ungroup

-output

# A tibble: 9 × 12
     ID Value  Int1  Int2  Dbl1  Dbl2 Date       .Int1 .Int2 .Dbl1 .Dbl2 .Date
  <dbl> <dbl> <int> <int> <dbl> <dbl> <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    34     1    11   1.1  11.1 2001-01-01    34    34    34    34    34
2     1    54     2    12   2.2  12.2 2002-02-02    34    34    34    34    34
3     1    17     3    13   3.3  13.3 2003-03-03    34    34    34    34    34
4     2     2     1    11   1.1  11.1 2001-01-01     2     2     2     2     2
5     2    25     2    12   2.2  12.2 2002-02-02     2     2     2     2     2
6     2    89     3    13   3.3  13.3 2003-03-03     2     2     2     2     2
7     3    23     1    11   1.1  11.1 2001-01-01    23    23    23    23    23
8     3    93     2    12   2.2  12.2 2002-02-02    23    23    23    23    23
9     3    90     3    13   3.3  13.3 2003-03-03    23    23    23    23    23

The repetition of code can be simplified if we use across

df %>% 
  group_by(ID) %>% 
  mutate(across(-Value, ~   Value[which.min(.x)], 
      .names = ".{.col}")) %>%
  ungroup

-output

# A tibble: 9 × 12
     ID Value  Int1  Int2  Dbl1  Dbl2 Date       .Int1 .Int2 .Dbl1 .Dbl2 .Date
  <dbl> <dbl> <int> <int> <dbl> <dbl> <date>     <dbl> <dbl> <dbl> <dbl> <dbl>
1     1    34     1    11   1.1  11.1 2001-01-01    34    34    34    34    34
2     1    54     2    12   2.2  12.2 2002-02-02    34    34    34    34    34
3     1    17     3    13   3.3  13.3 2003-03-03    34    34    34    34    34
4     2     2     1    11   1.1  11.1 2001-01-01     2     2     2     2     2
5     2    25     2    12   2.2  12.2 2002-02-02     2     2     2     2     2
6     2    89     3    13   3.3  13.3 2003-03-03     2     2     2     2     2
7     3    23     1    11   1.1  11.1 2001-01-01    23    23    23    23    23
8     3    93     2    12   2.2  12.2 2002-02-02    23    23    23    23    23
9     3    90     3    13   3.3  13.3 2003-03-03    23    23    23    23    23
  • Related