I have a large dataset that I would like some help with. An example is given below:
   id id_row material
1   1      1        1
2   1      2        1
3   1      3        1
4   2      1        1
5   2      2        2
6   2      3        1
7   3      1        2
8   3      2        2
9   3      3        2
10  4      1        1
11  4      2        2
I would like to add a new column based on the values of material within each id (across rows). In the new column, every id that has both 1 and 2 in material should be flagged (e.g. with the value 99); if only one of the two values is present, the column should return that value (1 or 2). Something like this:
   id id_row material new_column
1   1      1        1          1
2   1      2        1          1
3   1      3        1          1
4   2      1        1         99
5   2      2        2         99
6   2      3        1         99
7   3      1        2          2
8   3      2        2          2
9   3      3        2          2
10  4      1        1         99
11  4      2        2         99
I have searched online for a solution and tried dplyr with group_by, mutate and ifelse, without any luck. Thank you in advance!
CodePudding user response:
Try this approach:
library(tidyverse)

tribble(
  ~id, ~id_row, ~material,
  1, 1, 1,
  1, 2, 1,
  1, 3, 1,
  2, 1, 1,
  2, 2, 2,
  2, 3, 1,
  3, 1, 2,
  3, 2, 2,
  3, 3, 2,
  4, 1, 1,
  4, 2, 2
) |>
  group_by(id) |>
  mutate(
    # 99 if the id contains both materials 1 and 2, NA otherwise
    new_column = if_else(any(material == 2) & any(material == 1), 99, NA_real_),
    # where no flag was set, fall back to the row's own material
    new_column = if_else(is.na(new_column), material, new_column)
  )
#> # A tibble: 11 × 4
#> # Groups:   id [4]
#>       id id_row material new_column
#>    <dbl>  <dbl>    <dbl>      <dbl>
#>  1     1      1        1          1
#>  2     1      2        1          1
#>  3     1      3        1          1
#>  4     2      1        1         99
#>  5     2      2        2         99
#>  6     2      3        1         99
#>  7     3      1        2          2
#>  8     3      2        2          2
#>  9     3      3        2          2
#> 10     4      1        1         99
#> 11     4      2        2         99
Created on 2022-05-25 by the reprex package (v2.0.1)
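If you prefer a single step, the same per-id logic can probably be written with a plain if/else inside mutate. This is only a sketch of an alternative, not part of the reprex above: the names df and result are made up for illustration, the data is the same example as in the question, and it assumes material only needs to be checked for the values 1 and 2.

```r
library(dplyr)

# Same example data as in the question (df is a hypothetical name)
df <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  id_row   = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
  material = c(1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 2)
)

result <- df |>
  group_by(id) |>
  # Within each id: 99 if both 1 and 2 occur, otherwise keep material as is
  mutate(new_column = if (all(c(1, 2) %in% material)) 99 else material) |>
  ungroup()

print(result)
```

The condition is evaluated once per group, so a plain if/else works here; the single 99 is recycled across the rows of that id. The ungroup() at the end just drops the grouping so later steps are not applied per id by accident.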