I would like to mutate certain rows meeting a condition for specific groups that meet another condition.
The Aim:
For example, I'm trying to extract the mother's name from the below dataset and apply it beside rows labelled with 'children' for only groups that has all 'Monogamists' in the below dataset:
df <- tribble(
~family, ~sequence, ~role, ~state, ~name, ~year_of_birth,
"A", 1, "Father", "Monogamist", "Adam", 1980,
"A", 2, "Mother", "Monogamist", "Sarah", 1981,
"A", 3, "Child", "Monogamist", "Omar", 2000,
"A", 4, "Child", "Monogamist", "Joseph", 2001,
"B", 1, "Father", "Polygamist", "Ali", 1990,
"B", 2, "Mother", "Polygamist", "Miriam", 1998,
"B", 2, "Child", "Polygamist", "Noah", 1992,
"B", 3, "Child", "Polygamist", "Jacob", 1998,
"B", 4, "Child", "Polygamist", "Layla", 2014,
"C", 1, "Father", "Widower", "Ibrahim", 2020,
"C", 3, "Child", "Widower", "Zakariya", 2021,
"C", 4, "Child", "Widower", "Kahlid", 2022,
)
The problem:
The problem is that more occurance of one wife or have none causes my attempt/trial to fail.
My Attempt:
My attemp is to conditionally mutate such rows link with the below:
# Not-working
df %>%
group_by(family) %>%
mutate(mother_name = case_when(!sequence %in% c(1,2) ~ name[sequence == 2],TRUE ~ ""))
Error in `mutate()`:
! Problem while computing `mother_name = case_when(...)`.
i The error occurred in group 2: family = "B".
Caused by error in `case_when()`:
! `!sequence %in% c(1, 2) ~ name[sequence == 2]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.
# Working
df %>%
group_by(family) %>%
filter(any(state == "Monogamist")) %>%
mutate(mother_name = case_when(!sequence %in% c(1,2) ~ name[sequence == 2],TRUE ~ ""))
# A tibble: 4 x 7
# Groups: family [1]
family sequence role state name year_of_birth mother_name
<chr> <dbl> <chr> <chr> <chr> <dbl> <chr>
1 A 1 Father Monogamist Adam 1980 ""
2 A 2 Mother Monogamist Sarah 1981 ""
3 A 3 Child Monogamist Omar 2000 "Sarah"
4 A 4 Child Monogamist Joseph 2001 "Sarah"
Expected output
How to achieve the below output. It would be nice to combine mother's name based on year of birth ascendingly. Adding this condition any(state == "Monogamist")
to the case_when
is what making me stuck.
family sequence role state name year_of_birth mother_name
1 A 1 Father Monogamist Adam 1980 NA
2 A 2 Mother Monogamist Sarah 1981 Sarah
3 A 3 Child Monogamist Omar 2000 NA
4 A 4 Child Monogamist Joseph 2001 NA
5 B 1 Father Polygamist Ali 1990 NA
6 B 2 Mother Polygamist Miriam 1998 Fatima, Miriam
7 B 2 Child Polygamist Fatima 1992 Fatima, Miriam
8 B 3 Child Polygamist Jacob 1998 NA
9 B 4 Child Polygamist Layla 2014 NA
10 C 1 Father Widower Ibrahim 2020 NA
11 C 3 Child Widower Zakariya 2021 NA
12 C 4 Child Widower Kahlid 2022 NA
CodePudding user response:
The solution below:
- Extract the mother's name
- Apply it beside rows labelled with 'children'
- Deals with the occurrence of more wifes or none
- Sort the data within the groups on birth.
A solution using role
:
df |>
group_by(family) |>
arrange(year_of_birth) |>
mutate(mother_name = ifelse(role == "Child" & !is_empty(name[role == "Mother"]),
paste(name[role == "Mother"], collapse = ", "),
NA))
Or using sequence
:
df |>
group_by(family) |>
arrange(family, year_of_birth) |>
mutate(mother_name = ifelse(sequence > 2 & !is_empty(name[sequence == 2]),
paste(name[sequence == 2], collapse=", "),
NA)) |>
ungroup()
Output:
# A tibble: 12 × 7
family sequence role state name year_of_birth mother_name
<chr> <dbl> <chr> <chr> <chr> <dbl> <chr>
1 A 1 Father Monogamist Adam 1980 NA
2 A 2 Mother Monogamist Sarah 1981 NA
3 A 3 Child Monogamist Omar 2000 Sarah
4 A 4 Child Monogamist Joseph 2001 Sarah
5 B 1 Father Polygamist Ali 1990 NA
6 B 2 Mother Polygamist Fatima 1992 NA
7 B 2 Mother Polygamist Miriam 1998 NA
8 B 3 Child Polygamist Jacob 1998 Fatima, Miriam
9 B 4 Child Polygamist Layla 2014 Fatima, Miriam
10 C 1 Father Widower Ibrahim 2020 NA
11 C 3 Child Widower Zakariya 2021 NA
12 C 4 Child Widower Kahlid 2022 NA
New data according to your instructions:
df <- tribble(
~family, ~sequence, ~role, ~state, ~name,
"A", 1, "Father", "Monogamist", "Adam",
"A", 2, "Mother", "Monogamist", "Sarah",
"A", 3, "Child", "Monogamist", "Omar",
"A", 4, "Child", "Monogamist", "Joseph",
"B", 1, "Father", "Polygamist", "Ali",
"B", 2, "Mother", "Polygamist", "Miriam",
"B", 2, "Mother", "Polygamist", "Fatima",
"B", 3, "Child", "Polygamist", "Jacob",
"B", 4, "Child", "Polygamist", "Layla",
"C", 1, "Father", "Widower", "Ibrahim",
"C", 3, "Child", "Widower", "Zakariya",
"C", 4, "Child", "Widower", "Kahlid"
) |> add_column(year_of_birth = c(1980, 1981, 2000, 2001, 1990, 1998, 1992, 1998, 2014, 2020, 2021, 2022))