Home > Software design >  R Conditionally mutate rows for specific groups
R Conditionally mutate rows for specific groups

Time:07-09

I would like to mutate certain rows meeting a condition for specific groups that meet another condition.

The Aim:

For example, I'm trying to extract the mother's name from the below dataset and apply it beside rows labelled with 'children' for only groups that has all 'Monogamists' in the below dataset:

df <- tribble(
  ~family, ~sequence, ~role,     ~state,        ~name,     ~year_of_birth,
  "A",      1,       "Father",  "Monogamist",  "Adam",     1980,
  "A",      2,       "Mother",  "Monogamist",  "Sarah",    1981,
  "A",      3,       "Child",   "Monogamist",  "Omar",     2000,
  "A",      4,       "Child",   "Monogamist",  "Joseph",   2001,
  "B",      1,       "Father",  "Polygamist",  "Ali",      1990,
  "B",      2,       "Mother",  "Polygamist",  "Miriam",   1998,
  "B",      2,       "Child",   "Polygamist",  "Noah",     1992,
  "B",      3,       "Child",   "Polygamist",  "Jacob",    1998,
  "B",      4,       "Child",   "Polygamist",  "Layla",    2014,
  "C",      1,       "Father",  "Widower",     "Ibrahim",  2020,
  "C",      3,       "Child",   "Widower",     "Zakariya", 2021,
  "C",      4,       "Child",    "Widower",     "Kahlid",  2022,
)

The problem:

The problem is that more occurance of one wife or have none causes my attempt/trial to fail.

My Attempt:

My attemp is to conditionally mutate such rows link with the below:


# Not-working

df %>%
  group_by(family) %>% 
  mutate(mother_name = case_when(!sequence %in% c(1,2) ~ name[sequence == 2],TRUE ~ "")) 

Error in `mutate()`:
! Problem while computing `mother_name = case_when(...)`.
i The error occurred in group 2: family = "B".
Caused by error in `case_when()`:
! `!sequence %in% c(1, 2) ~ name[sequence == 2]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.

# Working

df %>%
  group_by(family) %>% 
  filter(any(state == "Monogamist")) %>% 
  mutate(mother_name = case_when(!sequence %in% c(1,2) ~ name[sequence == 2],TRUE ~ "")) 
# A tibble: 4 x 7
# Groups:   family [1]
  family sequence role   state      name   year_of_birth mother_name
  <chr>     <dbl> <chr>  <chr>      <chr>          <dbl> <chr>      
1 A             1 Father Monogamist Adam            1980 ""         
2 A             2 Mother Monogamist Sarah           1981 ""         
3 A             3 Child  Monogamist Omar            2000 "Sarah"    
4 A             4 Child  Monogamist Joseph          2001 "Sarah"      

Expected output

How to achieve the below output. It would be nice to combine mother's name based on year of birth ascendingly. Adding this condition any(state == "Monogamist") to the case_when is what making me stuck.

   family sequence   role      state     name year_of_birth    mother_name
1       A        1 Father Monogamist     Adam          1980             NA
2       A        2 Mother Monogamist    Sarah          1981          Sarah
3       A        3  Child Monogamist     Omar          2000             NA
4       A        4  Child Monogamist   Joseph          2001             NA
5       B        1 Father Polygamist      Ali          1990             NA
6       B        2 Mother Polygamist   Miriam          1998 Fatima, Miriam
7       B        2  Child Polygamist   Fatima          1992 Fatima, Miriam
8       B        3  Child Polygamist    Jacob          1998             NA
9       B        4  Child Polygamist    Layla          2014             NA
10      C        1 Father    Widower  Ibrahim          2020             NA
11      C        3  Child    Widower Zakariya          2021             NA
12      C        4  Child    Widower   Kahlid          2022             NA

CodePudding user response:

The solution below:

  • Extract the mother's name
  • Apply it beside rows labelled with 'children'
  • Deals with the occurrence of more wifes or none
  • Sort the data within the groups on birth.

A solution using role:

df |>
  group_by(family) |>
  arrange(year_of_birth) |>
  mutate(mother_name = ifelse(role == "Child" & !is_empty(name[role == "Mother"]),
                              paste(name[role == "Mother"], collapse = ", "),
                              NA))

Or using sequence:

df |>
  group_by(family) |>
  arrange(family, year_of_birth) |>
  mutate(mother_name = ifelse(sequence > 2 & !is_empty(name[sequence == 2]),
                              paste(name[sequence == 2], collapse=", "),
                              NA)) |>
  ungroup()

Output:

# A tibble: 12 × 7
   family sequence role   state      name     year_of_birth mother_name   
   <chr>     <dbl> <chr>  <chr>      <chr>            <dbl> <chr>         
 1 A             1 Father Monogamist Adam              1980 NA            
 2 A             2 Mother Monogamist Sarah             1981 NA            
 3 A             3 Child  Monogamist Omar              2000 Sarah         
 4 A             4 Child  Monogamist Joseph            2001 Sarah         
 5 B             1 Father Polygamist Ali               1990 NA            
 6 B             2 Mother Polygamist Fatima            1992 NA            
 7 B             2 Mother Polygamist Miriam            1998 NA            
 8 B             3 Child  Polygamist Jacob             1998 Fatima, Miriam
 9 B             4 Child  Polygamist Layla             2014 Fatima, Miriam
10 C             1 Father Widower    Ibrahim           2020 NA            
11 C             3 Child  Widower    Zakariya          2021 NA            
12 C             4 Child  Widower    Kahlid            2022 NA            

New data according to your instructions:

df <- tribble(
  ~family, ~sequence, ~role,     ~state,        ~name,
  "A",      1,       "Father",  "Monogamist",  "Adam",
  "A",      2,       "Mother",  "Monogamist",  "Sarah",
  "A",      3,       "Child",   "Monogamist",  "Omar",
  "A",      4,       "Child",   "Monogamist",  "Joseph",
  "B",      1,       "Father",  "Polygamist",  "Ali",
  "B",      2,       "Mother",  "Polygamist",  "Miriam",
  "B",      2,       "Mother",   "Polygamist", "Fatima",
  "B",      3,       "Child",   "Polygamist",  "Jacob",
  "B",      4,       "Child",   "Polygamist",  "Layla",
  "C",      1,       "Father",  "Widower",     "Ibrahim",
  "C",      3,       "Child",   "Widower",     "Zakariya",
  "C",      4,       "Child",    "Widower",     "Kahlid"
) |> add_column(year_of_birth = c(1980, 1981, 2000, 2001, 1990, 1998, 1992, 1998, 2014, 2020, 2021, 2022))
  • Related