case_when ignores some arguments when using group by

Time:07-18

I have a longitudinal dataset with individuals of different socioeconomic statuses (SES) divided into 4 classes: high, mid, low mid, and low. For some of the analyses, I only want to show the sample size for the low mid group if both the mid and low class groups have at least 5 individuals in that month's observations. Otherwise, I want it to show as NA.
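The rule above can be stated either way round: "both Mid and Low have at least 5" fails exactly when "at least one of them has fewer than 5" (De Morgan's law), which is the form an `any()` check expresses. A minimal base-R sketch, assuming made-up counts for a single month (the values are hypothetical, not taken from the dataset below):

```r
# Hypothetical counts for the Mid and Low classes in one month
mid_low_counts <- c(Mid = 10, Low = 3)

# Keep the low mid value only if both classes have at least 5 individuals...
keep_value <- all(mid_low_counts >= 5)

# ...which is the same as NOT having any class below 5
set_to_na <- any(mid_low_counts < 5)

stopifnot(keep_value == !set_to_na)
keep_value  # FALSE: Low has only 3, so the low mid value would become NA
```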

I thought this code would work, but it doesn't. It should set the low mid group's 'adjusted_total' to NA in Jan but keep its current value (40) in Feb. It does the latter but not the former.

Here's my sample dataset and attempt at getting what I wanted using dplyr's case_when():

library(dplyr)

#Sample dataset
test_data <- tibble(month = c(rep(c("Jan"), 4), rep(c("Feb"), 4)),
                    ses = c(rep(c("High", "Mid", "Mid Low", "Low"), 2)),
                    total = c(10, 20, 4, 30, 9, 11, 40, 60),
                    total_selected = c(9, 10, 8, 3, 8, 6, 8, 6))

#Failed attempt
wrong <- test_data %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    ses == "Mid Low" & total[ses == "Mid"] < 5 | total[ses == "Low"] < 5 ~ NA_real_,
    TRUE ~ total
  ))
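One way to see why the attempt above never produces NA is to evaluate the pieces of the condition month by month. This sketch repeats the `test_data` definition so it runs on its own; the summary column names (`mid_total`, `low_total`, `mid_below_5`, `low_below_5`) are just illustrative:

```r
library(dplyr)

test_data <- tibble(month = c(rep("Jan", 4), rep("Feb", 4)),
                    ses = rep(c("High", "Mid", "Mid Low", "Low"), 2),
                    total = c(10, 20, 4, 30, 9, 11, 40, 60),
                    total_selected = c(9, 10, 8, 3, 8, 6, 8, 6))

# One row per month with the Mid and Low totals and the two "< 5" tests
checks <- test_data %>%
  group_by(month) %>%
  summarise(mid_total = total[ses == "Mid"],
            low_total = total[ses == "Low"],
            mid_below_5 = mid_total < 5,
            low_below_5 = low_total < 5)

checks
```

Neither test is TRUE in either month, so with `total` and a cut-off of 5 the condition is FALSE on every row and `case_when()` falls through to `TRUE ~ total`.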

EDIT WITH SOLUTION

I realized that I had two mistakes in my code. First, I meant an OR statement, not an AND. Second, the threshold was too low for my data. When I switch to an OR statement and raise the cut-off to 15, it works:


correct <- tibble(month = c(rep(c("Jan"), 4), rep(c("Feb"), 4)),
                  ses = c(rep(c("High", "Mid", "Mid Low", "Low"), 2)),
                  total = c(10, 20, 4, 30, 9, 11, 40, 60),
                  total_selected = c(9, 10, 8, 3, 8, 6, 8, 6)) %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    # Parentheses keep the OR inside the "Mid Low" check; without them `&` binds
    # first, and a Low total below 15 alone would set the whole month to NA
    ses == "Mid Low" & (total[ses == "Mid"] < 15 | total[ses == "Low"] < 15) ~ NA_real_,
    TRUE ~ total
  ))
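Because `&` binds more tightly than `|` in R, `A & B | C` is parsed as `(A & B) | C`. Without parentheses around the Mid/Low comparison, a condition like this one gives the expected result on this data only because neither month has a Low total below 15. The sketch below uses a made-up month "Mar" (hypothetical, not in the original data) in which only the Low total is below 15, and compares the two forms:

```r
library(dplyr)

# Hypothetical extra month: only the Low class falls below the cut-off of 15
mar <- tibble(month = "Mar",
              ses = c("High", "Mid", "Mid Low", "Low"),
              total = c(30, 40, 25, 12))

# Read by R as (ses == "Mid Low" & Mid < 15) | (Low < 15)
unparenthesised <- mar %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    ses == "Mid Low" & total[ses == "Mid"] < 15 | total[ses == "Low"] < 15 ~ NA_real_,
    TRUE ~ total
  )) %>%
  ungroup()

# The OR is applied only to the Mid/Low comparison
parenthesised <- mar %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    ses == "Mid Low" & (total[ses == "Mid"] < 15 | total[ses == "Low"] < 15) ~ NA_real_,
    TRUE ~ total
  )) %>%
  ungroup()

unparenthesised$adjusted_total  # NA NA NA NA
parenthesised$adjusted_total    # 30 40 NA 12
```

In the unparenthesised version the Low test alone is TRUE, so every row of Mar, including High and Mid, becomes NA.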

Answer:

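Inside mutate(), case_when/ifelse/if_else all need each logical expression to be either of length 1 or the same length as the group. A subset such as total[ses == "Mid"] only has length 1 here because every month happens to contain exactly one row per class, so it is not guaranteed to fit. A more robust approach is to wrap the comparison on the subset of 'total' in any(), which always returns a single TRUE/FALSE per group: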

test_data %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    ses == "Mid Low" & any(total[ses %in% c("Mid", "Low")] < 15) ~ NA_real_,
    TRUE ~ total
  )) %>%
  ungroup()

-output

# A tibble: 8 × 5
  month ses     total total_selected adjusted_total
  <chr> <chr>   <dbl>          <dbl>          <dbl>
1 Jan   High       10              9             10
2 Jan   Mid        20             10             20
3 Jan   Mid Low     4              8              4
4 Jan   Low        30              3             30
5 Feb   High        9              8              9
6 Feb   Mid        11              6             11
7 Feb   Mid Low    40              8             NA
8 Feb   Low        60              6             60
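To illustrate the length point without dplyr, here is a base-R sketch on a made-up month that contains two Mid rows (a hypothetical situation; it does not occur in `test_data`, where each month has exactly one row per class):

```r
# Hypothetical month with two "Mid" rows
ses   <- c("High", "Mid", "Mid", "Mid Low", "Low")
total <- c(10, 20, 12, 4, 30)

# The bare subset comparison gives one value per matching row (length 2 here)
per_row <- total[ses == "Mid"] < 15
per_row  # FALSE TRUE

# any() collapses it to a single TRUE/FALSE that case_when() can recycle
collapsed <- any(total[ses %in% c("Mid", "Low")] < 15)
collapsed  # TRUE
```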

Or with replace():

test_data %>%
  group_by(month) %>%
  mutate(adjusted_total = replace(total,
    ses == "Mid Low" & any(total[ses %in% c("Mid", "Low")] < 15),
    NA)) %>%
  ungroup()
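replace(x, list, values) is base R: it returns a copy of `x` with the positions selected by `list` set to `values`. With a logical `list`, as in the answer, only the TRUE positions change, and an all-FALSE `list` leaves `x` unchanged. A small sketch with four totals; the `is_target` and `no_target` vectors are hypothetical stand-ins for the condition:

```r
total <- c(10, 20, 4, 30)

# Condition TRUE only on the third row
is_target <- c(FALSE, FALSE, TRUE, FALSE)
replace(total, is_target, NA)  # 10 20 NA 30

# Condition FALSE everywhere: total is returned unchanged
no_target <- rep(FALSE, 4)
replace(total, no_target, NA)  # 10 20 4 30
```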