I have a longitudinal dataset with individuals of different socioeconomic statuses (SES) divided into 4 classes: high, mid, low mid, and low. For some of the analyses, I only want to show the sample size for the low mid group if both the mid and low groups have at least 5 individuals in that month's observations. Otherwise, I want it to show as NA.
I thought this code would work, but it doesn't. It should give NA in the low mid group's 'adjusted_total' column in Jan, but keep the current value (40) for Feb. It fails to do the former but accomplishes the latter:
Here's my sample dataset and attempt at getting what I wanted using dplyr's case_when():
library(dplyr)
# Sample dataset
test_data <- tibble(month = c(rep(c("Jan"), 4), rep(c("Feb"), 4)),
                    ses = c(rep(c("High", "Mid", "Mid Low", "Low"), 2)),
                    total = c(10, 20, 4, 30, 9, 11, 40, 60),
                    total_selected = c(9, 10, 8, 3, 8, 6, 8, 6))
# Failed attempt
wrong <- test_data %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    ses == "Mid Low" & total[ses == "Mid"] < 5 | total[ses == "Low"] < 5 ~ NA_real_,
    TRUE ~ total
  ))
EDIT WITH SOLUTION
I realized that I had a typo in my code. First, I meant an OR statement, not an AND. Second, the threshold was too low for my data. When I switch to an OR statement and raise the cutoff to 15, I get the result I wanted:
correct <- tibble(month = c(rep(c("Jan"), 4), rep(c("Feb"), 4)),
                  ses = c(rep(c("High", "Mid", "Mid Low", "Low"), 2)),
                  total = c(10, 20, 4, 30, 9, 11, 40, 60),
                  total_selected = c(9, 10, 8, 3, 8, 6, 8, 6)) %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    # Parentheses keep the OR inside the "Mid Low" condition (& binds tighter than |)
    ses == "Mid Low" & (total[ses == "Mid"] < 15 | total[ses == "Low"] < 15) ~ NA_real_,
    TRUE ~ total
  ))
CodePudding user response:
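case_when/ifelse/if_else all expect their arguments to be the same length (a length-1 value is recycled). Here, the logical expressions built from subsets of 'total' have a different length from 'ses'. A correct approach is to wrap the comparison on the subset of 'total' in any, which returns a single TRUE/FALSE per group: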
test_data %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    ses == "Mid Low" & any(total[ses %in% c("Mid", "Low")] < 15) ~ NA_real_,
    TRUE ~ total
  )) %>%
  ungroup()
-output
# A tibble: 8 × 5
  month ses     total total_selected adjusted_total
  <chr> <chr>   <dbl>          <dbl>          <dbl>
1 Jan   High       10              9             10
2 Jan   Mid        20             10             20
3 Jan   Mid Low     4              8              4
4 Jan   Low        30              3             30
5 Feb   High        9              8              9
6 Feb   Mid        11              6             11
7 Feb   Mid Low    40              8             NA
8 Feb   Low        60              6             60
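As a side note, a condition of the form a & b | c without parentheses can be fragile, because R evaluates & before |. The sketch below uses a made-up single-month group (the values are hypothetical, not from the question's data) in which the Low class is below the cutoff, and compares the unparenthesised condition with the any() version:

```r
# Hypothetical single-month group: the Low class (3) is below the cutoff of 15
ses   <- c("High", "Mid", "Mid Low", "Low")
total <- c(10, 20, 4, 3)

# `&` binds tighter than `|`, so this is read as (A & B) | C.
# C is a length-1 TRUE that gets recycled across every row.
unparenthesised <- ses == "Mid Low" & total[ses == "Mid"] < 15 | total[ses == "Low"] < 15

# any() gives one TRUE/FALSE for the group, combined only with the "Mid Low" row
with_any <- ses == "Mid Low" & any(total[ses %in% c("Mid", "Low")] < 15)

print(unparenthesised)  # every row is flagged
print(with_any)         # only the "Mid Low" row is flagged
```

With the question's data the Low class never drops below 15, so this difference does not show up there.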
Or with replace
test_data %>%
  group_by(month) %>%
  mutate(adjusted_total = replace(total,
                                  ses == "Mid Low" & any(total[ses %in% c("Mid", "Low")] < 15),
                                  NA)) %>%
  ungroup()
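If the cutoff needs to change between analyses, the same logic could be wrapped in a small function. This is only a sketch: mask_mid_low and its min_n argument are names made up for illustration, and it assumes the column names from the question (month, ses, total).

```r
library(dplyr)

# Sample dataset from the question
test_data <- tibble(month = c(rep(c("Jan"), 4), rep(c("Feb"), 4)),
                    ses = c(rep(c("High", "Mid", "Mid Low", "Low"), 2)),
                    total = c(10, 20, 4, 30, 9, 11, 40, 60),
                    total_selected = c(9, 10, 8, 3, 8, 6, 8, 6))

# Hypothetical helper: set the "Mid Low" total to NA in any month where
# the "Mid" or "Low" class has fewer than min_n individuals
mask_mid_low <- function(data, min_n = 15) {
  data %>%
    group_by(month) %>%
    mutate(adjusted_total = case_when(
      ses == "Mid Low" & any(total[ses %in% c("Mid", "Low")] < min_n) ~ NA_real_,
      TRUE ~ total
    )) %>%
    ungroup()
}

res_15 <- mask_mid_low(test_data, min_n = 15)  # masks "Mid Low" in Feb (Mid = 11)
res_5  <- mask_mid_low(test_data, min_n = 5)   # masks nothing: no Mid/Low total is below 5
```

Running it with min_n = 5 also shows why the original cutoff produced no NA at all for this sample data.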