Consider this data:
df <- data.frame(group = c(1, 2, 2, 2),
start = c(2, 7, 7, 7),
stop = c(8, 7, 8, 9),
unstop = c(10, 7, 9, 10))
I now want to setup a more or less simple case_when per group in the form of "if first row do this, if second row do that". However, I'm getting an error. I assume it's because group 1 does only have one row so the condition can't be checked:
df |>
group_by(group) |>
mutate(n_rows = n(),
split_weeks = case_when(n_rows == 1 ~ str_c(start:stop, collapse = ","),
n_rows > 1 & row_number() == 1 ~ str_c(c(start:stop, unstop:lead(stop)), collapse = ","),
TRUE ~ "fail"))
Error in `mutate()`:
! Problem while computing `split_weeks = case_when(...)`.
ℹ The error occurred in group 1: group = 1.
Caused by error in `unstop:lead(stop)`:
! NA/NaN argument
Run `rlang::last_error()` to see where the error occurred.
Any idea what's going on here?
I assume it has to do with the lead
function, because if I remove that part, I "only get warnings, but at least I get a result.
Expected output:
# A tibble: 4 × 6
# Groups: group [2]
group start stop unstop n_rows split_weeks
<dbl> <dbl> <dbl> <dbl> <int> <chr>
1 1 2 8 10 1 2,3,4,5,6,7,8
2 2 7 7 7 3 7,8
3 2 7 8 9 3 fail
4 2 7 9 10 3 fail
CodePudding user response:
Here's an alternative that yields the desired output (at least in this case). @Andy Baxter provided a good explanation about why the original fails out; even though case_when uses the result of the first case, the 2nd case throws an error so the operation fails. You could get around this by using lead(stop, default = 0)
or perhaps coalesce(lead(stop), SOMETHING)
, either of which would produce a calculable (if meaningless/unneeded) result when there's no "next" value.
df |>
group_by(group) |>
mutate(n_rows = n()) %>%
mutate(split_weeks = case_when(
n_rows == 1 ~ str_c(start:stop, collapse = ","),
n_rows > 1 & row_number() == 1 ~ str_c(unstop:(lead(stop, default = 0)), collapse = ","),
# n_rows > 1 & row_number() == 1 ~ str_c(unstop:(coalesce(lead(stop), unstop)), collapse = ","), # Alternative
TRUE ~ "fail"))
Result
# A tibble: 4 × 6
# Groups: group [2]
group start stop unstop n_rows split_weeks
<dbl> <dbl> <dbl> <dbl> <int> <chr>
1 1 2 8 10 1 2,3,4,5,6,7,8
2 2 7 7 7 3 7,8
3 2 7 8 9 3 fail
4 2 7 9 10 3 fail
CodePudding user response:
I think the errors are coming up when you ask R to find a lead
either a) beyond the end of the table rows or b) outside of the group. You could pass it a default value of 0, which is never used and supresses the error, but only gets halfway there as the function is trying to concatenate every start:stop
and unstop:lead(stop)
value across all rows per group:
library(tidyverse)
df <- data.frame(group = c(1, 2, 2, 2),
start = c(2, 7, 7, 7),
stop = c(8, 7, 8, 9),
unstop = c(10, 7, 9, 10))
df |>
group_by(group) |>
mutate(
n_rows = n(),
split_weeks = case_when(
n_rows == 1 ~ str_c(start:stop, collapse = ","),
n_rows > 1 &
row_number() == 1 ~ str_c(c(start:stop, unstop:lead(stop, default = 0)), collapse = ","),
TRUE ~ "fail"
)
)
#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in start:stop: numerical expression has 3 elements: only the first used
#> Warning in unstop:lead(stop, 0): numerical expression has 3 elements: only the
#> first used
#> Warning in unstop:lead(stop, 0): numerical expression has 3 elements: only the
#> first used
#> # A tibble: 4 × 6
#> # Groups: group [2]
#> group start stop unstop n_rows split_weeks
#> <dbl> <dbl> <dbl> <dbl> <int> <chr>
#> 1 1 2 8 10 1 2,3,4,5,6,7,8
#> 2 2 7 7 7 3 7,7
#> 3 2 7 8 9 3 fail
#> 4 2 7 9 10 3 fail
A way of tidying up would be:
- finding lead values outside of groups (with default to avoid error in last row)
- find number of rows and row numbers within groups
- ungroup again!
- do rowwise calculations so that R only focuses on values in that row
- perform concatenation
Which ends up as so, though not sure why 7,7,8
is being put in that cell (makes sense as it's concatenating 7 to 7 and 7 to 8):
df |>
mutate(lead_stop = lead(stop, default = 0)) |>
group_by(group) |>
mutate(
n_rows = n(),
rownum = row_number()) |>
ungroup() |>
rowwise() |>
mutate(
split_weeks = case_when(
rownum > 1 ~ "fail",
n_rows == 1 ~ str_c(start:stop, collapse = ","),
n_rows > 1 & rownum == 1 ~ str_c(c(start:stop, unstop:lead_stop), collapse = ","),
TRUE ~ "fail"
)
)
#> # A tibble: 4 × 8
#> # Rowwise:
#> group start stop unstop lead_stop n_rows rownum split_weeks
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr>
#> 1 1 2 8 10 7 1 1 2,3,4,5,6,7,8
#> 2 2 7 7 7 8 3 1 7,7,8
#> 3 2 7 8 9 9 3 2 fail
#> 4 2 7 9 10 0 3 3 fail
Created on 2022-05-07 by the reprex package (v2.0.1)