Short version: I need to get a results column r like this, ideally using dplyr (but happy with base R as well):
d <- tibble(c1 = c(T,T,F,T,F,NA), c2 = c(T,F,F,F,F,NA), c3 = c(T,F,F,NA,NA,NA))
d %>% rowwise() %>% mutate(r = something())
# A tibble: 6 x 4
c1 c2 c3 r
<lgl> <lgl> <lgl> <lgl>
1 TRUE TRUE TRUE TRUE
2 TRUE FALSE FALSE TRUE
3 FALSE FALSE FALSE FALSE
4 TRUE FALSE NA TRUE
5 FALSE FALSE NA FALSE
6 NA NA NA NA
I understand why NA | FALSE evaluates to NA. Each TRUE/FALSE in this table is the result of a comparison, and I would really like to keep the syntax as short as possible.
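For reference, the three-valued logic behind this can be sketched in base R as follows. The helper name any_or_na is hypothetical (it is not part of the original post) and only illustrates the rule "NA only when every input is NA"; the data frame is a plain data.frame copy of the table above.

```r
# R's logical OR is three-valued: NA means "unknown".
na_or_true  <- NA | TRUE    # TRUE: the answer is TRUE whatever the unknown is
na_or_false <- NA | FALSE   # NA: the answer depends on the unknown

# Hypothetical helper: NA only if every input is NA,
# otherwise any() with the NAs ignored.
any_or_na <- function(x) {
  if (all(is.na(x))) NA else any(x, na.rm = TRUE)
}

d <- data.frame(
  c1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, NA),
  c2 = c(TRUE, FALSE, FALSE, FALSE, FALSE, NA),
  c3 = c(TRUE, FALSE, FALSE, NA, NA, NA)
)

# Apply the helper to each row; unname() drops any row names apply() may attach.
r <- unname(apply(as.matrix(d), 1, any_or_na))
d$r <- r
```

Plain any(x, na.rm = TRUE) is not enough on its own, because it returns FALSE rather than NA for a row that is entirely NA.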
Long version:
I have survey results, and need to create a summary of three questions asking for the primary, secondary and tertiary 'route to something' (there are more than 3 levels in reality). The summary should tell me, for each respondent, whether they made use of route A, route B, etc. Not all respondents filled in all questions, so there might be NAs. Some respondents didn't answer any of the questions at all, and their summary should be NA. So I have:
df <- tibble(primary = c("C", "A", "B", "D", NA),
secondary = c("B", "D", "C", NA, NA),
tertiary = c("A", "E", NA, NA, NA))
# I think I need something along these lines:
df <- df %>% rowwise() %>%
mutate(
routeA = (primary == "A") | (secondary == "A") | (tertiary == "A") ...
routeB = ....
)
# Result expected
df
# A tibble:
primary secondary tertiary routeA routeB ...
<chr> <chr> <chr> <lgl> <lgl>
C B A TRUE TRUE
A D E TRUE FALSE
B C NA FALSE TRUE
D NA NA FALSE FALSE
NA NA NA NA NA
CodePudding user response:
In base R, you can do this relatively efficiently with apply and match:
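# Return one logical per level: TRUE if the level occurs in x, FALSE otherwise.
# A row with no answers at all yields NA for every level.
f <- function(x, levels) {
  if (all(is.na(x))) {
    rep.int(NA, length(levels))
  } else {
    # match() returns 0L for levels absent from x; as.logical(0L) is FALSE
    as.logical(match(levels, x, 0L))
  }
}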
lvs <- LETTERS[1:4]
df[paste0("route", lvs)] <- t(apply(df, 1L, f, levels = lvs))
df
## # A tibble: 5 × 7
## primary secondary tertiary routeA routeB routeC routeD
## <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl>
## 1 C B A TRUE TRUE TRUE FALSE
## 2 A D E TRUE FALSE FALSE TRUE
## 3 B C NA FALSE TRUE TRUE FALSE
## 4 D NA NA FALSE FALSE FALSE TRUE
## 5 NA NA NA NA NA NA NA
I say "relatively" because rowwise operations on data frames tend not to be very efficient, requiring coercions to and from matrix or reshaping to and from long format. (This case is no exception, as apply coerces from data frame to matrix and the assignment coerces from matrix back to data frame.)
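As a rough illustration of the coercion point, here is a sketch that converts to a matrix once and then uses vectorised comparisons with rowSums instead of calling a function per row. It is a variation on the approach above, not the same code; the variable names m, answered and hit are illustrative, and df is rebuilt as a plain data.frame so the block runs on its own.

```r
df <- data.frame(
  primary   = c("C", "A", "B", "D", NA),
  secondary = c("B", "D", "C", NA, NA),
  tertiary  = c("A", "E", NA, NA, NA),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)                   # single coercion to a character matrix
answered <- rowSums(!is.na(m)) > 0   # FALSE for respondents with no answers

lvs <- LETTERS[1:4]
for (lv in lvs) {
  # TRUE where at least one answer in the row equals this level
  hit <- rowSums(m == lv, na.rm = TRUE) > 0
  # Respondents who answered nothing get NA rather than FALSE
  hit[!answered] <- NA
  df[[paste0("route", lv)]] <- unname(hit)
}
```

The loop runs once per level rather than once per row, which usually matters when there are many respondents and few levels.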
CodePudding user response:
Suboptimal:
library(dplyr)
# NA if the respondent answered nothing; otherwise TRUE if any answer equals val
my_match <- function(x, val) {
  if (all(is.na(x))) return(NA)
  any(na.omit(x) == val)
}
df %>%
  rowwise() %>%
  mutate(rA = my_match(c_across(where(is.character)), "A"),
         rB = my_match(c_across(where(is.character)), "B"))
To be improved:
- this won't scale well to larger numbers of routes
- too much repeated code (another way of saying the same thing), but I'm not quite sure how to create a function/shortcut version of this (could loop over the possible routes adding one column at a time, but I don't feel like going quite as far as necessary down the rlang/tidy-evaluation/NSE rabbit hole right now ...)
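One way the "loop over the possible routes, adding one column at a time" idea might look without any tidy evaluation is sketched below in base R. It reuses my_match from above; the names question_cols, answers and routes are assumptions introduced for illustration, and df is rebuilt as a plain data.frame so the block is self-contained.

```r
my_match <- function(x, val) {
  if (all(is.na(x))) return(NA)
  any(na.omit(x) == val)
}

df <- data.frame(
  primary   = c("C", "A", "B", "D", NA),
  secondary = c("B", "D", "C", NA, NA),
  tertiary  = c("A", "E", NA, NA, NA),
  stringsAsFactors = FALSE
)

question_cols <- c("primary", "secondary", "tertiary")
answers <- as.matrix(df[question_cols])

# Every route that appears anywhere in the answers, in sorted order
vals <- unlist(df[question_cols], use.names = FALSE)
routes <- sort(unique(vals[!is.na(vals)]))

# Add one logical column per route; column names are built as plain strings
for (r in routes) {
  df[[paste0("r", r)]] <- unname(apply(answers, 1, my_match, val = r))
}
```

Because the column names are ordinary strings assigned with [[<-, nothing here needs rlang; the trade-off is that it steps outside the dplyr pipeline.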
CodePudding user response:
As mentioned in the comments, this is straightforward when the data is reshaped to long format and then back to wide.
library(tidyr)
library(dplyr)
library(tibble)
df <- df %>%
  rowid_to_column()
df %>%
  pivot_longer(-rowid) %>%
  filter(!is.na(value)) %>%
  pivot_wider(id_cols = rowid, names_from = value, values_fill = FALSE,
              values_fn = ~ TRUE, names_sort = TRUE) %>%
  left_join(df, ., by = "rowid")
# A tibble: 5 x 9
rowid primary secondary tertiary A B C D E
<int> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl>
1 1 C B A TRUE TRUE TRUE FALSE FALSE
2 2 A D E TRUE FALSE FALSE TRUE TRUE
3 3 B C NA FALSE TRUE TRUE FALSE FALSE
4 4 D NA NA FALSE FALSE FALSE TRUE FALSE
5 5 NA NA NA NA NA NA NA NA