I am trying to tally across columns of a dataframe with values that exceed a corresponding limit variable. Here is a similar problem but for each point, the condition may change so rowSums
is not an option. I am interested in either modifications to the solution I have attempted below, any more efficient methods, and any simpler methods which help to solve this issue.
library(tidyverse)
set.seed(234)
ex_dat <- tibble(a = rnorm(n = 6, mean = 1),
a_lim = 0.75,
b = rnorm(n = 6, mean = 0.5),
b_lim = 0.333,
c = rnorm(n = 6, mean = 1.5),
c_lim = 1.0,
d = rnorm(n = 6, mean = 1.5),
d_lim = 1.25)
ex_dat %>%
rowwise() %>%
mutate(tally = sum(map_lgl(.x = c("a","b","c","d"),
.f = ~(noquote(.x) > noquote(paste0(.x,"_lim")))), na.rm = T))
For instance, the desired outcome would be that the 'tally' column here reads 4 in the first row, given that all 4 conditions are met. Then, it should read 2 for the second row, given that only 2 limits were exceeded.
I have tried a few variations on this approach here, using the bang-bang operator to force-evaluate the variables in the .f
argument to the map
function. So far, this attempt and silent-failure seems to be the closest and most sensible as far as I can wrap my head around. Evidently, I don't have a very firm grasp on non-standard evaluation, so my attempts with !!
and sym()
didn't take me far.
Again, if anyone sees that I am going about this problem in a roundabout or inefficient way, I would welcome redirection. Thank you.
CodePudding user response:
Here is a slightly different way to approach your problem. It relies on the "trick" that inequalities evalute to 1 if true, and 0 if false. So you can evaluate a, b, c, and d against their limits, and sum the evaluation of the 4 inequalities.
library(dplyr)
ex_dat_tally<-ex_dat %>%
mutate(tally=(a>a_lim) (b>b_lim) (c>c_lim) (d>d_lim))
CodePudding user response:
I find this base R approach intuitive, easy and it should also be fast since we are working on dataframe directly using vectorised approach.
lim_cols <- grep('lim', names(ex_dat), value = TRUE)
cols <- sub('_lim', '', lim_cols)
ex_dat$tally <- rowSums(ex_dat[cols] > ex_dat[lim_cols])
ex_dat
# a a_lim b b_lim c c_lim d d_lim tally
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1.66 0.75 0.709 0.333 1.47 1 2.03 1.25 4
#2 -1.05 0.75 -2.54 0.333 2.01 1 1.52 1.25 2
#3 -0.499 0.75 0.0131 0.333 2.49 1 1.71 1.25 2
#4 2.47 0.75 -0.588 0.333 1.80 1 2.52 1.25 3
#5 2.46 0.75 0.558 0.333 0.570 1 1.91 1.25 3
#6 1.14 0.75 1.60 0.333 1.58 1 0.795 1.25 3
Here we create two group of columns lim_cols
and cols
.
lim_cols
#[1] "a_lim" "b_lim" "c_lim" "d_lim"
cols
#[1] "a" "b" "c" "d"
Compare them pairwise and use rowSums
to calculate how many of them satisfy the condition.