I have a follow-up on this question: Sum values from rows with conditions in R
Here is my data:
ID <- c("A","B","C","D","E","F")
Q1 <- c(0,1,7,9,NA,3)
Q2 <- c(0,3,2,2,NA,3)
Q3 <- c(0,0,7,9,NA,3)
dta <- data.frame(ID,Q1,Q2,Q3)
I need to sum the values below 7 in each row. In rows that contain values of 7 or more, those values should be ignored and only the ones below 7 summed. Rows where every value is NA should stay NA. The result should look like this:
0
4
2
2
NA
9
I have tried this code based on the response from the last post:
library(dplyr)
dta2 <- dta %>%
  rowwise() %>%
  mutate(ProxySum = ifelse(all(c_across(Q1:Q3) < 7),
                           Reduce(`+`, c_across(Q1:Q3)),
                           ifelse(any(c_across(Q1:Q3) > 7), sum(.[. < 7]), NA)))
But in the rows with numbers over 7, I end up with the sum of all rows and columns. What am I missing?
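A minimal sketch of how the rowwise attempt could be repaired, assuming the goal is to keep its `rowwise()` structure. Inside `mutate()`, the `.` placeholder refers to the whole data frame piped in, not to the current row, which is why `sum(.[. < 7])` adds up values from every row. Working on `c_across(Q1:Q3)` instead keeps the sum within the current row. The name `vals` is illustrative only.

```r
library(dplyr)

# Data as in the question, with quoted IDs and NA instead of na
dta <- data.frame(
  ID = c("A", "B", "C", "D", "E", "F"),
  Q1 = c(0, 1, 7, 9, NA, 3),
  Q2 = c(0, 3, 2, 2, NA, 3),
  Q3 = c(0, 0, 7, 9, NA, 3)
)

dta2 <- dta %>%
  rowwise() %>%
  mutate(ProxySum = {
    # values of the current row only (`.` would be the whole data frame)
    vals <- c_across(Q1:Q3)
    if (all(is.na(vals))) {
      NA_real_                           # an all-NA row stays NA
    } else {
      sum(vals[vals < 7], na.rm = TRUE)  # drop values of 7 or more, then sum
    }
  }) %>%
  ungroup()

dta2$ProxySum
# values: 0, 4, 2, 2, NA, 9
```

Note that with `na.rm = TRUE` this version ignores isolated `NA`s in a row and only returns `NA` when the whole row is missing.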
Answer:
One way to do it in base R:
rowSums(dta[, 2:4] * (dta[, 2:4] < 7))
# [1] 0 4 2 2 NA 9
Adding an explanation, following @tjebo's comment:
- `dta[, 2:4] < 7` produces a logical matrix, where `TRUE` marks values below 7 and `FALSE` marks values of 7 or more. This takes a single line because the comparison is vectorized;
- Then you multiply the original values by this logical matrix. Under the hood, R coerces `logical` to `numeric`, so every `FALSE` becomes `0` and every `TRUE` becomes `1`. This means each original value is multiplied by `1` if it is below 7, and by `0` otherwise;
- Since `NA < 7` produces `NA`, and multiplying by `NA` produces `NA` as well, the original `NA`s are preserved;
- The last step is to call `rowSums()` on the result, which sums the values in each row. Since the values of 7 or more have been turned into `0`s, they are excluded from the sum;
- If you want a sum for every row where at least one value is not `NA`, add the `na.rm = TRUE` argument to the `rowSums()` call. However, rows containing only `NA`s will then return `0`.
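To see the intermediate objects described in the list, the steps can be run one at a time. This is a sketch using the question's data (IDs quoted, `na` written as `NA`); the names `vals`, `keep`, and `masked` are illustrative, not part of the answer.

```r
# Data as in the question
dta <- data.frame(
  ID = c("A", "B", "C", "D", "E", "F"),
  Q1 = c(0, 1, 7, 9, NA, 3),
  Q2 = c(0, 3, 2, 2, NA, 3),
  Q3 = c(0, 0, 7, 9, NA, 3)
)

vals <- dta[, 2:4]

# Step 1: comparing a data frame gives a logical matrix; NA < 7 stays NA
keep <- vals < 7

# Step 2: TRUE/FALSE are coerced to 1/0, so values of 7 or more become 0
masked <- vals * keep

# Step 3: sum each row; the all-NA row E stays NA (values 0, 4, 2, 2, NA, 9)
rowSums(masked)

# Variant: na.rm = TRUE drops the NAs, so row E becomes 0 (values 0, 4, 2, 2, 0, 9)
rowSums(masked, na.rm = TRUE)
```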
Answer:
Another option, making use of `rowSums()` and `dplyr::across()`:
ID <- LETTERS[1:6]
Q1 <- c(0,1,7,9,NA,3)
Q2 <- c(0,3,2,2,NA,3)
Q3 <- c(0,0,7,9,NA,3)
dta <- data.frame(ID,Q1,Q2,Q3)
library(dplyr)
dta %>%
mutate(ProxySum = rowSums(across(Q1:Q3, function(.x) { .x[.x >= 7] <- 0; .x })))
#> ID Q1 Q2 Q3 ProxySum
#> 1 A 0 0 0 0
#> 2 B 1 3 0 4
#> 3 C 7 2 7 2
#> 4 D 9 2 9 2
#> 5 E NA NA NA NA
#> 6 F 3 3 3 9
Answer:
How about a slightly different approach - first pivot longer, then sum by condition by group, then pivot back.
In this version, rows that contain only some NAs will return a value other than NA (the NAs are effectively treated as 0). If you want those rows to return NA, change `all()` to `any()`.
library(tidyverse)
ID <- c("A","B","C","D","E","F")
Q1 <- c(0,1,7,9,NA,3)
Q2 <- c(0,3,2,2,NA,3)
Q3 <- c(0,0,7,9,NA,3)
dta <- data.frame(ID,Q1,Q2,Q3)
dta %>%
pivot_longer(-ID) %>%
group_by(ID) %>%
mutate(ProxySum = ifelse(all(is.na(value)), NA, sum(value[which(value<7)]))) %>%
pivot_wider()
#> # A tibble: 6 × 5
#> # Groups: ID [6]
#> ID ProxySum Q1 Q2 Q3
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 A 0 0 0 0
#> 2 B 4 1 3 0
#> 3 C 2 7 2 7
#> 4 D 2 9 2 9
#> 5 E NA NA NA NA
#> 6 F 9 3 3 3
Created on 2021-12-14 by the reprex package (v2.0.1)
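The effect of swapping `all()` for `any()` only shows up on a row with some, but not all, values missing. The sketch below wraps the same pipeline in a small helper and runs it on hypothetical data with one partially missing row (`G`); the helper name `proxy_sum` and the data are assumptions for illustration, not part of the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: row G has a single NA (not in the original question)
dta_partial <- data.frame(
  ID = c("B", "C", "G"),
  Q1 = c(1, 7, 2),
  Q2 = c(3, 2, NA),
  Q3 = c(0, 7, 4)
)

# Hypothetical helper: na_rule is either all or any
proxy_sum <- function(df, na_rule) {
  df %>%
    pivot_longer(-ID) %>%
    group_by(ID) %>%
    mutate(ProxySum = ifelse(na_rule(is.na(value)), NA,
                             sum(value[which(value < 7)]))) %>%
    pivot_wider() %>%
    ungroup()
}

with_all <- proxy_sum(dta_partial, all)  # G: the NA is skipped, 2 + 4 = 6
with_any <- proxy_sum(dta_partial, any)  # G: NA, because one value is missing
```

Rows without any missing values (`B`, `C`) get the same result under either rule.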
Answer:
Update: see @tjebo's comment that my first answer is identical to stefan's solution.
Here is a non-identical solution using hablar:
library(dplyr)
library(hablar)
dta %>%
rowwise() %>%
mutate(sum = sum_(across(Q1:Q3, ~case_when(.<7 ~sum_(.)))))
First answer (possibly identical to stefan's answer):
Here is another dplyr solution:
library(dplyr)
dta %>%
mutate(across(where(is.numeric), ~ifelse(.>=7,0,.)),
sum = rowSums(across(where(is.numeric))))
ID Q1 Q2 Q3 sum
1 A 0 0 0 0
2 B 1 3 0 4
3 C 0 2 0 2
4 D 0 2 0 2
5 E NA NA NA NA
6 F 3 3 3 9