Sum values from rows ignoring certain values in R

Time:12-15

I have a follow-up on this question: Sum values from rows with conditions in R

Here is my data:

ID <- c("A","B","C","D","E","F")
Q1 <- c(0,1,7,9,NA,3)
Q2 <- c(0,3,2,2,NA,3)
Q3 <- c(0,0,7,9,NA,3)

dta <- data.frame(ID,Q1,Q2,Q3)

For each row, I need to sum the values below 7. In rows that also contain values of 7 or more, I need to sum only the numbers below 7 and ignore the rest. Rows with all NAs should be preserved. The result should look like this:

0
4
2
2
NA
9

I have tried this code based on the response from the last post:

dta2  <- dta %>% rowwise() %>% mutate(ProxySum = ifelse(all(c_across(Q1:Q3) < 7), Reduce(`+`, c_across(Q1:Q3)), (ifelse(any(c_across(Q1:Q3) > 7), sum(.[. <  7]), NA))))

But in the rows with numbers over 7 I end up with a sum over all the rows and columns. What am I missing?

CodePudding user response:

One way to do it in base:

rowSums(dta[, 2:4] * (dta[, 2:4] < 7))

# [1]  0  4  2  2 NA  9

Adding an explanation, following @tjebo's comment:

  • dta[, 2:4] < 7 produces a logical matrix of the same shape, where each entry is TRUE if the value is less than 7 and FALSE if it is 7 or more. This works in one line because the comparison is vectorized;
  • Then you multiply the original values by this logical matrix. Under the hood, R converts logical values to numeric, so every FALSE becomes 0 and every TRUE becomes 1. In other words, you multiply each original value by 1 if it is less than 7, and by 0 otherwise;
  • Since NA < 7 produces NA, and multiplying by NA produces NA as well, the original NAs are preserved;
  • The last step is to call rowSums() on the resulting dataframe, which sums the values in each row. Since the values of 7 or more have been turned into 0s, they are excluded from the sum;
  • If you want a sum for every row where at least one value is not NA, you can pass na.rm = TRUE to rowSums(). In that case, however, rows that contain only NAs will return 0 instead of NA.
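To make the steps above concrete, here is a small sketch that splits the one-liner into its intermediate pieces. The names vals, keep, masked, strict and lenient are only illustrative and do not come from the original answer; the data is the same as in the question.

```r
# Same data as in the question
dta <- data.frame(
  ID = LETTERS[1:6],
  Q1 = c(0, 1, 7, 9, NA, 3),
  Q2 = c(0, 3, 2, 2, NA, 3),
  Q3 = c(0, 0, 7, 9, NA, 3)
)

vals   <- dta[, 2:4]
keep   <- vals < 7      # logical matrix: TRUE, FALSE, or NA
masked <- vals * keep   # TRUE acts as 1, FALSE as 0, NA stays NA

strict  <- rowSums(masked)                 # the all-NA row stays NA
lenient <- rowSums(masked, na.rm = TRUE)   # the all-NA row becomes 0

strict
lenient
```

The only difference between the two results should be row E: NA in strict, 0 in lenient.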

CodePudding user response:

Another option making use of rowSums and dplyr::across:

ID <- LETTERS[1:6]
Q1 <- c(0,1,7,9,NA,3) 
Q2 <- c(0,3,2,2,NA,3) 
Q3 <- c(0,0,7,9,NA,3) 

dta <- data.frame(ID,Q1,Q2,Q3) 

library(dplyr)

dta %>% 
  mutate(ProxySum = rowSums(across(Q1:Q3, function(.x) { .x[.x >= 7] <- 0; .x })))
#>   ID Q1 Q2 Q3 ProxySum
#> 1  A  0  0  0        0
#> 2  B  1  3  0        4
#> 3  C  7  2  7        2
#> 4  D  9  2  9        2
#> 5  E NA NA NA       NA
#> 6  F  3  3  3        9
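As a rough illustration of what the anonymous function inside across() does to a single column, the sketch below applies the same logic to a plain vector. The name zero_out is made up for this example and is not part of dplyr.

```r
# Hypothetical helper with the same body as the function passed to across()
zero_out <- function(.x) {
  .x[.x >= 7] <- 0   # values of 7 or more become 0; NA entries are left untouched
  .x
}

q1 <- c(0, 1, 7, 9, NA, 3)
zero_out(q1)
```

Because the NA entries are left as NA, rowSums() without na.rm = TRUE still returns NA for the all-NA row.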

CodePudding user response:

How about a slightly different approach - first pivot longer, then sum by condition by group, then pivot back.

In the current version, rows that contain only some NAs (but not all) will return a value other than NA, because the NAs are dropped from the sum, which is the same as treating them as 0. If you want to return NA for those rows as well, change all to any.

library(tidyverse)

ID <- c("A","B","C","D","E","F")
Q1 <- c(0,1,7,9,NA,3) 
Q2 <- c(0,3,2,2,NA,3) 
Q3 <- c(0,0,7,9,NA,3) 

dta <- data.frame(ID,Q1,Q2,Q3) 

dta %>%
  pivot_longer(-ID) %>%
  group_by(ID) %>%
  mutate(ProxySum = ifelse(all(is.na(value)), NA, sum(value[which(value<7)]))) %>%
  pivot_wider()
#> # A tibble: 6 × 5
#> # Groups:   ID [6]
#>   ID    ProxySum    Q1    Q2    Q3
#>   <chr>    <dbl> <dbl> <dbl> <dbl>
#> 1 A            0     0     0     0
#> 2 B            4     1     3     0
#> 3 C            2     7     2     7
#> 4 D            2     9     2     9
#> 5 E           NA    NA    NA    NA
#> 6 F            9     3     3     3

Created on 2021-12-14 by the reprex package (v2.0.1)
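The example data has no row with only some NAs, so the difference between all and any is not visible in the output above. Here is a minimal base R sketch on a single made-up row (the vector value and the names sum_all and sum_any are hypothetical, chosen only for this illustration):

```r
# Hypothetical row with one missing value
value <- c(1, NA, 3)

# As in the answer: NA only when every value is missing
sum_all <- ifelse(all(is.na(value)), NA, sum(value[which(value < 7)]))

# Variant: NA as soon as any value is missing
sum_any <- ifelse(any(is.na(value)), NA, sum(value[which(value < 7)]))

sum_all
sum_any
```

With all, the missing value is simply skipped by which(); with any, the whole row is reported as NA.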

CodePudding user response:

Update: as @tjebo pointed out in a comment, my first answer is identical to stefan's solution.

Here is a non-identical solution using hablar:

library(dplyr)
library(hablar)

dta %>% 
  rowwise() %>% 
  mutate(sum = sum_(across(Q1:Q3, ~case_when(.<7 ~sum_(.)))))

First answer (possibly identical to stefan's answer):

Here is another dplyr solution:

library(dplyr)
dta %>% 
  mutate(across(where(is.numeric), ~ifelse(.>=7,0,.)),
         sum = rowSums(across(where(is.numeric))))
  ID Q1 Q2 Q3 sum
1  A  0  0  0   0
2  B  1  3  0   4
3  C  0  2  0   2
4  D  0  2  0   2
5  E NA NA NA  NA
6  F  3  3  3   9