Rowsums on two vectors of paired columns but conditional on specific values

Time:01-31

I have a dataset that looks like the one below where there are three "pairs" of columns pertaining to the type (datA, datB, datC), and the total for each type (datA_total, datB_total, datC_total):

structure(list(datA = c(1L, NA, 5L, 3L, 8L, NA), datA_total = c(20L, 
30L, 40L, 15L, 10L, NA), datB = c(5L, 5L, NA, 6L, 1L, NA), datB_total = c(80L, 
10L, 10L, 5L, 4L, NA), datC = c(NA, 4L, 1L, NA, 3L, NA), datC_total = c(NA, 
10L, 15L, NA, 20L, NA)), class = "data.frame", row.names = c(NA, 
-6L))
#  datA datA_total datB datB_total datC datC_total
#1    1         20    5         80   NA         NA        
#2   NA         30    5         10    4         10         
#3    5         40   NA         10    1         15
#4    3         15    6          5   NA         NA  
#5    8         10    1          4    3         20
#6   NA         NA   NA         NA   NA         NA

I'm trying to compute a row sum that gives the total visits across the data types, where a type's total is counted only if that type's score is one of the values 1 through 5.

Here is my thought process:

  1. Select only the variables that are the data types (i.e. datA, datB, datC)
  2. For each row and EACH data type, check whether that data type meets the criterion (i.e. datA -> is it one of (1,2,3,4,5)?)
  3. If it is, then look to its paired total variable and include that value in the row sum (i.e. datA -> is it one of (1,2,3,4,5)? -> if yes, then grab the datA_total value, which is 20 in row #1).

The goal is to end up with a total column like below:

#  datA datA_total datB datB_total datC datC_total overall_total
#1    1         20    5         80   NA         NA           100
#2   NA         30    5         10    4         10            20
#3    5         40   NA         10    1         15            55 
#4    3         15    6          5   NA         NA            15
#5    8         10    1          4    3         20            24
#6   NA         NA   NA         NA   NA         NA             0

You'll notice that row #2 only contained a total of 20 even though there is 30 in datA_total. This is a result of the conditional selection in that datA for row#2 contains "NA" rather than one of the five scores (1,2,3,4,5). Hence, the datA_total of 30 was not included in the rowSums calculation.
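To make the rule concrete, here is the row #2 arithmetic as a small R sketch. The vectors `scores`, `totals` and `qualifies` are made up for illustration only and are not part of my actual code:

```r
# Row #2 of the example: the three scores and their paired totals
scores <- c(datA = NA, datB = 5, datC = 4)
totals <- c(datA_total = 30, datB_total = 10, datC_total = 10)

# %in% returns FALSE for NA, so datA does not qualify
qualifies <- scores %in% 1:5   # FALSE TRUE TRUE

# Only the totals paired with a qualifying score are added up
sum(totals[qualifies])         # 20
```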

My code below shows the vectors I created and my attempt at a conditional rowSums but I end up getting an error regarding mutate... I'm not sure how to integrate the "conditional pairing" portion of this problem:

    type_vars <- c("datA", "datB", "datC")
    type_scores <- c("1", "2", "3", "4", "5")
    type_visits <- c("datA_total", "datB_total", "datC_total")

    df <- df %>%
           mutate(overall_total = rowSums(all_of(type_visits[type_vars %in% type_scores])))

Any help/tips would be appreciated

CodePudding user response:

dplyr's across should do the job.
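As a side note on why the original attempt cannot work, here is a quick check using the three vectors from the question. The index `type_vars %in% type_scores` compares the column names with the score strings rather than the column values, so nothing is selected, and `rowSums()` has no columns to work on:

```r
type_vars   <- c("datA", "datB", "datC")
type_scores <- c("1", "2", "3", "4", "5")
type_visits <- c("datA_total", "datB_total", "datC_total")

# Compares the column *names* with the score strings, not the values in df
type_vars %in% type_scores               # FALSE FALSE FALSE

# So the subset of total columns is empty
type_visits[type_vars %in% type_scores]  # character(0)
```

The condition has to be evaluated on the values inside each column, row by row, which is what the code below does.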

library(dplyr)

# copying your tibble
data <-
  tibble(
  datA = c(1, NA, 5, 3, 8, NA),
  datA_total = c(20, 30, 40, 15, 10, NA),
  datB = c(5, 5, NA, 6, 1, NA),
  datB_total = c(80, 10, 10, 5, 4, NA),
  datC = c(NA, 4, 1, NA, 3, NA),
  datC_total = c(NA, 10, 15, NA, 20, NA)
)

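data %>%
  # For each type column, build a helper "<col>_aux" column:
  # (score is in 1:5) * paired total, looked up by name via cur_column()
  mutate(across(
    c('A', 'B', 'C') %>% paste0('dat', .),
    \(x) (x %in% 1:5) * get(cur_column() %>% paste0(., '_total')),
    .names = "{col}_aux"
  )) %>%
  rowwise() %>%
  # Sum the helper columns within each row, ignoring NA
  mutate(overall_total = sum(across(ends_with('aux')), na.rm = TRUE)) %>%
  # Keep only the original columns plus the new total
  select(any_of(c(names(data), 'overall_total')))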

# A tibble: 6 × 7
   datA datA_total  datB datB_total  datC datC_total overall_total
  <dbl>      <dbl> <dbl>      <dbl> <dbl>      <dbl>         <dbl>
1     1         20     5         80    NA         NA           100
2    NA         30     5         10     4         10            20
3     5         40    NA         10     1         15            55
4     3         15     6          5    NA         NA            15
5     8         10     1          4     3         20            24
6    NA         NA    NA         NA    NA         NA             0

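First, we create an 'aux' column for each dat column. It is dat_total if dat is within 1:5 and 0 otherwise (or NA when dat_total itself is NA). Then we sum the aux columns row by row, ignoring NA, and drop them again with select().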
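If you would rather stay with `rowSums()` as in the question title, the same idea can be sketched in base R. This is only one possible variant, assuming the total columns are always named `<type>_total`; the helper names `keep` and `totals` are made up for this example:

```r
# Sample data from the question
df <- data.frame(
  datA = c(1, NA, 5, 3, 8, NA),
  datA_total = c(20, 30, 40, 15, 10, NA),
  datB = c(5, 5, NA, 6, 1, NA),
  datB_total = c(80, 10, 10, 5, 4, NA),
  datC = c(NA, 4, 1, NA, 3, NA),
  datC_total = c(NA, 10, 15, NA, 20, NA)
)

type_vars   <- c("datA", "datB", "datC")
type_visits <- paste0(type_vars, "_total")

# Logical matrix: TRUE where the score is one of 1..5 (NA counts as FALSE)
keep <- sapply(df[type_vars], function(x) x %in% 1:5)

# Zero out every total whose paired score failed the check
totals <- as.matrix(df[type_visits])
totals[!keep] <- 0

# Row-wise sum of the remaining totals, ignoring NA
df$overall_total <- rowSums(totals, na.rm = TRUE)
df$overall_total
# 100 20 55 15 24 0
```

This relies on `type_vars` and `type_visits` being in the same order, so that column i of `keep` lines up with column i of `totals`.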
