Home > Software engineering >  Sum Columns on Conditional Statement in r
Sum Columns on Conditional Statement in r

Time:02-03

I want to sum the 3 previous columns of var1_stat and var2_stat individually based on the following conditions: a1 is a match, event_id is desc order, and the entry != 0.

example

I'm uncertain how to write this code conditionally, so my hack was to just lag the variables individually and sum them up. Since I can't skip zero's and still maintain my desired info, I have added a count of the instances where the var's don't equal 0. I'll later divide the sum by this number.

Moving forward: is there a more elegant way to write this code so I don't have to list lags for each variable? Also, is there a way to find the previous 3 column instances where the var is not equal to zero.

library(janitor)
library(dplyr)

a1 <- c('adam', 'adam', 'adam', 'adam', 'megan', 'megan', 'megan','jen', 'jen', 'jen', 'jen')
event_id <- as.numeric(c('1', '2', '3', '6', '2', '4', '5', '1', '2', '4', '6'))
var1_stat <- as.numeric(c('3.2', '2.1', '2.7', '0','2','3.3', '1.1', '0', '4.1', '2.2','3.1'))
var2_stat <- as.numeric(c('3.4', '3', '0','1.7', '1.2', '3', '0','3.3', '2.1', '3.2', '0'))
test_ex <- data.frame(a1, event_id, var1_stat, var2_stat)


test_ex3 <- test_ex %>%
  group_by(a1) %>%
  mutate(var1_lag = lag(var1_stat),
         var1_lag2 = lag(var1_stat,2),
         var1_lag3 = lag(var1_stat,3 )
  ) %>% 
  mutate(var2_lag = lag(var2_stat),
         var2_lag2 = lag(var2_stat,2),
         var2_lag3 = lag(var2_stat,3)
         ) %>%
  ungroup()

test_ex3$var1_sum3 <- rowSums(test_ex3[,5:7], na.rm = TRUE)
test_ex3$var1_count3 <- rowSums(test_ex3[,5:7] != 0, , na.rm = TRUE)
test_ex3$var2_sum3 <- rowSums(test_ex3[,8:10], na.rm = TRUE)
test_ex3$var2_count3 <- rowSums(test_ex3[,8:10] != 0, , na.rm = TRUE)


test_ex3 <- test_ex3 %>%
  select(-var1_lag,-var1_lag2, -var1_lag3, -var2_lag,-var2_lag2, -var2_lag3)

CodePudding user response:

Something like this should work

df %>%
  group_by(a1) %>%
  arrange(event_id) %>%
  mutate(newvar = sum(df[var1_stat !=0 & var2_stat !=0], na.rm = TRUE)) %>%
  ungroup

CodePudding user response:

After reviewing some more posts, I was able to write this code which is signficantly cleaner than my original. That said, still can't find something that does a rolling sum with conditional filters. ie get the sum of the last 3 instances where the var doesn't equal 0.

test_ex3 <- test_ex %>%
  group_by(a1) %>%
  mutate(var1_sum3_u=lag(cumsum(var1_stat),k=3, default=0)) %>%
  mutate(var1_count3_u=lag(cumsum(var1_stat != 0),k=3, default=0)) 
  • Related