Create a new value based on summing specific columns in R

Time:05-16

I have a dataframe:

dat <- data.frame(col1 = sample(0:3, 10, replace = TRUE),
                  col2 = sample(0:3, 10, replace = TRUE),
                  col3 = sample(0:3, 10, replace = TRUE),
                  col4 = sample(0:3, 10, replace = TRUE))

I want to create a new vector (outside of the dataframe) var that will state 1 if the sum of col3 and col4 is >= 4 and 0 otherwise. How can I do this? I tried using sum within an ifelse statement but it seems to produce a character output.

Any leads? Thanks!

CodePudding user response:

In a more general way, you can also go the apply route with all sorts of further logic included in the defined function should such be needed...

apply(dat, 1, FUN = function(x) as.integer(sum(x[3:4], na.rm = TRUE) >= 4))
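As a rough sketch of what "further logic" could look like, the anonymous function can be pulled out into a named helper with the columns and the cutoff as arguments. The name flag_row, its cols and threshold parameters, and the seed are assumptions made up for this illustration; they are not part of the question.

```r
# Hypothetical helper (name and parameters are illustrative only):
# returns 1L when the chosen columns of one row sum to at least `threshold`.
flag_row <- function(x, cols = c("col3", "col4"), threshold = 4) {
  as.integer(sum(x[cols], na.rm = TRUE) >= threshold)
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(col1 = sample(0:3, 10, replace = TRUE),
                  col2 = sample(0:3, 10, replace = TRUE),
                  col3 = sample(0:3, 10, replace = TRUE),
                  col4 = sample(0:3, 10, replace = TRUE))

# apply() passes each row as a named vector, so the columns can be
# selected by name instead of by position.
var <- apply(dat, 1, flag_row)
```

Selecting by name rather than by x[3:4] keeps the helper working if the column order changes. Note that apply converts the data frame to a matrix first, so this only behaves well when all columns share a type.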

CodePudding user response:

With dplyr, we can use mutate to create a new column (var) using rowSums and the condition of whether the sum of col3 and col4 is greater than or equal to 4. Here, I use the unary + to convert from logical to 0 or 1. Then, we can use pull to get the vector for var.

library(tidyverse)

var <- dat %>%
  mutate(var = +(rowSums(select(., c(col3:col4)), na.rm = TRUE) >= 4)) %>%
  pull(var)

Output

[1] 1 1 1 0 0 1 1 1 0 0

Or another option is to use sum with c_across for each row:

var <- dat %>% 
  rowwise() %>% 
  mutate(var = +(sum(c_across(col3:col4), na.rm = TRUE) >= 4)) %>% 
  pull(var)

CodePudding user response:

If there are NAs as well, then use rowSums with na.rm = TRUE:

vec1 <- as.integer(rowSums(dat[3:4], na.rm = TRUE) >= 4)
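To see what na.rm changes, here is a minimal sketch on a tiny hand-made data frame. The name dat_na and its values are invented for this illustration and are not from the question.

```r
# Hypothetical data with missing values (illustration only)
dat_na <- data.frame(col3 = c(2, NA, 3),
                     col4 = c(2, 3, NA))

# With na.rm = TRUE, an NA contributes nothing to the row sum
with_na_rm <- as.integer(rowSums(dat_na, na.rm = TRUE) >= 4)
with_na_rm     # 1 0 0

# Without it, any row containing an NA yields NA
without_na_rm <- as.integer(rowSums(dat_na) >= 4)
without_na_rm  # 1 NA NA
```

Whether a missing value should count as 0 depends on the data; if an NA means "unknown", keeping the NA in the result may be the safer choice.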