Home > Enterprise >  Condition on multiple columns and change the values using R
Condition on multiple columns and change the values using R

Time:07-27

I simplified the dataset to demonstrate what I want to do. I'm not used to dealing with multiple columns. Here I made a simple data

data<-data.frame(id=c(1,1,1,2,2,2,2),
                 title_1=c(65,58,47,NA,25,27,43),
                 title_2=c(NA,NA,32,35,12,NA,1))

In my actual dataset there are so many columns, but for now I just named as above. My goal is to change the values of title_1 and title_2 by the following rule. If there is a number , change it to 1. If there is an NA value, change it to 0. But in my actual dataset, there are hundreds of columns named as title_1, title_2, ... , title_100, ... So, I cannot type all the column names. So for my simple data, I want to use the code that doesn't type the column names explicitly. My expected output is

data<-data.frame(id=c(1,1,1,2,2,2,2),
                       title_1=c(1,1,1,0,1,1,1),
                       title_2=c(0,0,1,1,1,0,1))

CodePudding user response:

With dplyr we can use tidyselect syntax inside across() to select all variables starting with "title_" and then apply a function on all selected columns inside across():

data<-data.frame(id=c(1,1,1,2,2,2,2),
                 title_1=c(65,58,47,NA,25,27,43),
                 title_2=c(NA,NA,32,35,12,NA,1))

library(dplyr)

data %>% 
  mutate(across(starts_with("title_"), ~ ifelse(is.na(.x), 0, 1)))
#>   id title_1 title_2
#> 1  1       1       0
#> 2  1       1       0
#> 3  1       1       1
#> 4  2       0       1
#> 5  2       1       1
#> 6  2       1       0
#> 7  2       1       1

In base R we would use grepl to select the column names, then assign those columns new values with lapply:

data<-data.frame(id=c(1,1,1,2,2,2,2),
                 title_1=c(65,58,47,NA,25,27,43),
                 title_2=c(NA,NA,32,35,12,NA,1))

mycols <- grepl("^title_", names(data))

data[mycols] <- lapply(data[mycols], \(x) ifelse(is.na(x), 0, 1))
data

#>   id title_1 title_2
#> 1  1       1       0
#> 2  1       1       0
#> 3  1       1       1
#> 4  2       0       1
#> 5  2       1       1
#> 6  2       1       0
#> 7  2       1       1

Finally, we would select the columns with data.table similary, but here we'd prefer the actual names with grep(value = TRUE):

mycols <- grep("^title_", names(data), value = TRUE)

library(data.table)

data_tb <- as.data.table(data)

data_tb[,
        get("mycols") := lapply(.SD, \(x) ifelse(is.na(x), 0, 1)),
       .SDcols = mycols]

data_tb
#>    id title_1 title_2
#> 1:  1       1       0
#> 2:  1       1       0
#> 3:  1       1       1
#> 4:  2       0       1
#> 5:  2       1       1
#> 6:  2       1       0
#> 7:  2       1       1

Created on 2022-07-26 by the reprex package (v2.0.1)

  • Related