I simplified the dataset to demonstrate what I want to do. I'm not used to dealing with multiple columns. Here I made a simple data
data<-data.frame(id=c(1,1,1,2,2,2,2),
title_1=c(65,58,47,NA,25,27,43),
title_2=c(NA,NA,32,35,12,NA,1))
In my actual dataset there are so many columns, but for now I just named as above. My goal is to change the values of title_1
and title_2
by the following rule. If there is a number , change it to 1. If there is an NA value, change it to 0. But in my actual dataset, there are hundreds of columns named as title_1, title_2, ... , title_100, ...
So, I cannot type all the column names. So for my simple data
, I want to use the code that doesn't type the column names explicitly. My expected output is
data<-data.frame(id=c(1,1,1,2,2,2,2),
title_1=c(1,1,1,0,1,1,1),
title_2=c(0,0,1,1,1,0,1))
CodePudding user response:
With dplyr we can use tidyselect syntax inside across()
to select all variables starting with "title_"
and then apply a function on all selected columns inside across()
:
data<-data.frame(id=c(1,1,1,2,2,2,2),
title_1=c(65,58,47,NA,25,27,43),
title_2=c(NA,NA,32,35,12,NA,1))
library(dplyr)
data %>%
mutate(across(starts_with("title_"), ~ ifelse(is.na(.x), 0, 1)))
#> id title_1 title_2
#> 1 1 1 0
#> 2 1 1 0
#> 3 1 1 1
#> 4 2 0 1
#> 5 2 1 1
#> 6 2 1 0
#> 7 2 1 1
In base R we would use grepl
to select the column names, then assign those columns new values with lapply
:
data<-data.frame(id=c(1,1,1,2,2,2,2),
title_1=c(65,58,47,NA,25,27,43),
title_2=c(NA,NA,32,35,12,NA,1))
mycols <- grepl("^title_", names(data))
data[mycols] <- lapply(data[mycols], \(x) ifelse(is.na(x), 0, 1))
data
#> id title_1 title_2
#> 1 1 1 0
#> 2 1 1 0
#> 3 1 1 1
#> 4 2 0 1
#> 5 2 1 1
#> 6 2 1 0
#> 7 2 1 1
Finally, we would select the columns with data.table
similary, but here we'd prefer the actual names with grep(value = TRUE)
:
mycols <- grep("^title_", names(data), value = TRUE)
library(data.table)
data_tb <- as.data.table(data)
data_tb[,
get("mycols") := lapply(.SD, \(x) ifelse(is.na(x), 0, 1)),
.SDcols = mycols]
data_tb
#> id title_1 title_2
#> 1: 1 1 0
#> 2: 1 1 0
#> 3: 1 1 1
#> 4: 2 0 1
#> 5: 2 1 1
#> 6: 2 1 0
#> 7: 2 1 1
Created on 2022-07-26 by the reprex package (v2.0.1)