I'm trying to replace binary information in dataframe columns with strings that refer to the columns' names.
My data looks like this (just with more natXY columns and some additional variables):
df <- data.frame(id = c(1:5), natAB = c(1,0,0,0,1), natCD = c(0,1,0,0,0), natother = c(0,0,1,1,0), var1 = runif(5, 1, 10))
df
All column names in question start with "nat", mostly followed by two letters although some contain a different number of characters.
For a single column, the following code achieves the desired outcome:
df %>% mutate(natAB = ifelse(natAB == 1, "AB", NA)) -> df
Now I need to generalise this line so that it applies to the other columns, using the mutate() and across() functions.
I imagine something like this:
df %>% mutate(across(natAB:natother, ~ ifelse(. == 1, paste(substr(colnames(.), start = 4, stop = nchar(colnames(.)))), NA))) -> df
... but end up with all my "nat" columns filled with NA. How do I reference the column name correctly in this code structure?
Any help is much appreciated.
Answer:
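Inside across(), the dot (or .x) is the column's values as a bare vector, so colnames(.) returns NULL and the column name cannot be recovered that way. You can use cur_column() to get the name of the column currently being processed in an across() call, and then strip the "nat" prefix with str_remove():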
library(stringr)
library(dplyr)
df %>%
  mutate(across(natAB:natother,
                ~ ifelse(.x == 1, str_remove(cur_column(), "nat"), NA)))
#   id natAB natCD natother     var1
# 1  1    AB  <NA>     <NA> 7.646891
# 2  2  <NA>    CD     <NA> 4.704543
# 3  3  <NA>  <NA>    other 7.717925
# 4  4  <NA>  <NA>    other 3.367320
# 5  5    AB  <NA>     <NA> 8.455011
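Since the question mentions many more "nat" columns, a variant that selects them by prefix rather than by position may be easier to maintain. The following is only a sketch under some assumptions: it uses starts_with("nat") in place of the natAB:natother range, swaps str_remove() for base R's sub() with an anchored pattern (so only a leading "nat" is dropped and stringr is not needed), and uses NA_character_ so the result is explicitly a character column. It assumes no unrelated column names begin with "nat"; note that starts_with() ignores case by default.

```r
library(dplyr)

# Example data from the question (var1 is random, so its values will differ)
df <- data.frame(id = 1:5,
                 natAB = c(1, 0, 0, 0, 1),
                 natCD = c(0, 1, 0, 0, 0),
                 natother = c(0, 0, 1, 1, 0),
                 var1 = runif(5, 1, 10))

# Why the original attempt fails: a single column is a bare vector,
# so it has no column names to look up
is.null(colnames(df$natAB))

# Select every column whose name starts with "nat" and replace 1s with
# the column name minus its leading "nat"; everything else becomes NA
out <- df %>%
  mutate(across(starts_with("nat"),
                ~ ifelse(.x == 1, sub("^nat", "", cur_column()), NA_character_)))

out
```

If some other column also starts with "nat" but should be left alone, a stricter selector such as matches() with a suitable regular expression, or the explicit natAB:natother range from the answer above, would be the safer choice.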