Mutate multiple dataframe columns where cell content depends on column name

Time:10-11

I'm trying to replace binary information in dataframe columns with strings that refer to the columns' names.

My data looks like this (just with more natXY columns and some additional variables):

    df <- data.frame(id = c(1:5), natAB = c(1,0,0,0,1), natCD = c(0,1,0,0,0), natother = c(0,0,1,1,0), var1 = runif(5, 1, 10))
    df

All column names in question start with "nat", mostly followed by two letters although some contain a different number of characters.

For a single column, the following code achieves the desired outcome:

    df %>% mutate(natAB = ifelse(natAB == 1, "AB", NA)) -> df

Now I need to generalise this line in order to apply it to the other columns using the mutate() and across() functions.

I imagined something like this:

    df %>% mutate(across(natAB:natother, ~ ifelse(
                  . == 1, paste(substr(colnames(.), start = 4, stop = nchar(colnames(.)))), NA))) -> df

... but end up with all my "nat" columns filled with NA. How do I reference the column name correctly in this code structure?

Any help is much appreciated.

Answer:

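Inside across(), `.` (or `.x`) is the vector of values in the current column, not a data frame, so colnames(.) returns NULL and every cell ends up as NA. Use cur_column() to get the name of the column currently being processed, then strip the "nat" prefix with str_remove():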

    library(stringr)
    library(dplyr)
    df %>% 
      mutate(across(natAB:natother, 
                    ~ ifelse(.x == 1, str_remove(cur_column(), "nat"), NA)))

    #   id natAB natCD natother     var1
    # 1  1    AB  <NA>     <NA> 7.646891
    # 2  2  <NA>    CD     <NA> 4.704543
    # 3  3  <NA>  <NA>    other 7.717925
    # 4  4  <NA>  <NA>    other 3.367320
    # 5  5    AB  <NA>     <NA> 8.455011
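Since the question mentions more natXY columns than the example shows, a variation that might be handy is to select them by prefix instead of by range. The following is only a sketch under a few assumptions not stated in the thread: that every column starting with "nat" should be converted, that the prefix should be stripped only at the start of the name (hence the "^nat" pattern), and that the stricter if_else() with an explicit NA_character_ is acceptable in place of ifelse(). The set.seed() call and the name `result` are added purely for illustration.

```r
library(dplyr)
library(stringr)

# Example data from the question; set.seed() is an addition so var1 is reproducible.
set.seed(1)
df <- data.frame(id = 1:5,
                 natAB = c(1, 0, 0, 0, 1),
                 natCD = c(0, 1, 0, 0, 0),
                 natother = c(0, 0, 1, 1, 0),
                 var1 = runif(5, 1, 10))

# Select every column whose name starts with "nat" (assumption: all of them
# are binary indicator columns), and replace 1 with the name minus the prefix.
result <- df %>%
  mutate(across(starts_with("nat"),
                ~ if_else(.x == 1,
                          str_remove(cur_column(), "^nat"),  # e.g. "natAB" -> "AB"
                          NA_character_)))                    # typed NA keeps the column character

result$natother
# [1] NA      NA      "other" "other" NA
```

Selecting with starts_with() means newly added nat columns are picked up without editing the range, but it would also catch any unrelated column whose name happens to begin with "nat", so the range form may be safer if such columns exist.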