I'm trying to write a function that renames values across multiple columns of a data.table in R.
My data table is structured like this:
feature1 | feature2 | feature3 | feature4 |
---|---|---|---|
var_a | var_c | var_b | var_a |
var_b | var_a | var_a | var_c |
var_c | var_b | var_c | var_b |
I want to rename every value to a new name. A value may appear in feature1 for one row and in feature4 for another, but the old-to-new mapping is the same across the whole table, so the result should look like this:
feature1 | feature2 | feature3 | feature4 |
---|---|---|---|
new_a | new_c | new_b | new_a |
new_b | new_a | new_a | new_c |
new_c | new_b | new_c | new_b |
I'm having trouble writing my own user-defined function that does this in fewer lines of code than repeating the standard dat$feature1[dat$feature1 == 'var_a'] <- 'new_a' for every column and value.
Preferably I'd like to call something along the lines of function(dat, var_a, new_a), or something where I can pass in a list of my old and new values.
Any help would be appreciated. Thank you!
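A minimal sketch of the "list of old and new values" interface asked for above, assuming a data.frame whose text columns are character or factor. The helper name recode_values() and the lookup format (a named character vector, names = old values, elements = new values) are hypothetical choices for illustration, not something proposed in the thread. Unlike the pattern-based answers below, it only replaces whole cell values that exactly match a name in the lookup.

```r
# Hypothetical helper (not from the thread): replace whole cell values using a
# named character vector whose names are the old values and whose elements
# are the new values.
recode_values <- function(dat, lookup) {
  dat[] <- lapply(dat, function(x) {
    # Leave non-text columns (numbers, dates, ...) untouched
    if (!is.character(x) && !is.factor(x)) return(x)
    x <- as.character(x)          # factors would otherwise reject new levels
    hit <- x %in% names(lookup)   # only exact, whole-value matches are replaced
    x[hit] <- unname(lookup[x[hit]])
    x
  })
  dat
}

dat <- data.frame(
  feature1 = c("var_a", "var_b", "var_c"),
  feature2 = c("var_c", "var_a", "var_b"),
  feature3 = c("var_b", "var_a", "var_c"),
  feature4 = c("var_a", "var_c", "var_b")
)
lookup <- c(var_a = "new_a", var_b = "new_b", var_c = "new_c")
recode_values(dat, lookup)
```

Values missing from the lookup are left as they are, so the mapping can be partial.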
Answer:
In base R:
# Assigning to df[] replaces the column contents while keeping df's structure
df[] <- lapply(df, function(x) gsub("var", "new", x))
Output:
# feature1 feature2 feature3 feature4
# 1 new_a new_c new_b new_a
# 2 new_b new_a new_a new_c
# 3 new_c new_b new_c new_b
Data:
df <- read.table(text = "feature1 feature2 feature3 feature4
var_a var_c var_b var_a
var_b var_a var_a var_c
var_c var_b var_c var_b", header = TRUE)
df <- data.table::data.table(df)
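gsub() treats "var" as a regular expression and replaces every occurrence, not just the prefix. A small illustration with hypothetical values (not from the question) shows the difference from an anchored pattern that only rewrites a leading "var_":

```r
vals <- c("var_a", "avar_b", "var_var")

# Every occurrence of "var" is replaced, wherever it appears
gsub("var", "new", vals)

# Only a leading "var_" prefix is replaced
sub("^var_", "new_", vals)
```

For the question's data both forms give the same result; the anchored one is safer if "var" can also occur elsewhere in a value.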
Answer:
Here is a function that takes a data frame, the old string, and its replacement (str_replace_all() interprets the old string as a regular expression):
library(tidyverse)
replace_func <- function(df, var, new_var) {
  df %>%
    mutate(across(everything(), ~ str_replace_all(.x, var, new_var)))
}
replace_func(df, "var", "new")
# A tibble: 3 × 4
feature1 feature2 feature3 feature4
<chr> <chr> <chr> <chr>
1 new_a new_c new_b new_a
2 new_b new_a new_a new_c
3 new_c new_b new_c new_b
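The same idea extends to several old/new pairs, because stringr's str_replace_all() accepts a named vector of pattern = replacement pairs. The sketch below is a hypothetical variant (replace_many() is not from the thread); it assumes character columns and old values without regex metacharacters, and anchors each old value with ^...$ so only whole cell values are replaced.

```r
library(dplyr)
library(stringr)

# Hypothetical extension of replace_func(): take a named vector of
# old = new pairs instead of a single old/new string.
replace_many <- function(df, lookup) {
  # Anchor each old value with ^...$ so only whole cell values match
  patterns <- setNames(unname(lookup), paste0("^", names(lookup), "$"))
  df %>%
    mutate(across(everything(), ~ str_replace_all(.x, patterns)))
}

df <- data.frame(
  feature1 = c("var_a", "var_b", "var_c"),
  feature2 = c("var_c", "var_a", "var_b")
)
replace_many(df, c(var_a = "new_a", var_b = "new_b", var_c = "new_c"))
```

str_replace_all() applies the named patterns one after another, so a new value that itself matches a later pattern would be rewritten again; the anchored patterns above avoid that for this data.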
Answer:
df[, lapply(.SD, sub, pattern = "var", replacement = "new", fixed = TRUE)]
# feature1 feature2 feature3 feature4
# 1: new_a new_c new_b new_a
# 2: new_b new_a new_a new_c
# 3: new_c new_b new_c new_b
Using this sample data:
library(data.table)
df = fread(text = 'feature1 feature2 feature3 feature4
var_a var_c var_b var_a
var_b var_a var_a var_c
var_c var_b var_c var_b')
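The expression above returns a new data.table and leaves df unchanged. To update the columns in place, data.table's := operator can assign the lapply() result back to the same columns by reference; a sketch on a small hypothetical table:

```r
library(data.table)

DT <- data.table(
  feature1 = c("var_a", "var_b", "var_c"),
  feature2 = c("var_c", "var_a", "var_b")
)
cols <- names(DT)

# := overwrites the listed columns by reference, so DT itself is updated
DT[, (cols) := lapply(.SD, sub, pattern = "var", replacement = "new", fixed = TRUE),
   .SDcols = cols]
DT
```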
Answer:
Using dplyr and across():
library(dplyr)
df %>%
mutate(across(1:4, ~ sub(".*(_)", "new\\1", .x)))
feature1 feature2 feature3 feature4
1 new_a new_c new_b new_a
2 new_b new_a new_a new_c
3 new_c new_b new_c new_b
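In ".*(_)", the ".*" is greedy, so the pattern consumes everything up to the last underscore, and values without an underscore pass through unchanged. With hypothetical values containing more than one underscore, the result differs from a pattern that stops at the first underscore:

```r
x <- c("var_a", "old_prefix_b", "noscore")

# ".*" is greedy, so it consumes everything up to the LAST underscore
sub(".*(_)", "new\\1", x)

# Matching only up to the FIRST underscore keeps the rest of the value
sub("^[^_]*_", "new_", x)
```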