I have five dataframes (df1, df2, df3, df4, df5) including multiple columns that need to be renamed. In a separate dataframe, which functions as a map for renaming (see the table below), I have three columns one including new variable names and the other two including the variables that need to be renamed from. Old variables1 refer to df1,2 and 3 while old_variables2 refer to df4 and df5.
New variables | old variables1 | old variables2 |
---|---|---|
DPG | DPG1, DPG2, DPG3 | DPG1, DPG2 |
LMN | LMN1, LMN2, LMN3 | LMN1, LMN2 |
CLG | CLG1, CLG2, CLG3 | CLG1, CLG2 |
I need to write a code using the map data to rename the columns in my five dataframes according to the New variables column in the map data. For instance, DPG1, LMN1, and CLCG1 belong to df1 and they should be renamed to DPG, LMN, and CLG. And the same for df2 and df3. I try to use a generic code because the information might be updated in my data files. Can somebody give me some hints please? I don't know how to deal with multiple values separated by commas in each cell in old variables column.
Thanks to @starja I have come up with a solution but still I have problem with the left join part of the function because it is wrong to include by = c("original_variables" = "old_variables1","old_variables2").
cols<- c(old_variables1, old_variables2)
rename_function <- function(col_names, rename_data = rename_info) {
for (col in cols) {
rename_info_clean <- separate_rows(rename_info,all_of(col))
}
data.frame(original_variables = col_names) %>%
left_join(rename_info_clean , by = c("original_variables" = "old_variables1","old_variables2")) %>%
mutate(final_name = coalesce(new_variables, original_variables)) %>%
pull(final_name)
}
CodePudding user response:
We could also use some regex
to make a simpler function if the task is simply to remove the digits. Here using tidyverse
:
library(dplyr)
library(stringr)
test_data_1 |>
rename_with(~ str_extract(., "\\D "))
test_data_2 |>
rename_with(~ str_extract(., "\\D "))
Output:
DPG LMN test
1 1 4 a
DPG LMN other_name
1 2 5 4
Thanks to @starja for the data.
CodePudding user response:
Here is a tidyverse
solution that uses rename_with
:
library(dplyr)
rename_info <- data.frame(
new_variables = c("DPG", "LMN"),
old_variables = c("DPG1, DPG2, DPG3", "LMN1, LMN2, LMN3"),
old_variables_2 = c("DPG1, DPG2", "LMN1, LMN2")
)
test_data_1 <- data.frame(
DPG1 = 1,
LMN1 = 4,
test = "a"
)
test_data_2 <- data.frame(
DPG2 = 2,
LMN2 = 5,
other_name = 4
)
rename_function <- function(col_names, rename_data = rename_info) {
rename_info_clean_1 <- rename_info %>%
tidyr::separate_rows(
old_variables,
sep = ",[ ] "
) %>%
select(new_variables, old_variables)
rename_info_clean_2 <- rename_info %>%
tidyr::separate_rows(
old_variables_2,
sep = ",[ ] "
) %>%
select(new_variables, old_variables = old_variables_2)
rename_info_clean <- bind_rows(
rename_info_clean_1,
rename_info_clean_2
) %>%
distinct()
data.frame(original_variables = col_names) %>%
left_join(rename_info_clean, by = c("original_variables" = "old_variables")) %>%
mutate(final_name = coalesce(new_variables, original_variables)) %>%
pull(final_name)
}
test_data_1 %>%
rename_with(rename_function)
#> DPG LMN test
#> 1 1 4 a
test_data_2 %>%
rename_with(rename_function)
#> DPG LMN other_name
#> 1 2 5 4
Created on 2023-01-17 by the reprex package (v1.0.0)
Basically, rename_function
first makes a data.frame where each old/new variable name combination is in its own row with separate_rows
, an then it joins it with the available column names. When there is no match (e.g. for test
), the join leads to NA
which is replaced with the original name by coalesce
.
Edit
Now the separate_rows
step is repeated and the results for the 2 columns stacked together.