Replacing column names with another data frame if matches-CodePudding

Hi I am looking into figuring out how to match data frames together by column, then renaming it. If there is no name that matches, then I would want to drop that column instead.

For example, I would use this main dataset, call it DF1:

Name	Reference	Good	Fair	Bad	Great	Poor
George	Hill	34	21	33	21	32
Frank	Stairs	29	28	29	30	29
Bertha	Trail	25	25	24	21	26

Then another DF, call this DF2, that allows me to replace the names of the columns of DF1

Name	Adjusted_Name
Good	good_run
Great	very_great_work
Bad	bad run
Fair	fair run decent

Essentially, the words that would be substituted would not be any pattern of any sort, and I would try to match this first column in DF2 and match to DF1, and if there is a match in DF2$Name and DF(whatever column), then I would replace that name with the same row of DF2$Adjusted_Name. If there is no match, then the value in DF1 is dropped.

So the final goal would be to achieve:

Name	Reference	good_run	fair run decent	Bad run	very_great_work
George	Hill	34	21	33	21
Frank	Stairs	29	28	29	30
Bertha	Trail	25	25	24	21

In this case, "poor" was dropped because it didnt match the column name of DF1.

How should I go about this? How would I account if there thousands of columns? Does that change anything in how i Code? I am a bit new to R, and would appreciate any tips. Thank you!

CodePudding user response：

If you are open to a tidyverse solution, you could use

library(dplyr)
library(tibble)

df %>% 
  rename_with(~deframe(df2)[.x], .cols = df2$Name) %>% 
  select(Name, Reference, any_of(df2$Adjusted_Name))

This returns

# A tibble: 3 x 6
  Name   Reference good_run very_great_work bad_run fair_run_decent
  <chr>  <chr>        <dbl>           <dbl>   <dbl>           <dbl>
1 George Hill            34              21      33              21
2 Frank  Stairs          29              30      29              28
3 Bertha Trail           25              21      24              25

Data

df <- structure(list(Name = c("George", "Frank", "Bertha"), Reference = c("Hill", 
"Stairs", "Trail"), Good = c(34, 29, 25), Fair = c(21, 28, 25
), Bad = c(33, 29, 24), Great = c(21, 30, 21), Poor = c(32, 29, 
26)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L), spec = structure(list(cols = list(Name = structure(list(), class = c("collector_character", 
"collector")), Reference = structure(list(), class = c("collector_character", 
"collector")), Good = structure(list(), class = c("collector_double", 
"collector")), Fair = structure(list(), class = c("collector_double", 
"collector")), Bad = structure(list(), class = c("collector_double", 
"collector")), Great = structure(list(), class = c("collector_double", 
"collector")), Poor = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

df2 <- structure(list(Name = c("Good", "Great", "Bad", "Fair"), Adjusted_Name = c("good_run", 
"very_great_work", "bad_run", "fair_run_decent")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec = structure(list(
    cols = list(Name = structure(list(), class = c("collector_character", 
    "collector")), Adjusted_Name = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

CodePudding user response：

Try the following - using the list of adjusted names, you can grep the list of desired words against column names and subset the data frame on it:

Data

df <- read.table(header = TRUE, text = "Name    Reference   Good    Fair    Bad Great   Poor
                 George Hill    34  21  33  21  32
                 Frank  Stairs  29  28  29  30  29
                 Bertha Trail   25  25  24  21  26")

adj_name <- c("good_run","very_great_run","bad run","fair run decent")

Index the columns based on grep from the string of desired names (note the tolower() on the column names as well)

desired_words <- paste(unlist(strsplit(adj_name, "_| ")), collapse = "|")

df[,c(1:2,grep(desired_words, tolower(names(df))))]

Output

#    Name Reference Good Fair Bad Great
#1 George      Hill   34   21  33    21
#2  Frank    Stairs   29   28  29    30
#3 Bertha     Trail   25   25  24    21