Home > database >  Replacing column names with another data frame if matches
Replacing column names with another data frame if matches

Time:04-25

Hi I am looking into figuring out how to match data frames together by column, then renaming it. If there is no name that matches, then I would want to drop that column instead.

For example, I would use this main dataset, call it DF1:

Name Reference Good Fair Bad Great Poor
George Hill 34 21 33 21 32
Frank Stairs 29 28 29 30 29
Bertha Trail 25 25 24 21 26

Then another DF, call this DF2, that allows me to replace the names of the columns of DF1

Name Adjusted_Name
Good good_run
Great very_great_work
Bad bad run
Fair fair run decent

Essentially, the words that would be substituted would not be any pattern of any sort, and I would try to match this first column in DF2 and match to DF1, and if there is a match in DF2$Name and DF(whatever column), then I would replace that name with the same row of DF2$Adjusted_Name. If there is no match, then the value in DF1 is dropped.

So the final goal would be to achieve:

Name Reference good_run fair run decent Bad run very_great_work
George Hill 34 21 33 21
Frank Stairs 29 28 29 30
Bertha Trail 25 25 24 21

In this case, "poor" was dropped because it didnt match the column name of DF1.

How should I go about this? How would I account if there thousands of columns? Does that change anything in how i Code? I am a bit new to R, and would appreciate any tips. Thank you!

CodePudding user response:

If you are open to a tidyverse solution, you could use

library(dplyr)
library(tibble)

df %>% 
  rename_with(~deframe(df2)[.x], .cols = df2$Name) %>% 
  select(Name, Reference, any_of(df2$Adjusted_Name))

This returns

# A tibble: 3 x 6
  Name   Reference good_run very_great_work bad_run fair_run_decent
  <chr>  <chr>        <dbl>           <dbl>   <dbl>           <dbl>
1 George Hill            34              21      33              21
2 Frank  Stairs          29              30      29              28
3 Bertha Trail           25              21      24              25

Data

df <- structure(list(Name = c("George", "Frank", "Bertha"), Reference = c("Hill", 
"Stairs", "Trail"), Good = c(34, 29, 25), Fair = c(21, 28, 25
), Bad = c(33, 29, 24), Great = c(21, 30, 21), Poor = c(32, 29, 
26)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L), spec = structure(list(cols = list(Name = structure(list(), class = c("collector_character", 
"collector")), Reference = structure(list(), class = c("collector_character", 
"collector")), Good = structure(list(), class = c("collector_double", 
"collector")), Fair = structure(list(), class = c("collector_double", 
"collector")), Bad = structure(list(), class = c("collector_double", 
"collector")), Great = structure(list(), class = c("collector_double", 
"collector")), Poor = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

df2 <- structure(list(Name = c("Good", "Great", "Bad", "Fair"), Adjusted_Name = c("good_run", 
"very_great_work", "bad_run", "fair_run_decent")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), spec = structure(list(
    cols = list(Name = structure(list(), class = c("collector_character", 
    "collector")), Adjusted_Name = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

CodePudding user response:

Try the following - using the list of adjusted names, you can grep the list of desired words against column names and subset the data frame on it:

Data

df <- read.table(header = TRUE, text = "Name    Reference   Good    Fair    Bad Great   Poor
                 George Hill    34  21  33  21  32
                 Frank  Stairs  29  28  29  30  29
                 Bertha Trail   25  25  24  21  26")

adj_name <- c("good_run","very_great_run","bad run","fair run decent")

Index the columns based on grep from the string of desired names (note the tolower() on the column names as well)

desired_words <- paste(unlist(strsplit(adj_name, "_| ")), collapse = "|")

df[,c(1:2,grep(desired_words, tolower(names(df))))]

Output

#    Name Reference Good Fair Bad Great
#1 George      Hill   34   21  33    21
#2  Frank    Stairs   29   28  29    30
#3 Bertha     Trail   25   25  24    21
  • Related