Changing the column name based on a partial string or substring

Time:10-26

I have a data frame df that I generate five times, once for each of five variables. Let's say the variable names are:

Apple  # apple_df
Mango  # mango_df
Banana # banana_df
Potato # potato_df
Tomato # tomato_df

Each time the data frame is generated, one of the column names is quite long, such as:

Apple - Growth Level Judgement    # Column name for apple_df
Mango - Growth Level Judgement    # Column name for mango_df
Banana - Growth Level Judgement   # Column name for banana_df
Potato - Growth Level Judgement   # Column name for potato_df
Tomato - Growth Level Judgement   # Column name for tomato_df

I want to change the above column names to just the word Growth across each of the files.

Is there a way to do it effectively across all data frames by using one common line of code (separately)?

I can use the complete name in each of the files separately but was wondering if we could have a generalised solution:

# For Apple data frame

# Update column name
setnames(apple_df, 
         old = c('Apple - Growth Level Judgement'), 
         new = c('Growth'))

If I use the following regex-based solution, it replaces only the part of the column name that is common across all data frames, not the whole name.

gsub(x = names(apple_df), 
     pattern = "Growth Level Judgement$", replacement = "Growth")  
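The difference comes down to how much of the name the pattern matches. A minimal sketch on a plain character vector (the `Yield` name is made up for illustration) compares the suffix-only pattern with one that also consumes everything before the suffix:

```r
# Illustrative column names; "Yield" is an assumed extra column.
nms <- c("Yield", "Apple - Growth Level Judgement")

# Suffix-only pattern: only the matched suffix is replaced,
# so the "Apple - " prefix stays in the name.
partial <- gsub("Growth Level Judgement$", "Growth", nms)

# A leading "^.*" makes the match cover the entire name,
# so the whole name is replaced. Non-matching names are untouched.
whole <- sub("^.*Growth Level Judgement$", "Growth", nms)

print(partial)
print(whole)
```

Assigning `whole` back with `names(apple_df) <- ...` would then rename the column in place.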

Related posts:

The post "Remove part of column name" is related, but it strips the known part of the string. In my case, I want to detect a column by a partial string that stays the same across multiple datasets, and once that string is found in a column name, change the whole column name. The posts "r Remove parts of column name after certain characters" and "Rename column names according to pattern matching R" may also be related but do not meet my needs.

Any advice on this would be greatly appreciated. Thanks!

CodePudding user response:

Put the data frames in a list and use lapply/map to change the column name in every data frame. Then use list2env to transfer those changes from the list back to the individual data frames.

library(dplyr)
library(purrr)

list_df <- lst(Apple, Mango, Banana, Potato, Tomato)

list_df <- map(list_df, 
             ~.x %>% rename_with(~'Growth', matches('Growth Level Judgement')))

list2env(list_df, .GlobalEnv)

To run it on a single data frame, you can do:

Apple %>% rename_with(~'Growth', matches('Growth Level Judgement'))

Or in base R:

names(Apple)[grep('Growth Level Judgement', names(Apple))] <- 'Growth'
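The same base R line can be applied to every data frame in a list, without dplyr or purrr. This is a minimal sketch, assuming two toy data frames whose `Yield` column and values are invented for illustration:

```r
# Toy data frames; the Yield column and all values are illustrative only.
Apple <- data.frame(1:2, c("Low", "High"))
names(Apple) <- c("Yield", "Apple - Growth Level Judgement")
Mango <- data.frame(3:4, c("High", "Low"))
names(Mango) <- c("Yield", "Mango - Growth Level Judgement")

# A named list, so list2env() knows which object each element maps to.
list_df <- list(Apple = Apple, Mango = Mango)

# Rename the matching column in every data frame of the list.
# fixed = TRUE treats the pattern as a literal string, not a regex.
list_df <- lapply(list_df, function(d) {
  names(d)[grep("Growth Level Judgement", names(d), fixed = TRUE)] <- "Growth"
  d
})

# Copy the renamed data frames back over the originals in the
# current environment (the global environment at top level).
invisible(list2env(list_df, envir = environment()))

print(names(Apple))
print(names(Mango))
```

The list must be named for list2env to work; a plain `list(Apple, Mango)` would not carry the object names.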

CodePudding user response:

Use endsWith from base R

names(Apple)[endsWith(names(Apple), 'Growth Level Judgement')] <- 'Growth'
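To reuse this across all five data frames, the line could be wrapped in a small function. The sketch below assumes a hypothetical helper named `rename_by_suffix` (not part of base R) and a toy data frame with an invented `Yield` column:

```r
# rename_by_suffix() is a hypothetical helper written for this example.
rename_by_suffix <- function(df, suffix, new_name) {
  hit <- endsWith(names(df), suffix)
  # Renaming several columns to one name would create duplicates, so stop.
  if (sum(hit) > 1) {
    stop("more than one column name ends with '", suffix, "'")
  }
  # If nothing matches, the data frame is returned unchanged.
  names(df)[hit] <- new_name
  df
}

# Toy data frame; the Yield column and values are illustrative only.
Banana <- data.frame(1:2, c("Low", "High"))
names(Banana) <- c("Yield", "Banana - Growth Level Judgement")

Banana <- rename_by_suffix(Banana, "Growth Level Judgement", "Growth")
print(names(Banana))
```

The same call then works unchanged for each data frame, since only the common suffix is passed in.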

According to the documentation (?endsWith), it may also be faster than a regex-based match. The documentation says:

startsWith() is equivalent to but much faster than

substring(x, 1, nchar(prefix)) == prefix
or also

grepl("^<prefix>", x)

CodePudding user response:

An alternate solution could be:

Apple %>% 
      rename_with(~'Growth', ends_with('Growth Level Judgement'))