how to remove ONLY a specific group of characters from both names and values of dataframe in R

Time:07-26

assuming this is my df

df <- tibble(`a*`=c("_x__", "*y", "z -"),
             b=c("_x__", "*y", "z -"))
> df
# A tibble: 3 x 2
  `a*`  b    
  <chr> <chr>
1 _x__  _x__ 
2 *y    *y   
3 z -   z -  

I want to remove *, _ and space characters from both the column names and the values, wherever they occur, to get

# A tibble: 3 x 2
  a     b    
  <chr> <chr>
1 x     x    
2 y     y    
3 z-    z-  

So I am using gsub(), but it only removes the first character in my pattern vector (the underscore). Ideally I am looking for a neat way to make both of these changes with dplyr pipes. Any hint or idea is appreciated.

df %>%
  mutate_all(funs(gsub(c("_","[*]"," "),"",.))) 


names(df) <- str_remove_all(names(df), "[*]")

Answer:

We can pass multiple characters to match inside a single character class [] in str_remove_all or gsub, but not a vector of patterns to gsub, because its pattern argument is not vectorized: only the first element is used (with a warning).
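To see what the question's gsub() call actually does, here is a small base R sketch on the question's sample values. It shows the warning and the partial result from a pattern vector, then the single character class. The helper remove_chars() is hypothetical (not part of base R or stringr); it folds over a vector of literal characters with Reduce() and fixed = TRUE, which is one way to keep a vector of characters if you really want one.

```r
x <- c("_x__", "*y", "z -")

# gsub() uses only the first element of a pattern vector and warns about it
res_vec <- withCallingHandlers(
  gsub(c("_", "[*]", " "), "", x),
  warning = function(w) {
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(res_vec)    # only "_" was removed: "x", "*y", "z -"

# A single character class matches any one of the three characters
res_class <- gsub("[*_ ]", "", x)
print(res_class)  # "x", "y", "z-"

# Hypothetical helper: remove each literal character in turn
remove_chars <- function(strings, chars) {
  Reduce(function(acc, ch) gsub(ch, "", acc, fixed = TRUE), chars, init = strings)
}
print(remove_chars(x, c("_", "*", " ")))  # "x", "y", "z-"
```

The character class is simpler for a handful of characters; the Reduce() version avoids having to think about regex escaping because fixed = TRUE treats each element literally.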

library(dplyr)
library(stringr)
df <- df %>% 
   transmute(across(everything(), ~ str_remove_all(.x, "[*_ ]"),
     .names = "{str_remove_all(.col, '[*_ ]')}"))

Output:

df
# A tibble: 3 × 2
  a     b    
  <chr> <chr>
1 x     x    
2 y     y    
3 z-    z-   
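If you would rather not depend on dplyr and stringr, a minimal base R sketch of the same two steps could look like the following. It builds the sample as a plain data.frame (check.names = FALSE keeps the non-syntactic name "a*") so it runs without the tidyverse; the same two assignments should also work on a tibble.

```r
# Sample data from the question, as a base data.frame
df <- data.frame(`a*` = c("_x__", "*y", "z -"),
                 b    = c("_x__", "*y", "z -"),
                 check.names = FALSE, stringsAsFactors = FALSE)

pattern <- "[*_ ]"  # any asterisk, underscore or space

# df[] <- keeps the data.frame structure while replacing every column
df[] <- lapply(df, gsub, pattern = pattern, replacement = "")

# Clean the column names with the same pattern
names(df) <- gsub(pattern, "", names(df))

print(df)
```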

Answer:

This does the names as well but is pretty similar to akrun's answer:

library(dplyr)

pattern = "\\*|\\ |_"
df  |>
    mutate(across(
        everything(),
        \(col) gsub(pattern, "", col)
    ))  |>
    setNames(gsub(pattern, "", names(df)))
# A tibble: 3 x 2
#   a     b        
#   <chr> <chr>
# 1 x     x
# 2 y     y
# 3 z-    z-
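The two answers differ mainly in how the regex is written: a character class versus an alternation. A quick base R check on the question's values (plus the column name "a*") suggests they remove exactly the same characters. The alternation below is the second answer's pattern without the backslash before the space, since ?regex notes that escaping non-metacharacters is implementation-dependent.

```r
x <- c("_x__", "*y", "z -", "a*")

class_pattern <- "[*_ ]"      # character class, as in the first answer
alt_pattern   <- "\\*| |_"    # alternation, as in the second answer

# Both patterns should give identical results on these strings
same <- identical(gsub(class_pattern, "", x), gsub(alt_pattern, "", x))
print(same)
print(gsub(class_pattern, "", x))  # "x", "y", "z-", "a"
```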