Home > Software engineering >  how to gsub "/" and "\" from mutiple cols
how to gsub "/" and "\" from mutiple cols

Time:07-26

I have a df like this:

enter image description here

I want to clean it up by two method:

  1. gsub the subject 1-4 if it started with \ or / to ""; or
  2. change all / to \, and add \ to the one that is not start with \.

Is it a way to do this using mutate(across(everything(),...) or any other way?

I would like to know how to achieve both methods if it is possible. Thanks.

The ideal output will looks like this:

enter image description here

sample data:

df<- structure(list(ID = c("Tom", "Jerry"), Subject1 = c("/Art", "/ELA"
), Subject2 = c("\\Math", "/Math"), Subject3 = c("PE", "\\Bio\\2"
), Subject4 = c(NA, "\\Music\\1")), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

We may match any character that are not a letter from the start (^) and remove it with str_remove

library(dplyr)
library(stringr)
df %>%
     mutate(across(starts_with("Subject"), 
   ~ str_remove(.x, "^[^[:alpha:]] ")))

-output

# A tibble: 2 × 5
  ID    Subject1 Subject2 Subject3 Subject4  
  <chr> <chr>    <chr>    <chr>    <chr>     
1 Tom   Art      Math     "PE"      <NA>     
2 Jerry ELA      Math     "Bio\\2" "Music\\1"

Or escape the \\

df %>%
   mutate(across(starts_with("Subject"), ~ str_remove(.x, "^(/|\\\\)")))
# A tibble: 2 × 5
  ID    Subject1 Subject2 Subject3 Subject4  
  <chr> <chr>    <chr>    <chr>    <chr>     
1 Tom   Art      Math     "PE"      <NA>     
2 Jerry ELA      Math     "Bio\\2" "Music\\1"

or can also be

df %>%
   mutate(across(starts_with("Subject"), ~ str_remove(.x, "^[/\\\\]")))
# A tibble: 2 × 5
  ID    Subject1 Subject2 Subject3 Subject4  
  <chr> <chr>    <chr>    <chr>    <chr>     
1 Tom   Art      Math     "PE"      <NA>     
2 Jerry ELA      Math     "Bio\\2" "Music\\1"

For the second case, it would be to use str_replace

df %>%
  mutate(across(starts_with("Subject"), 
  ~ str_replace(str_replace(.x, "^\\\\", "/"), "^([A-Z])", "/\\1")))

# A tibble: 2 × 5
  ID    Subject1 Subject2 Subject3  Subject4   
  <chr> <chr>    <chr>    <chr>     <chr>      
1 Tom   /Art     /Math    "/PE"      <NA>      
2 Jerry /ELA     /Math    "/Bio\\2" "/Music\\1"
  •  Tags:  
  • r
  • Related