Combine mutate case_when() for columns that start_with() to replace certain characters

Time:03-02

I have a complex data frame that looks like df1

library(tidyverse)

df <- tibble(position=c(100,200,300),
             correction=c("62M89S", 
                     "8M1D55M88S",
                     "1S25M1P36M89S"))

df1 <- df %>% 
  separate(correction, into = str_c("col", 1:5), 
           sep = "(?<=\\D)(?=\\d)", fill = "left", remove = FALSE)

df1
#> # A tibble: 3 × 7
#>   position correction    col1  col2  col3  col4  col5 
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62M   89S  
#> 2      200 8M1D55M88S    <NA>  8M    1D    55M   88S  
#> 3      300 1S25M1P36M89S 1S    25M   1P    36M   89S

Created on 2022-03-02 by the reprex package (v2.0.1)

For every column that starts_with("col"), I want to strip the trailing letter from the strings that end with S, M, or D (that is, replace the letter with "", the empty string) and replace every other string with 0.

I want my data to look like this

df1
#> # A tibble: 3 × 7
#>   position correction    col1  col2  col3  col4  col5 
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62    89   
#> 2      200 8M1D55M88S    <NA>  8     1     55    88   
#> 3      300 1S25M1P36M89S 1     25    0     36    89

Notice here, that the cell that contains P has been converted to zero.

This is my poor effort so far, for which I am ashamed:

df1 %>% 
  mutate(across(starts_with("col")), 
                ~case_when(grepl("*M") | grepl("*S") | grepl("*D")   ~ "",
                           TRUE ~ 0))

CodePudding user response:

Here is one possibility using case_when and grepl:

df1 %>% 
  mutate(
    across(starts_with("col"),~case_when(
      is.na(.) ~ NA_real_,
      grepl("[SMD]$", .) ~ parse_number(.),
      TRUE ~ 0
    )
  ))

# A tibble: 3 x 7
  position correction     col1  col2  col3  col4  col5
     <dbl> <chr>         <dbl> <dbl> <dbl> <dbl> <dbl>
1      100 62M89S           NA    NA    NA    62    89
2      200 8M1D55M88S       NA     8     1    55    88
3      300 1S25M1P36M89S     1    25     0    36    89
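
To see what each branch of the case_when() does, the same rule can be tried on a plain character vector. This is only a minimal base R sketch: strip_code is a hypothetical helper name, and sub() followed by as.numeric() stands in for parse_number() here.

```r
# Hypothetical helper: applies the rule from the question to one character vector.
strip_code <- function(x) {
  out   <- rep(NA_real_, length(x))        # NA input stays NA
  keep  <- !is.na(x) & grepl("[SMD]$", x)  # values ending in S, M or D
  other <- !is.na(x) & !keep               # anything else, e.g. "1P"
  out[keep]  <- as.numeric(sub("[SMD]$", "", x[keep]))  # drop the trailing letter
  out[other] <- 0                          # every other code becomes 0
  out
}

strip_code(c(NA, "25M", "1P", "89S"))
# returns NA 25 0 89
```

The `$` in the pattern anchors the match to the end of the string, which is why "1P" falls through to the 0 branch.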
    

CodePudding user response:

df1 %>% 
  mutate_at(vars(starts_with('col')), 
            ~ case_when(
                grepl('[SMD]$', .x) ~ sub('[SMD]', '', .x),
                grepl('P$'    , .x) ~ '0',
                TRUE                ~ .x)
  )
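
In current dplyr, mutate_at() is superseded, so the same idea can also be written with across(). A sketch, assuming dplyr 1.0.0 or later; demo is a hypothetical stand-in for two of the col* columns of df1, and the columns stay character rather than becoming numeric.

```r
library(dplyr)

# Hypothetical stand-in for two of the col* columns of df1
demo <- data.frame(col3 = c(NA, "1D", "1P"),
                   col4 = c("62M", "55M", "36M"),
                   stringsAsFactors = FALSE)

res <- demo %>%
  mutate(across(starts_with("col"),
                ~ case_when(
                  grepl("[SMD]$", .x) ~ sub("[SMD]$", "", .x),  # strip trailing S, M or D
                  grepl("P$", .x)     ~ "0",                    # codes ending in P become "0"
                  TRUE                ~ .x)))                   # NA and anything else unchanged

res
```

Because grepl() returns FALSE for NA, missing values fall through to the last branch and stay NA.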