I have a complex data frame that looks like df1:
library(tidyverse)
df <- tibble(position = c(100, 200, 300),
             correction = c("62M89S",
                            "8M1D55M88S",
                            "1S25M1P36M89S"))
df1 <- df %>%
  separate(correction, into = str_c("col", 1:5),
           sep = "(?<=\\D)(?=\\d)", fill = "left", remove = FALSE)
df1
#> # A tibble: 3 × 7
#>   position correction    col1  col2  col3  col4  col5
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62M   89S
#> 2      200 8M1D55M88S    <NA>  8M    1D    55M   88S
#> 3      300 1S25M1P36M89S 1S    25M   1P    36M   89S
Created on 2022-03-02 by the reprex package (v2.0.1)
For every column that starts_with("col"), I want to strip the trailing letter
from the strings that end with S, M, or D (that is, replace the letter with "",
an empty string) and replace every other string with 0.
I want my data to look like this:
df1
#> # A tibble: 3 × 7
#>   position correction    col1  col2  col3  col4  col5
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62    89
#> 2      200 8M1D55M88S    <NA>  8     1     55    88
#> 3      300 1S25M1P36M89S 1     25    0     36    89
Notice that the cell containing P has been converted to zero.
This is my poor attempt, for which I am ashamed:
df1 %>%
mutate(across(starts_with("col")),
~case_when(grepl("*M") | grepl("*S") | grepl("*D") ~ "",
TRUE ~ 0))
CodePudding user response:
Here is one possibility using case_when and grepl:
df1 %>%
  mutate(
    across(starts_with("col"), ~ case_when(
      is.na(.) ~ NA_real_,                   # keep missing cells missing
      grepl("[SMD]$", .) ~ parse_number(.),  # ends in S, M or D: keep the number
      TRUE ~ 0                               # anything else (here P): 0
    ))
  )
# A tibble: 3 x 7
  position correction     col1  col2  col3  col4  col5
     <dbl> <chr>         <dbl> <dbl> <dbl> <dbl> <dbl>
1      100 62M89S           NA    NA    NA    62    89
2      200 8M1D55M88S       NA     8     1    55    88
3      300 1S25M1P36M89S     1    25     0    36    89
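As far as I can tell, the attempt in the question fails for three reasons: across() is closed before the lambda is passed to it, grepl() is called without a vector to search, and the branches mix "" with 0, while case_when needs every right-hand side to have the same type. If you would rather keep the columns as character, a minimal sketch of the same idea could look like this. The helper name strip_op and the cut-down data frame are made up for illustration, they are not from the question.

```r
library(dplyr)

# Cut-down stand-in for df1 (assumption: only two of the "col" columns)
df_small <- tibble(
  position = c(100, 200, 300),
  col3 = c(NA, "1D", "1P"),
  col4 = c("62M", "55M", "36M")
)

# Hypothetical helper: every branch returns character, so case_when is happy
strip_op <- function(x) {
  case_when(
    is.na(x) ~ NA_character_,                   # keep missing cells missing
    grepl("[SMD]$", x) ~ sub("[SMD]$", "", x),  # drop the trailing S, M or D
    TRUE ~ "0"                                  # anything else (here P) becomes "0"
  )
}

res <- df_small %>%
  mutate(across(starts_with("col"), strip_op))
```

The lambda (or function) goes inside across(), as its second argument, and grepl() gets the column as the vector to search.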
CodePudding user response:
df1 %>%
  mutate_at(vars(starts_with('col')),
            ~ case_when(
              grepl('[SMD]$', .x) ~ sub('[SMD]$', '', .x),  # drop the trailing S, M or D
              grepl('P$', .x) ~ '0',                        # P becomes '0'
              TRUE ~ .x)                                    # NA and anything else unchanged
  )
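Note that this version leaves the columns as character. If numeric columns are wanted, the same rule can be tried out on a plain vector in base R first. This is only a sketch: the helper name strip_letter is hypothetical, and it assumes that anything not ending in S, M or D should become 0.

```r
# Hypothetical helper for illustration; not part of the answers above.
strip_letter <- function(x) {
  # Strings ending in S, M or D lose the letter; everything else becomes "0"
  out <- ifelse(grepl("[SMD]$", x), sub("[SMD]$", "", x), "0")
  out[is.na(x)] <- NA  # grepl() returns FALSE for NA, so restore the NAs
  as.numeric(out)
}

x <- c("1S", "25M", "1P", NA, "89S")
strip_letter(x)
#> [1]  1 25  0 NA 89
```

The helper can then be passed to across() in place of a lambda, e.g. mutate(across(starts_with("col"), strip_letter)).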