Say I have a dataset with two columns (type and subtype):
dat <- structure(list(Pathology = c("Bladder", "Bladder", "Bladder",
"Breast (ER- / PR- / HER2- / EGFR - / AR - / PD-L1 -)", "Breast (ER- / PR- / HER2- / AR - / EGFR -)",
"Breast (ER- / PR- / HER2- / BRCA- / PDL1 1% / FGFR -)", "Breast (ER- / PR- / HER2- / BRCA- / PDL1 2%)",
"Breast (ER- / PR- / HER2- / PDL1 - / AR -)", "Breast (ER- / PR- / HER2- / PD-L1 50% (Breast and IC 5% liver))",
"Breast (ER- / PR- / HER2-)"), simple_pathology = c("Breast ()",
"Breast ()", "Breast ()", "Breast (ER- / PR- / HER2-)", "Breast (ER- / PR- / HER2-)",
"Breast (ER- / PR- / HER2-)", "Breast (ER- / PR- / HER2-)", "Breast (ER- / PR- / HER2-)",
"Breast (ER- / PR- / HER2-)", "Breast (ER- / PR- / HER2-)")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
I want the type column to stay the same, except where its value starts with the substring "Breast", in which case I want to replace it with the value from the subtype column.
So for the example above, the output would be:
type subtype
Bladder Breast ()
Bladder Breast ()
Bladder Breast ()
Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
I've tried this, but it doesn't work.
if (subset(dat, startsWith(Pathology, "Breast"))) {
dat$Pathology <- dat$simple_pathology
} else {
dat$Pathology <- dat$Pathology
}
CodePudding user response:
Simple subsetting replacement does this, with the aid of stringr::str_starts():
require(stringr)
#> Loading required package: stringr
breast.index <- str_starts(dat[["Pathology"]], "Breast")
dat[["Pathology"]][breast.index] <- dat[["simple_pathology"]][breast.index]
dat
#> # A tibble: 10 × 2
#> Pathology simple_pathology
#> <chr> <chr>
#> 1 Bladder Breast ()
#> 2 Bladder Breast ()
#> 3 Bladder Breast ()
#> 4 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#> 5 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#> 6 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#> 7 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#> 8 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#> 9 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#> 10 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
Created on 2022-05-27 by the reprex package (v2.0.1)
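For what it's worth, the if () attempt in the question fails because if expects a single TRUE/FALSE, whereas subset() returns a data frame. The same replacement can be sketched in base R alone, since startsWith() (already used in the question) is vectorised. The small data frame and the name is_breast below are stand-ins for illustration, not the original data:

```r
# Shortened stand-in for the data in the question (illustrative values only)
dat <- data.frame(
  Pathology = c("Bladder",
                "Breast (ER- / PR- / HER2- / AR -)",
                "Breast (ER- / PR- / HER2-)"),
  simple_pathology = c("Breast ()",
                       "Breast (ER- / PR- / HER2-)",
                       "Breast (ER- / PR- / HER2-)"),
  stringsAsFactors = FALSE
)

# startsWith() returns one TRUE/FALSE per row
is_breast <- startsWith(dat$Pathology, "Breast")

# Overwrite only the flagged rows; all other rows are left untouched
dat$Pathology[is_breast] <- dat$simple_pathology[is_breast]
```

If Pathology can contain NA, the logical index will contain NA as well and the assignment will fail, so in that case it is probably safer to index with which(is_breast).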
CodePudding user response:
You can also do this with mutate and if_else (note that mutate returns a new data frame, so assign the result if you want to keep it):
library(tidyverse)
dat %>%
  mutate(Pathology = if_else(str_starts(Pathology, "Breast"), simple_pathology, Pathology))