Modify the value of a row based on a given condition in R

Time:05-28

Say I have a dataset with two columns, Pathology (the type) and simple_pathology (the subtype):

dat <- structure(list(Pathology = c("Bladder", "Bladder", "Bladder", 
"Breast (ER- / PR- / HER2-  / EGFR - / AR - / PD-L1 -)", "Breast (ER- / PR- / HER2- / AR - / EGFR -)", 
"Breast (ER- / PR- / HER2- / BRCA- / PDL1 1% / FGFR -)", "Breast (ER- / PR- / HER2- / BRCA- / PDL1 2%)", 
"Breast (ER- / PR- / HER2- / PDL1 - / AR -)", "Breast (ER- / PR- / HER2- / PD-L1 50% (Breast and IC 5% liver))", 
"Breast (ER- / PR- / HER2-)"), simple_pathology = c("Breast ()", 
"Breast ()", "Breast ()", "Breast (ER- / PR- / HER2-)", "Breast (ER- / PR- / HER2-)", 
"Breast (ER- / PR- / HER2-)", "Breast (ER- / PR- / HER2-)", "Breast (ER- / PR- / HER2-)", 
"Breast (ER- / PR- / HER2-)", "Breast (ER- / PR- / HER2-)")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

I want the type column to stay the same, except when it starts with the substring "Breast", in which case I want to replace it with the value from the subtype column.

So for the example above, the output would be:

type                                subtype
Bladder                             Breast ()           
Bladder                             Breast ()           
Bladder                             Breast ()           
Breast (ER- / PR- / HER2-)          Breast (ER- / PR- / HER2-)          
Breast (ER- / PR- / HER2-)          Breast (ER- / PR- / HER2-)          
Breast (ER- / PR- / HER2-)          Breast (ER- / PR- / HER2-)          
Breast (ER- / PR- / HER2-)          Breast (ER- / PR- / HER2-)          
Breast (ER- / PR- / HER2-)          Breast (ER- / PR- / HER2-)          
Breast (ER- / PR- / HER2-)          Breast (ER- / PR- / HER2-)          
Breast (ER- / PR- / HER2-)          Breast (ER- / PR- / HER2-)

I've tried this, but it doesn't work.

if (subset(dat, startsWith(Pathology, "Breast"))) {
  dat$Pathology <- dat$simple_pathology
} else {
  dat$Pathology <- dat$Pathology
}

CodePudding user response:

Some simple subsetting replacement with the aid of stringr::str_starts():

require(stringr)
#> Loading required package: stringr

breast.index <- str_starts(dat[["Pathology"]], "Breast")
dat[["Pathology"]][breast.index] <- dat[["simple_pathology"]][breast.index]

dat
#> # A tibble: 10 × 2
#>    Pathology                  simple_pathology          
#>    <chr>                      <chr>                     
#>  1 Bladder                    Breast ()                 
#>  2 Bladder                    Breast ()                 
#>  3 Bladder                    Breast ()                 
#>  4 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#>  5 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#>  6 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#>  7 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#>  8 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#>  9 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)
#> 10 Breast (ER- / PR- / HER2-) Breast (ER- / PR- / HER2-)

Created on 2022-05-27 by the reprex package (v2.0.1)
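
As for why the attempt in the question fails: if () expects a single TRUE or FALSE, whereas subset() returns a data frame, so the condition cannot be evaluated row by row. Below is a minimal base R sketch of the same logical-index replacement using startsWith() instead of stringr. The toy data is an assumption (shortened values, same column names as in the question), and is_breast is just an illustrative name.

```r
# Toy data with the same column names as the question (values shortened).
dat <- data.frame(
  Pathology = c("Bladder",
                "Breast (ER- / PR- / HER2- / AR -)",
                "Breast (ER- / PR- / HER2-)"),
  simple_pathology = c("Breast ()",
                       "Breast (ER- / PR- / HER2-)",
                       "Breast (ER- / PR- / HER2-)"),
  stringsAsFactors = FALSE
)

# startsWith() is vectorised: it returns one TRUE/FALSE per row.
is_breast <- startsWith(dat$Pathology, "Breast")

# Replace only the matching rows; all other rows are left untouched.
dat$Pathology[is_breast] <- dat$simple_pathology[is_breast]
```

This avoids the extra package dependency; str_starts() is still handy if the prefix needs to be a regular expression.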

CodePudding user response:

You can also do this with mutate() and if_else():

library(tidyverse)

dat %>%
  mutate(Pathology = if_else(str_starts(Pathology, 'Breast'), simple_pathology, Pathology))
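
Note that mutate() returns a new tibble and leaves dat itself unchanged, so the result has to be assigned if it should persist. A small sketch of that, assuming toy data with the same column names as the question and loading only dplyr and stringr rather than the whole tidyverse:

```r
library(dplyr)
library(stringr)

# Toy data (shortened values, same column names as the question).
dat <- tibble(
  Pathology = c("Bladder",
                "Breast (ER- / PR- / HER2- / AR -)",
                "Breast (ER- / PR- / HER2-)"),
  simple_pathology = c("Breast ()",
                       "Breast (ER- / PR- / HER2-)",
                       "Breast (ER- / PR- / HER2-)")
)

# Assign the result back so the change is kept.
dat <- dat %>%
  mutate(Pathology = if_else(str_starts(Pathology, "Breast"),
                             simple_pathology,
                             Pathology))
```

if_else() is stricter than base ifelse() about both branches having the same type, which is fine here because both columns are character.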