I have a dataframe df and want to remove everything including and after the third '-' in the column 'case_id':
df
case_id unit
TCGA-3A-01-03-9441 27
TCGA-9C-01-04-9641 15
TCGA-1E-01-05-9471 6
This is the desired output:
df
case_id unit
TCGA-3A-01 27
TCGA-9C-01 15
TCGA-1E-01 6
CodePudding user response:
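We could use str_replace: capture the first three '-'-separated fields as a group, match the rest of the string, and replace the whole match with the captured group.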
library(stringr)
library(dplyr)
df1 %>%
  mutate(case_id = str_replace(case_id, "^(([^-]+-){2}[^-]+)-.*", "\\1"))
-output
case_id unit
1 TCGA-3A-01 27
2 TCGA-9C-01 15
3 TCGA-1E-01 6
data
df1 <- structure(list(case_id = c("TCGA-3A-01-03-9441", "TCGA-9C-01-04-9641",
"TCGA-1E-01-05-9471"), unit = c(27L, 15L, 6L)),
class = "data.frame", row.names = c(NA,
-3L))
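
If you would rather avoid extra packages, here is a minimal base R sketch of the same idea. It assumes the fields in case_id are always separated by '-'. The names `trimmed_regex` and `trimmed_split` are only illustrative. The first variant uses `sub()` with a capture group; the second splits on '-' and pastes the first three pieces back together.

```r
# Example data with the same values as in the question
df1 <- data.frame(
  case_id = c("TCGA-3A-01-03-9441", "TCGA-9C-01-04-9641", "TCGA-1E-01-05-9471"),
  unit = c(27L, 15L, 6L)
)

# Variant 1: keep the first three '-'-separated fields via a capture group.
# Strings with fewer than three '-' do not match and are returned unchanged.
trimmed_regex <- sub("^(([^-]+-){2}[^-]+)-.*", "\\1", df1$case_id)

# Variant 2: split on '-', keep at most the first three pieces, re-join.
trimmed_split <- vapply(
  strsplit(df1$case_id, "-", fixed = TRUE),
  function(parts) paste(head(parts, 3), collapse = "-"),
  character(1)
)

# Both variants should agree on this data; assign one back to the column
df1$case_id <- trimmed_regex
df1
```

The split-and-paste variant is arguably easier to read and to adapt (change 3 to keep a different number of fields), while the `sub()` variant is a single vectorised call.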