I have an R dataframe that has duplicate rows as follows:
X1 X10
rs6908903 chr6
rs6908903 chr6_GL000251v2_alt
rs6908903 chr6_GL000252v2_alt
rs6908903 chr6_GL000252v2_alt
rs6908903 chr6_GL000252v2_alt
In this case, I want to create a new df containing only the first row (chr6) and delete the rows whose X10 value contains the string GL000.
Thanks in advance
CodePudding user response:
Try grepl on the X10 column and keep the rows that do not match:
df[!grepl('GL000', df$X10),]
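For reference, here is a minimal self-contained sketch of the base R approach. It assumes the data frame is called df and that the column to search is X10, as in the question. The fixed = TRUE argument and the name clean are additions for illustration: fixed = TRUE makes grepl treat the pattern as literal text rather than a regular expression.

```r
# Sample data rebuilt from the question (the name `df` is an assumption)
df <- data.frame(
  X1  = rep("rs6908903", 5),
  X10 = c("chr6", "chr6_GL000251v2_alt", "chr6_GL000252v2_alt",
          "chr6_GL000252v2_alt", "chr6_GL000252v2_alt")
)

# grepl() returns TRUE for each row whose X10 contains "GL000";
# negating it keeps only the rows that do not match.
clean <- df[!grepl("GL000", df$X10, fixed = TRUE), ]
print(clean)
```

Only the chr6 row should remain in clean, and the original df is left unchanged.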
CodePudding user response:
Just to offer an alternative, here is one using dplyr and stringr:
library(dplyr)
library(stringr)
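# Sample data from the question
df <- data.frame(
  X1  = c("rs6908903", "rs6908903", "rs6908903", "rs6908903", "rs6908903"),
  X10 = c("chr6", "chr6_GL000251v2_alt", "chr6_GL000252v2_alt",
          "chr6_GL000252v2_alt", "chr6_GL000252v2_alt")
)
# Keep only the rows whose X10 does not contain 'GL000'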
df %>% filter(!str_detect(X10, 'GL000'))
Output:
X1 X10
1 rs6908903 chr6
Edit: alternatively, drop every row whose X10 contains an underscore:
df %>%
  dplyr::filter(!grepl('_', X10))
Output:
X1 X10
1 rs6908903 chr6
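The two filters give the same result on the sample data, but they are not equivalent in general. The sketch below is only an illustration: df2 and its last row (rs0000001 on a KI270706 contig) are hypothetical and not from the question. They are there to show that matching on '_' removes every alternate or unplaced contig, while matching on 'GL000' removes only the GL000 ones.

```r
library(dplyr)

# Hypothetical data: the last row is invented to show where the patterns differ
df2 <- data.frame(
  X1  = c("rs6908903", "rs6908903", "rs0000001"),
  X10 = c("chr6", "chr6_GL000251v2_alt", "chr1_KI270706v1_random")
)

# Drops only the names containing "GL000"; the KI270706 row is kept
by_name <- df2 %>% filter(!grepl("GL000", X10))

# Drops every name containing an underscore; only "chr6" is kept
by_under <- df2 %>% filter(!grepl("_", X10))

print(by_name)
print(by_under)
```

Which pattern is appropriate depends on whether rows on other non-primary contigs should be kept.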