Delete duplicate observations if a column contains particular text in an R data frame

I have an R dataframe that has duplicate rows as follows:

X1         X10
rs6908903  chr6
rs6908903  chr6_GL000251v2_alt
rs6908903  chr6_GL000252v2_alt
rs6908903  chr6_GL000252v2_alt
rs6908903  chr6_GL000252v2_alt

In this case, I want to create a new df containing only the first row (chr6) and delete the rows whose X10 value contains the string GL000.

Thanks in advance

CodePudding user response:

Try grepl:

df[!grepl('GL000', df$X10), ]
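
For reference, here is a minimal self-contained sketch of the grepl approach in base R. It rebuilds the example data from the question and filters on the X10 column; the names has_gl and df_clean are only illustrative, and fixed = TRUE is an optional choice that treats 'GL000' as literal text rather than a regular expression.

```r
# Rebuild the example data from the question (columns X1 and X10).
df <- data.frame(
  X1  = rep("rs6908903", 5),
  X10 = c("chr6", "chr6_GL000251v2_alt", "chr6_GL000252v2_alt",
          "chr6_GL000252v2_alt", "chr6_GL000252v2_alt"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each row whose X10 contains "GL000".
# fixed = TRUE matches the pattern as a literal string, not a regex.
has_gl <- grepl("GL000", df$X10, fixed = TRUE)

# Keep only the rows that do NOT match (has_gl and df_clean are illustrative names).
df_clean <- df[!has_gl, ]
print(df_clean)
```

Note that the column passed to grepl has to exist in the data frame. If it does not, df$... returns NULL and the subset comes back with zero rows instead of raising an error.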

CodePudding user response:

Just to offer an alternative, here is one using dplyr and stringr:

library(dplyr)
library(stringr)

df <- data.frame(
  X1  = rep("rs6908903", 5),
  X10 = c("chr6", "chr6_GL000251v2_alt", "chr6_GL000252v2_alt",
          "chr6_GL000252v2_alt", "chr6_GL000252v2_alt")
)

# Keep only the rows whose X10 does not contain "GL000"
df %>% filter(!str_detect(X10, 'GL000'))

Output:

         X1  X10
1 rs6908903 chr6

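Edit: in this data every unwanted value contains an underscore, so you could also filter on that instead: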

df %>% 
  dplyr::filter(!grepl('_', X10))

Output:

         X1  X10
1 rs6908903 chr6
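
The 'GL000' filter and the '_' filter give the same result on the data in the question, but they are not equivalent in general. The sketch below is only an illustration in base R: the row with "chr1_KI270706v1_random" and the ID "rs0000001" is hypothetical, not from the original data, and is added purely to show where the two filters would differ.

```r
# Two rows from the question plus one HYPOTHETICAL extra row
# ("rs0000001" / "chr1_KI270706v1_random") that is not in the original data.
df2 <- data.frame(
  X1  = c("rs6908903", "rs6908903", "rs0000001"),
  X10 = c("chr6", "chr6_GL000251v2_alt", "chr1_KI270706v1_random"),
  stringsAsFactors = FALSE
)

# Filter 1: drop rows whose X10 contains the literal text "GL000".
# The hypothetical row survives because it has no "GL000" in it.
by_text <- df2[!grepl("GL000", df2$X10, fixed = TRUE), ]

# Filter 2: drop rows whose X10 contains any underscore.
# This also removes the hypothetical row.
by_underscore <- df2[!grepl("_", df2$X10, fixed = TRUE), ]

print(by_text)
print(by_underscore)
```

Which one fits depends on whether you want to remove only the GL000 entries or every value that carries a suffix after the chromosome name.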