How to delete rows based on character number of occurrences?

Time:10-12

I need to delete all rows whose var1 value contains fewer than 6 underscores.

my data

id var1                            var2
1  procedures_1_1_5___240          TRUE
2  procedures___6                  TRUE
3  procedures_1_20_1___2130        TRUE
4  procedures_1_1___2              TRUE

my desired output

id var1                            var2
1  procedures_1_1_5___240          TRUE
3  procedures_1_20_1___2130        TRUE

CodePudding user response:

In base R you could do:

subset(df1, nchar(gsub('[^_]', '', var1)) >= 6)

  id                     var1 var2
1  1   procedures_1_1_5___240 TRUE
3  3 procedures_1_20_1___2130 TRUE
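
To see why the gsub/nchar approach works, here is a small sketch of the intermediate steps. The vector x is just the question's var1 values copied by hand, and the names only_underscores and n_underscores are made up for illustration.

```r
# Values copied from the question's var1 column (illustrative only)
x <- c("procedures_1_1_5___240", "procedures___6",
       "procedures_1_20_1___2130", "procedures_1_1___2")

# Remove every character that is NOT an underscore ...
only_underscores <- gsub("[^_]", "", x)

# ... so the length of what remains is the number of underscores
n_underscores <- nchar(only_underscores)
n_underscores
# [1] 6 3 6 5

# Keep only the values with at least 6 underscores
kept <- x[n_underscores >= 6]
kept
```

The same logical vector, n_underscores >= 6, is what subset() uses above to pick rows of the data frame.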

CodePudding user response:

We can use str_count() on 'var1' to count the '_' characters, and filter() to keep the rows where that count is at least 6:

library(dplyr)
library(stringr)
df1 %>% 
   filter(str_count(var1, "_") >= 6)

-output

  id                     var1 var2
1  1   procedures_1_1_5___240 TRUE
2  3 procedures_1_20_1___2130 TRUE

data

df1 <- structure(list(id = 1:4, var1 = c("procedures_1_1_5___240", 
"procedures___6", 
"procedures_1_20_1___2130", "procedures_1_1___2"), var2 = c(TRUE, 
TRUE, TRUE, TRUE)), class = "data.frame", row.names = c(NA, -4L
))

CodePudding user response:

Here is another base R alternative, using lengths with regmatches and gregexpr:

df1[lengths(regmatches(df1$var1, gregexpr("_", df1$var1))) >= 6, ]

Or, in short form (provided by Onyambu, many thanks):

df1[lengths(gregexpr("_", df1$var1)) >= 6, ]

  id                     var1 var2
1  1   procedures_1_1_5___240 TRUE
3  3 procedures_1_20_1___2130 TRUE
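
One thing to keep in mind with the shorter lengths(gregexpr(...)) form: when a string has no match at all, gregexpr returns -1 for it, and lengths counts that as 1 rather than 0. With a threshold of 6 this makes no difference, but it would for a threshold of 1. Below is a minimal sketch of a count that handles zero matches; count_fixed is a hypothetical helper name, not something from the answers above, and it uses fixed = TRUE so the pattern is treated as a literal string instead of a regular expression.

```r
# Hypothetical helper: count literal occurrences of `pattern` in each element of x
count_fixed <- function(x, pattern) {
  # regmatches() returns character(0) for elements with no match,
  # so lengths() reports 0 for them
  lengths(regmatches(x, gregexpr(pattern, x, fixed = TRUE)))
}

y <- c("a_b_c", "abc")

safe_counts <- count_fixed(y, "_")
safe_counts
# [1] 2 0

# The short form counts the -1 "no match" marker as one element
short_counts <- lengths(gregexpr("_", y))
short_counts
# [1] 2 1

# fixed = TRUE also makes characters like "." count literally
dot_count <- count_fixed("a.b.c", ".")
```

With such a helper, the filter would read df1[count_fixed(df1$var1, "_") >= 6, ].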