I need to delete all the rows whose var1 value contains fewer than 6 underscores.
My data:
id var1 var2
1 procedures_1_1_5___240 TRUE
2 procedures___6 TRUE
3 procedures_1_20_1___2130 TRUE
4 procedures_1_1___2 TRUE
My desired output:
id var1 var2
1 procedures_1_1_5___240 TRUE
3 procedures_1_20_1___2130 TRUE
CodePudding user response:
In base R you could do:
subset(df1, nchar(gsub('[^_]', '', var1)) >= 6)
id var1 var2
1 1 procedures_1_1_5___240 TRUE
3 3 procedures_1_20_1___2130 TRUE
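To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The data frame is rebuilt from the question's sample data, and the intermediate names (only_underscores, underscore_count, kept) are illustrative, not part of the original answer.

```r
# Sample data copied from the question
df1 <- data.frame(
  id = 1:4,
  var1 = c("procedures_1_1_5___240", "procedures___6",
           "procedures_1_20_1___2130", "procedures_1_1___2"),
  var2 = TRUE
)

# Step 1: delete every character that is not an underscore
only_underscores <- gsub("[^_]", "", df1$var1)

# Step 2: the length of what remains is the underscore count per row
underscore_count <- nchar(only_underscores)

# Step 3: keep only the rows with at least 6 underscores
kept <- df1[underscore_count >= 6, ]
print(kept)
```

With this data the counts come out as 6, 3, 6 and 5, so only the rows with id 1 and 3 are kept.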
CodePudding user response:
We can use str_count on 'var1' to find the number of '_' and filter the rows where the count is greater than or equal to 6
library(dplyr)
library(stringr)
df1 %>%
filter(str_count(var1, "_") >= 6)
-output
id var1 var2
1 1 procedures_1_1_5___240 TRUE
2 3 procedures_1_20_1___2130 TRUE
data
df1 <- structure(list(id = 1:4, var1 = c("procedures_1_1_5___240",
"procedures___6",
"procedures_1_20_1___2130", "procedures_1_1___2"), var2 = c(TRUE,
TRUE, TRUE, TRUE)), class = "data.frame", row.names = c(NA, -4L
))
CodePudding user response:
Base R: here is an alternative using lengths with regmatches and gregexpr:
df1[lengths(regmatches(df1$var1, gregexpr("_", df1$var1))) >=6,]
Or, in short form, take the count straight from gregexpr (provided by Onyambu, many thanks):
df1[lengths(gregexpr('_', df1$var1)) >= 6, ]
id var1 var2
1 1 procedures_1_1_5___240 TRUE
2 3 procedures_1_20_1___2130 TRUE
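One difference between the two forms is worth knowing if the count is reused elsewhere: gregexpr returns -1 when there is no match, so the short form reports 1 rather than 0 for a string without any underscore. With a threshold of 6 this does not change the result. A small sketch, using a made-up vector x that is not part of the question's data:

```r
# x is a hypothetical test vector; the second element has no underscore
x <- c("procedures_1_1_5___240", "no underscore here")

# gregexpr() returns -1 for "no match", so lengths() reports 1 instead of 0
short_count <- lengths(gregexpr("_", x))

# regmatches() returns character(0) for "no match", so the count is a true 0
full_count <- lengths(regmatches(x, gregexpr("_", x)))

print(short_count)
print(full_count)
```

If an exact count matters (for example with a threshold of 1), the regmatches version is probably the safer choice.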