Below is the sample data. I am trying to create a new data frame, test2, that contains only the rows where matoccode does not end in "000" or "0000". Most of what I find online is about removing them. I need to detect those rows (with str_detect, I presume) and then filter them out.
matoccode <- c(111000,111015,222000,330000,420000,541011,621088,725510,110000,221000)
nchg <- c(124,100,254,1000,8,65,321,987,125,300)
pchg <- c(.102,.364,.012,.358,.521,.301,.254,.204,.115,.752)
test <- data.frame(matoccode,nchg,pchg)
CodePudding user response:
You could use str_detect; it will coerce the variable matoccode to a character. Then use a regex that looks for 3 or 4 zeroes at the end of the string.
library(stringr)
library(dplyr)
test %>%
  filter(!str_detect(matoccode, "0{3,4}$"))
Although, given that matoccode is numeric, you could also check whether it is divisible by 1000 or 10000 with no remainder:
test %>%
  filter(!(matoccode %% 1000 == 0 | matoccode %% 10000 == 0))
Result in both cases:
matoccode nchg pchg
1 111015 100 0.364
2 541011 65 0.301
3 621088 321 0.254
4 725510 987 0.204
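As a side note, the two divisibility tests overlap. A minimal base R sketch, using only the matoccode vector from the question, checks that a single test against 1000 selects the same rows; the names drop_both and drop_one are made up for illustration:

```r
# Codes from the question
matoccode <- c(111000, 111015, 222000, 330000, 420000,
               541011, 621088, 725510, 110000, 221000)

# Condition as written in the answer above
drop_both <- matoccode %% 1000 == 0 | matoccode %% 10000 == 0

# Any multiple of 10000 is also a multiple of 1000, so one test is enough
drop_one <- matoccode %% 1000 == 0

# Both conditions flag exactly the same elements
identical(drop_both, drop_one)

# Codes that survive the filter
matoccode[!drop_one]
```

The same reasoning applies to the regex: any string that ends in four zeroes also ends in three, so "0{3}$" should match everything that "0{3,4}$" does.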
CodePudding user response:
If there is a possibility that 0000 or 000 could appear elsewhere in the string, then use substr() to check only the characters at the end.
library(dplyr)
test %>% filter(
  substr(matoccode, nchar(matoccode) - 3, nchar(matoccode)) != "0000" &
  substr(matoccode, nchar(matoccode) - 2, nchar(matoccode)) != "000"
)
Returns
matoccode nchg pchg
1 111015 100 0.364
2 541011 65 0.301
3 621088 321 0.254
4 725510 987 0.204
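A possible base R variation on the same end-of-string check uses endsWith() instead of substr(). This is only a sketch: it assumes the codes are whole numbers, and it converts them to text with format(scientific = FALSE) because as.character() can render a value such as 100000 as "1e+05", which would hide the trailing zeroes. The name codes_chr is made up for illustration.

```r
# Sample data from the question
matoccode <- c(111000, 111015, 222000, 330000, 420000,
               541011, 621088, 725510, 110000, 221000)
nchg <- c(124, 100, 254, 1000, 8, 65, 321, 987, 125, 300)
pchg <- c(.102, .364, .012, .358, .521, .301, .254, .204, .115, .752)
test <- data.frame(matoccode, nchg, pchg)

# Convert once to plain digit strings (no scientific notation, no padding)
codes_chr <- format(test$matoccode, scientific = FALSE, trim = TRUE)

# endsWith() looks only at the end of each string;
# a code ending in "0000" also ends in "000", so one suffix covers both cases
test2 <- test[!endsWith(codes_chr, "000"), ]
```

Note that base subsetting keeps the original row names (2, 6, 7, 8 here), unlike the renumbered rows in the dplyr output above.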