How to filter by a series of string combinations

Time:08-24

Below is the sample data. I am trying to create a new data frame, test2, that contains only the rows where matoccode does not end in "000" or "0000". Most of what I find online is about removing the zeroes themselves. I need to detect those rows (with str_detect, I presume) and then filter them out.

  matoccode <- c(111000,111015,222000,330000,420000,541011,621088,725510,110000,221000)
  nchg <- c(124,100,254,1000,8,65,321,987,125,300)
  pchg <- c(.102,.364,.012,.358,.521,.301,.254,.204,.115,.752)


  test <- data.frame(matoccode,nchg,pchg)

CodePudding user response:

You could use str_detect, which coerces the numeric variable matoccode to character. Then use a regex that looks for 3 or 4 zeroes at the end of the string.

library(stringr)
library(dplyr)

test %>% 
  filter(!str_detect(matoccode, "0{3,4}$"))

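Alternatively, since matoccode is numeric, you could check whether it is divisible by 1000 or 10000 with no remainder. (Strictly, the 1000 check alone is enough, because every multiple of 10000 is also a multiple of 1000.)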

test %>% 
  filter(!(matoccode %% 1000 == 0 | matoccode %% 10000 == 0))

Result in both cases:

  matoccode nchg  pchg
1    111015  100 0.364
2    541011   65 0.301
3    621088  321 0.254
4    725510  987 0.204
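
As a rough cross-check of the two approaches above, the following base R sketch compares a string test and a modulo test on the sample data. It assumes the codes are whole numbers; the variable names ends_regex and ends_mod are made up for illustration, and format() is used here instead of relying on implicit coercion.

```r
# Sample data from the question
matoccode <- c(111000,111015,222000,330000,420000,541011,621088,725510,110000,221000)
nchg <- c(124,100,254,1000,8,65,321,987,125,300)
pchg <- c(.102,.364,.012,.358,.521,.301,.254,.204,.115,.752)
test <- data.frame(matoccode, nchg, pchg)

# String test: convert to fixed (non-scientific) notation, then look for
# three zeroes at the end. "000$" matches everything "0{3,4}$" matches.
code_chr <- format(test$matoccode, scientific = FALSE, trim = TRUE)
ends_regex <- grepl("000$", code_chr)

# Numeric test: a whole number ends in "000" when it is a multiple of 1000.
ends_mod <- test$matoccode %% 1000 == 0

# Both tests should flag the same rows
stopifnot(identical(ends_regex, ends_mod))

# Keep only the rows that were not flagged
test2 <- test[!ends_mod, ]
```

Unlike dplyr's filter(), base subsetting keeps the original row names, so test2 here is labelled by the original row positions rather than 1 to 4.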

CodePudding user response:

If there is a possibility that "000" or "0000" also appears elsewhere in the string, use substr() to check only the last characters.

library(dplyr)

test %>% filter(
  substr(matoccode, nchar(matoccode)-3, nchar(matoccode))!="0000" &
    substr(matoccode, nchar(matoccode)-2, nchar(matoccode))!="000"
  )

Returns

  matoccode nchg  pchg
1    111015  100 0.364
2    541011   65 0.301
3    621088  321 0.254
4    725510  987 0.204
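
A shorter way to express the same end-of-string check is base R's endsWith(). The sketch below wraps it in a hypothetical helper, ends_in_000(), which is not part of either answer. It assumes whole-number codes and formats them explicitly, because very round numbers can be rendered in scientific notation when coerced to character (100000 may become "1e+05", for example), which would hide the trailing zeroes.

```r
# Hypothetical helper (illustration only): TRUE where a whole-number code
# ends in "000". Codes ending in "0000" are covered by the same check.
ends_in_000 <- function(x) {
  # Fixed notation, no padding, so 100000 becomes "100000"
  chr <- format(x, scientific = FALSE, trim = TRUE)
  endsWith(chr, "000")
}

# Sample data from the question
matoccode <- c(111000,111015,222000,330000,420000,541011,621088,725510,110000,221000)
nchg <- c(124,100,254,1000,8,65,321,987,125,300)
pchg <- c(.102,.364,.012,.358,.521,.301,.254,.204,.115,.752)
test <- data.frame(matoccode, nchg, pchg)

# Keep the rows whose code does not end in "000"
test2 <- test[!ends_in_000(test$matoccode), ]
```

The helper can also be used inside dplyr, for instance as filter(!ends_in_000(matoccode)).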