Matching values that start with a pattern across several columns in R

Time:11-19

My dataframe looks like this:

df
   BEN_ID Val_1 Val_2 Val_3 Val_4 Val_5     AGE GENDER
1     ID1 vA303     .     .     .     .      25      F
2     ID1  9351  A303 53019 49390 F5D12      52      F
3     ID2 541AZ  1120   462  4019 A36B0      58      M
4     ID2 30302  5939  2768  4019  2724      65      M
5     ID2 305A1 78652  9190  4019 33829      61      M
6     ID3 305A3 29590  5715     .     .      53      M
7     ID3 Z57B9 35981  5849   570  4254      35      M
8     ID3  5693 78900 30590 30500 Z25H2      19      M
9     ID3 7AD59  7881 30301 78900 78791      57      M
10    ID4 7AD59  5780 53530 30390  3051      57      F

I want to keep the rows where any of Val_1 to Val_5 starts with "303" or "305".

So my output should look like this:

   BEN_ID Val_1 Val_2 Val_3 Val_4 Val_5     AGE GENDER
4     ID2 30302  5939  2768  4019  2724      65      M
5     ID2 305A1 78652  9190  4019 33829      61      M
6     ID3 305A3 29590  5715     .     .      53      M
8     ID3  5693 78900 30590 30500 Z25H2      19      M
9     ID3 7AD59  7881 30301 78900 78791      57      M
10    ID4 7AD59  5780 53530 30390  3051      57      F

I tried this code:

library(dplyr)
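diag_cols = names(df %>% select(starts_with("Val")))

# Make sure the value columns are character before pattern matching
df = df %>% mutate(across(starts_with("Val"), as.character))

values = "303|305"

# Paste Val_1..Val_5 into one string per row and search it (unanchored)
subdf = df %>% filter(grepl(values, do.call(paste, c(df[, diag_cols], sep = ","))))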

With this code, row 1 is also returned because Val_1 contains "vA303".

I also tried values = "^303|^305", but that gives the wrong output.

TIA!

CodePudding user response:

A dplyr solution

library(dplyr)

df %>% 
  filter(if_any(starts_with("Val"), ~ grepl("^303|^305", .x)))
   BEN_ID Val_1 Val_2 Val_3 Val_4 Val_5 AGE GENDER
4     ID2 30302  5939  2768  4019  2724  65      M
5     ID2 305A1 78652  9190  4019 33829  61      M
6     ID3 305A3 29590  5715     .     .  53      M
8     ID3  5693 78900 30590 30500 Z25H2  19      M
9     ID3 7AD59  7881 30301 78900 78791  57      M
10    ID4 7AD59  5780 53530 30390  3051  57      F
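
A likely reason this works where the pasted-string attempt in the question did not: if_any() tests each column on its own, so "^" anchors to the start of every value. After paste(), each row is one long string, and "^" only anchors to the start of Val_1. A minimal sketch of the difference in base R, assuming sep = "," as in the question; the `rows` vector is typed by hand from rows 1, 5 and 9 of the example data:

```r
# Three rows of the example data, already pasted with sep = ","
rows <- c("vA303,.,.,.,.",                  # row 1: should not match
          "305A1,78652,9190,4019,33829",    # row 5: match in Val_1
          "7AD59,7881,30301,78900,78791")   # row 9: match in Val_3

# "^" anchors to the start of the whole pasted string, so only Val_1 is checked
anchored_start <- grepl("^303|^305", rows)

# Letting the match begin after a separator checks the start of every value
anchored_each <- grepl("(^|,)(303|305)", rows)

print(anchored_start)
print(anchored_each)
```

With the second pattern the pasted-string approach also picks up row 9, but testing each column separately, as in the answer above, avoids depending on the separator at all.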

CodePudding user response:

A base R approach:

df[apply(df[, -c(1,7,8)], 1, function(x) any(grepl("^303|^305", x))), ]
   BEN_ID Val_1 Val_2 Val_3 Val_4 Val_5 AGE GENDER
4     ID2 30302  5939  2768  4019  2724  65      M
5     ID2 305A1 78652  9190  4019 33829  61      M
6     ID3 305A3 29590  5715     .     .  53      M
8     ID3  5693 78900 30590 30500 Z25H2  19      M
9     ID3 7AD59  7881 30301 78900 78791  57      M
10    ID4 7AD59  5780 53530 30390  3051  57      F
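
The column positions in -c(1,7,8) are hard-coded, so the call breaks if columns are added or reordered. A sketch of a variant that selects the columns by name and combines one logical vector per column; it assumes the Val_ columns are character, and the small `df` below is a hand-typed subset of the question's data (rows 1, 4, 8 and 10), not the full table:

```r
# Hand-typed subset of the question's data (rows 1, 4, 8 and 10)
df <- data.frame(
  BEN_ID = c("ID1", "ID2", "ID3", "ID4"),
  Val_1  = c("vA303", "30302", "5693", "7AD59"),
  Val_2  = c(".", "5939", "78900", "5780"),
  Val_3  = c(".", "2768", "30590", "53530"),
  Val_4  = c(".", "4019", "30500", "30390"),
  Val_5  = c(".", "2724", "Z25H2", "3051"),
  AGE    = c(25, 65, 19, 57),
  GENDER = c("F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Select the value columns by name rather than by position
val_cols <- grep("^Val_", names(df), value = TRUE)

# One logical vector per column: does the value start with 303 or 305?
hits <- lapply(df[val_cols], function(x) grepl("^(303|305)", x))

# A row is kept if any of its columns matched
keep <- Reduce(`|`, hits)

result <- df[keep, ]
print(result)
```

Because grepl() is vectorised over each column, this avoids the row-by-row apply() call and its conversion of the data frame to a character matrix.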