Underscore and character condition for levels selection in regex using R

Time:11-14

I'd like to remove the levels that contain "O", but only when the "O" comes after the last underscore "_". In my example:

my.ds <- c("Novo_Oeste_CANTODOPINHE_2O","Novo_Oeste_CANTODOPINHE_30O",
"Novo_Oeste_CANTODOPINHE_32O","Novo_Oeste_CANTODOPINHE_33O",
"Novo_Oeste_CANTODOPINHE_34O","Novo_Oeste_CANTODOPINHE_35O",
"Novo_Oeste_CANTODOPINHE_30","Novo_Oeste_CANTODOPINHE_492",
"Novo_Oeste_CANTODOPINHE_493","Novo_Oeste_CANTODOPINHE_494")

My desired output is:

sel.ds
[1] "Novo_Oeste_CANTODOPINHE_30"  "Novo_Oeste_CANTODOPINHE_492"
[3] "Novo_Oeste_CANTODOPINHE_493" "Novo_Oeste_CANTODOPINHE_494"

Please, help me.

CodePudding user response:

Maybe something like this: drop the strings that have an "O" after three sections, each terminated by a "_".

my.ds[!grepl("(.*_){3}.+O", my.ds)]
#> [1] "Novo_Oeste_CANTODOPINHE_30"  "Novo_Oeste_CANTODOPINHE_492"
#> [3] "Novo_Oeste_CANTODOPINHE_493" "Novo_Oeste_CANTODOPINHE_494"
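
If the number of underscores can vary, an alternative is to look only at the part after the last underscore. This is just a sketch, not the method from the answer above: it assumes the piece you care about is always the final segment, and `last.part` is a name made up here for illustration.

```r
# Sample data copied from the question.
my.ds <- c("Novo_Oeste_CANTODOPINHE_2O", "Novo_Oeste_CANTODOPINHE_30O",
           "Novo_Oeste_CANTODOPINHE_32O", "Novo_Oeste_CANTODOPINHE_33O",
           "Novo_Oeste_CANTODOPINHE_34O", "Novo_Oeste_CANTODOPINHE_35O",
           "Novo_Oeste_CANTODOPINHE_30", "Novo_Oeste_CANTODOPINHE_492",
           "Novo_Oeste_CANTODOPINHE_493", "Novo_Oeste_CANTODOPINHE_494")

# ".*_" is greedy, so sub() strips everything up to and including
# the last underscore, leaving only the final segment.
# (last.part is a hypothetical helper name, not from the original post.)
last.part <- sub(".*_", "", my.ds)

# Keep the elements whose final segment has no literal "O".
sel.ds <- my.ds[!grepl("O", last.part, fixed = TRUE)]
sel.ds
```

Note that a string with no underscore at all is left unchanged by `sub()`, so in that case the whole string is checked for "O".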