How to filter a string variable for values starting with a letter

Time:11-26

I have a messy character variable like:

df<-c("_oun_", "0000ff", "03815", "?3jhdb", "test", "1,000", "1.000")

and I would like to drop all values that are not words. I thought a good start would be to drop every value that does not start with a letter.

How can I do this with tidyverse? For the example above, the desired output would be "test".

CodePudding user response:

Here are some options with stringr. The regex matches any value that starts (^) with a letter ([:alpha:], upper or lower case), followed by any number of further letters.

This prints the values directly without the need to manually subset the data:

library(stringr)
str_subset(df, "^[:alpha:]+")
[1] "test"

With manual subsetting:

df[str_detect(df, "^[:alpha:]+")]
[1] "test"

or

df[str_which(df, "^[:alpha:]+")]
[1] "test"
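
Since the question asks about tidyverse, the same test can probably be used inside dplyr's filter() when the values sit in a data frame column. The sketch below assumes a tibble named dat with a column named value; both names are made up for illustration and do not come from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame: the question only shows a bare vector,
# so the tibble `dat` and its column `value` are assumptions.
dat <- tibble(
  value = c("_oun_", "0000ff", "03815", "?3jhdb", "test", "1,000", "1.000")
)

# Keep only the rows whose value starts with one or more letters.
words <- dat %>%
  filter(str_detect(value, "^[:alpha:]+"))

print(words$value)
```

filter() keeps the rows where str_detect() returns TRUE, so this is the data frame counterpart of the manual subsetting shown above.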

To keep the vector length intact (non-matching values become NA):

str_extract(df, "^[:alpha:]+")
[1] NA     NA     NA     NA     "test" NA     NA
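
One caveat worth illustrating: a pattern anchored only at the start keeps any value that begins with a letter, even if digits or spaces follow. If "words" should mean letters only, anchoring the end with $ as well is one way to tighten it. In the sketch below, "test1" and "two words" are hypothetical values added to the question's vector purely to show the difference.

```r
library(stringr)

# Vector from the question plus two hypothetical values
# ("test1", "two words") that are not in the original data.
x <- c("_oun_", "0000ff", "03815", "?3jhdb", "test", "1,000", "1.000",
       "test1", "two words")

# Starts with a letter: trailing digits or spaces are still accepted.
starts_with_letter <- str_subset(x, "^[:alpha:]")

# Letters only: the value must consist of letters from start (^) to end ($).
letters_only <- str_subset(x, "^[:alpha:]+$")

print(starts_with_letter)
print(letters_only)
```

For the original seven values both patterns give the same result; they only diverge once mixed values such as the two hypothetical ones appear.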