Select the strings that only contain a specific letter

I have a dataframe as follows:

df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))

Column v2 contains three letters (A, P, S), which can appear in any combination, e.g. "A, A", "A, P", "P, P, S", "A", "A, A, S, A", etc.

What I want to do is detect the rows that contain only the letter "A", no matter how many times it is repeated. In my sample df, the desired answer is: TRUE, TRUE, FALSE, FALSE, FALSE.

Thanks in advance.

CodePudding user response：

I would use the regex pattern ^A(?:,\s*A)*$:

df[grepl('^A(?:,\\s*A)*$', df$v2), ]

  v1      v2
1  1 A, A, A
2  2       A

Data:

df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))
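
The question asks for a TRUE/FALSE vector rather than the filtered rows, so here is a minimal sketch of the same pattern used that way. The `perl = TRUE` argument and the column name `only_A` are my assumptions for illustration, not part of the answer above.

```r
df <- data.frame(v1 = 1:5,
                 v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'),
                 stringsAsFactors = FALSE)

# grepl() returns one TRUE/FALSE per element of df$v2.
# perl = TRUE (an assumption here) selects the PCRE engine,
# which supports non-capturing groups such as (?:...).
only_a <- grepl('^A(?:,\\s*A)*$', df$v2, perl = TRUE)
only_a

# Optionally keep the flag next to the data
# (only_A is a hypothetical column name).
df$only_A <- only_a
```

Subsetting with `df[df$only_A, ]` should then give the same two rows as the `grepl()` call inside the brackets.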

CodePudding user response：

You can split each value into a character vector (strsplit() returns them as a list), and then check that every element of each vector is equal to "A". You can do that with this line:

sapply(strsplit(df$v2, ", "), function(x) all(x=="A"))
# [1]  TRUE  TRUE FALSE FALSE FALSE
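
If the spacing after the commas is not always exactly one space, a variation on the same idea might look like the sketch below. The regex separator `",\\s*"` and the switch to `vapply()` are my assumptions, not part of the answer above.

```r
df <- data.frame(v1 = 1:5,
                 v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'),
                 stringsAsFactors = FALSE)

# Split on a comma followed by any amount of whitespace,
# so "A,A" and "A,  A" are split the same way as "A, A".
parts <- strsplit(df$v2, ",\\s*")

# vapply() states the expected result type (one logical per row),
# so an unexpected shape raises an error instead of passing silently.
only_a <- vapply(parts, function(x) all(x == "A"), logical(1))
only_a
```

One thing to watch with this approach: an empty string splits into a zero-length vector, and `all()` of a zero-length vector is TRUE, so empty values may need a separate check.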

CodePudding user response：

Using regex, you can do:

grepl('^(A,?\\s?)+$', df$v2)
[1]  TRUE  TRUE FALSE FALSE FALSE
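
The two regex answers in this thread agree on the sample data but are not equivalent in general. The sketch below compares them on a few hypothetical inputs (the vector `x` is mine, not from the question), assuming the last pattern is written with the `+` quantifier as `^(A,?\\s?)+$`.

```r
# Hypothetical inputs chosen to show where the two patterns differ.
x <- c("A, A", "AA", "A A", "A,", "A, S")

# Looser pattern: each "A" may be followed by an optional comma
# and an optional single whitespace character.
loose <- grepl('^(A,?\\s?)+$', x)

# Stricter pattern: every additional "A" must be preceded by a comma.
# perl = TRUE is an assumption, used for the (?:...) group.
strict <- grepl('^A(?:,\\s*A)*$', x, perl = TRUE)

data.frame(x, loose, strict)
```

On these inputs the looser pattern also accepts "AA", "A A" and "A,", while the stricter one accepts only "A, A". Which behaviour is right depends on how clean the real column is.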