I have a dataframe as follows:
df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))
In column v2 there are three letters (A, P, S), which can appear in any combination, e.g. "A, A", "A, P", "P, P, S", "A", "A, A, S, A", etc.
What I want to do is detect the rows that contain only the letter "A", no matter how many times it is repeated. For my sample df, the desired answer is: TRUE, TRUE, FALSE, FALSE, FALSE.
Thanks in advance.
Answer:
I would use the regex pattern ^A(?:,\s*A)*$:
df[grepl('^A(?:,\\s*A)*$', df$v2), ]
  v1      v2
1  1 A, A, A
2  2       A
Data:
df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))
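The pattern breaks down as ^A (the string starts with "A"), (?:,\s*A)* (zero or more further ", A" pieces, with any amount of whitespace after each comma) and $ (nothing else may follow). Below is a minimal sketch wrapping it in a reusable function. The names only_a and examples are hypothetical, and perl = TRUE plus trimws() are additions that are not part of the answer: perl = TRUE makes the PCRE-style (?:...) group explicit, and trimws() tolerates stray leading or trailing spaces.

```r
# Hypothetical helper: TRUE when a string is one or more "A"s separated by commas.
only_a <- function(x) {
  # ^A          : must start with "A"
  # (?:,\s*A)*  : then zero or more ", A" pieces (spaces after the comma optional)
  # $           : and nothing may follow
  # perl = TRUE is an assumption added here so the (?:...) group goes to PCRE.
  grepl("^A(?:,\\s*A)*$", trimws(x), perl = TRUE)
}

examples <- c("A, A", "A, P", "P, P, S", "A", "A, A, S, A", "A,A")
only_a(examples)
# [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE
```

Because \s* allows zero spaces, "A,A" also counts as all-A, while any other letter anywhere in the string makes the match fail.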
Answer:
You can split each value into a character vector (strsplit() returns a list of them) and then check that every element of each vector is equal to "A". You can do that with this line:
sapply(strsplit(df$v2, ", "), function(x) all(x=="A"))
# [1] TRUE TRUE FALSE FALSE FALSE
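A slightly more defensive sketch of the same split-and-check idea, assuming the df from the question; parts and the only_a column are hypothetical names. as.character() matters on R versions before 4.0.0, where data.frame() turns strings into factors by default and strsplit() then fails with "non-character argument". Splitting on ",\\s*" instead of ", " also tolerates values written without a space after the comma, and vapply() guarantees exactly one logical per row.

```r
df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))

# as.character() guards against v2 being a factor (the pre-4.0.0 default).
# Split on a comma followed by optional whitespace.
parts <- strsplit(as.character(df$v2), ",\\s*")
# parts[[1]] is c("A", "A", "A"), parts[[4]] is c("A", "S"), ...

# TRUE only when every piece of the split value equals "A".
df$only_a <- vapply(parts, function(x) all(x == "A"), logical(1))
df[df$only_a, ]
```

One caveat of all(): an empty string splits into character(0), and all() of an empty vector is TRUE, so blank cells would be reported as all-A.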
Answer:
Using a regex, you can do:
grepl('^(A,?\\s?)+$', df$v2)
[1] TRUE TRUE FALSE FALSE FALSE
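One thing to be aware of with ^(A,?\s?)+$: because both the comma and the space are optional, it also accepts strings such as "AA", "A A" or a trailing "A,". That does not matter for the sample data, but a quick comparison with a stricter pattern shows the difference. The name strict and its pattern are a hypothetical variant, not part of the answer; it requires a comma before every "A" after the first.

```r
loose  <- "^(A,?\\s?)+$"  # pattern from the answer above
strict <- "^A(,\\s*A)*$"   # hypothetical stricter variant: commas required between A's

tests <- c("A, A, A", "A", "A, S", "AA", "A A", "A,")
grepl(loose, tests)
# [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
grepl(strict, tests)
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE
```

Both patterns use only extended-regex syntax, so they work with grepl()'s default engine without perl = TRUE.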