Check if a string contains the same character in R

Countries <- c("AAAAAAA", sample(c("India","USA","UK","SSS"),20,replace=TRUE))
df1 <- data.frame(Countries)
df1

I want to check if a string contains same character

library(stringi)
df1$All_S<-stri_count_fixed(df1$Countries,"S")==nchar(df1$Countries)
df1

   Countries All_S
1    AAAAAAA FALSE
2         UK FALSE
3      India FALSE
4      India FALSE
5        SSS  TRUE
6         UK FALSE
7        SSS  TRUE
8        SSS  TRUE
9      India FALSE
10       SSS  TRUE
11        UK FALSE
12     India FALSE
13       SSS  TRUE
14       USA FALSE
15        UK FALSE
16        UK FALSE
17       SSS  TRUE
18       SSS  TRUE
19     India FALSE
20       USA FALSE
21       USA FALSE

However, this only checks whether the string consists entirely of "S". How can I change it so that it works for any repeated character? In the example above, the first entry, AAAAAAA, should then also be TRUE.

CodePudding user response：

Another possibility:

sapply(strsplit(df1$Countries, ""), function(x) all(x == x[1]))
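The same idea can be wrapped in a small reusable function. This is only a sketch: the name `all_same_char` is made up for illustration, and the `as.character()` call is an added assumption to cope with the column being a factor (the default for `data.frame()` before R 4.0).

```r
# Hypothetical helper: TRUE when every character of a string equals its first character.
all_same_char <- function(x) {
  # as.character() guards against factor input, which strsplit() rejects
  chars <- strsplit(as.character(x), "")
  # vapply() guarantees a logical vector with one element per input string
  vapply(chars, function(ch) all(ch == ch[1]), logical(1))
}

all_same_char(c("AAAAAAA", "India", "USA", "UK", "SSS"))
# [1]  TRUE FALSE FALSE FALSE  TRUE
```

It could then be used as `df1$All_same <- all_same_char(df1$Countries)`.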

CodePudding user response：

You can try this

transform(
    df1,
    All_same = grepl("^(.)\\1+$", Countries)
)

which gives something like

   Countries All_same
1    AAAAAAA     TRUE
2         UK    FALSE
3         UK    FALSE
4         UK    FALSE
5        USA    FALSE
6         UK    FALSE
7        USA    FALSE
8        USA    FALSE
9        USA    FALSE
10        UK    FALSE
11     India    FALSE
12       SSS     TRUE
13       USA    FALSE
14       USA    FALSE
15     India    FALSE
16       USA    FALSE
17        UK    FALSE
18       SSS     TRUE
19     India    FALSE
20        UK    FALSE
21        UK    FALSE

CodePudding user response：

I would use grepl here:

df1$All_S <- grepl("^S+$", df1$Countries)

To do the same for any letter, use:

df1$All_S <- grepl("^(\\w)\\1*$", df1$Countries)
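Two details of that pattern are worth checking against your data: `\\w` only matches letters, digits and the underscore, and `*` also accepts one-character strings. A quick comparison on a made-up test vector (the values in `x` are assumptions for illustration, not from the question):

```r
# Illustrative input: two repeated-letter strings, a mixed string,
# a single character, and a repeated non-word character.
x <- c("AAAAAAA", "SSS", "USA", "A", "---")

# \\w restricts the match to word characters; * allows zero repeats
grepl("^(\\w)\\1*$", x)
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# . accepts any character, so "---" now matches as well
grepl("^(.)\\1*$", x)
# [1]  TRUE  TRUE FALSE  TRUE  TRUE

# + requires at least one repeat, so the single "A" no longer matches
grepl("^(.)\\1+$", x)
# [1]  TRUE  TRUE FALSE FALSE  TRUE
```

Which variant is right depends on whether single-character strings and non-letter characters should count.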

CodePudding user response：

"I want to check if a string contains same character" -- this leaves open the possibility that

(i) the string contains repeated adjacent characters along with different ones (e.g., "AAB")
(ii) the repeated characters can, but need not, be adjacent (e.g., "ABA")
(iii) the string consists of only one repeated character (e.g., "AAA")

The three conditions call for slightly different regex solutions:

solution (i):

library(dplyr)
library(stringr)
df1 %>%
  mutate(Same = str_detect(Countries, "(.)\\1+"))

solution (ii):

df1 %>%
  mutate(Same = str_detect(Countries, "(.).*\\1"))

solution (iii):

df1 %>%
  mutate(Same = str_detect(Countries, "^(.)\\1+$"))
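
To see how the three conditions differ, here is a small sketch on made-up strings. It uses base `grepl()` instead of `str_detect()` so that it runs without extra packages; the vector `x` and the result names are hypothetical.

```r
# Illustrative strings, one for each case plus a string with no repeats
x <- c("AAB", "ABA", "AAA", "ABC")

# (i) some character occurs at least twice in a row
adjacent_repeat <- grepl("(.)\\1+", x)
# [1]  TRUE FALSE  TRUE FALSE

# (ii) some character occurs again anywhere later in the string
any_repeat <- grepl("(.).*\\1", x)
# [1]  TRUE  TRUE  TRUE FALSE

# (iii) the whole string is a single repeated character
only_one_char <- grepl("^(.)\\1+$", x)
# [1] FALSE FALSE  TRUE FALSE

# Side-by-side view of the three results
data.frame(x, adjacent_repeat, any_repeat, only_one_char)
```

Only (iii) is anchored with `^` and `$`, which is what forces the match to cover the entire string.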