stringr::str_starts returns TRUE when it shouldn't

Time:03-11

I am trying to detect whether a string starts with either of the provided strings (separated by | )

name = "KKSWAP"
stringr::str_starts(name, "RTT|SWAP")

returns TRUE, but

str_starts(name, "SWAP|RTT")

returns FALSE

This behaviour seems wrong, as KKSWAP doesn't start with "RTT" or "SWAP". I would expect this to be false in both above cases.

CodePudding user response:

The reason can be found in the source code of the function:

function (string, pattern, negate = FALSE) 
{
    switch(type(pattern), empty = , bound = stop("boundary() patterns are not supported."), 
        fixed = stri_startswith_fixed(string, pattern, negate = negate, 
            opts_fixed = opts(pattern)), coll = stri_startswith_coll(string, 
            pattern, negate = negate, opts_collator = opts(pattern)), 
        regex = {
            pattern2 <- paste0("^", pattern)
            attributes(pattern2) <- attributes(pattern)
            str_detect(string, pattern2, negate)
        })
}

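As you can see, it pastes '^' in front of the pattern, so in your example it looks for '^RTT|SWAP'. Because | binds more loosely than ^, this means "RTT at the start, or SWAP anywhere", and it finds 'SWAP'. With the order swapped, '^SWAP|RTT' means "SWAP at the start, or RTT anywhere", and neither is present in "KKSWAP", hence FALSE.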
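The precedence issue can be reproduced with base R's grepl(), which avoids depending on a particular stringr version. This is only a minimal sketch: the variable names are illustrative, and the grouped pattern '^(RTT|SWAP)' is one possible way to make the anchor apply to both alternatives, not something the function source shown above does for you.

```r
name <- "KKSWAP"

# "^RTT|SWAP" is parsed as (^RTT) OR (SWAP): only the first branch is anchored
anchored_first_only <- grepl("^RTT|SWAP", name)

# "^SWAP|RTT" is parsed as (^SWAP) OR (RTT): neither branch matches "KKSWAP"
swapped_order <- grepl("^SWAP|RTT", name)

# Grouping the alternatives makes the anchor apply to every branch
grouped <- grepl("^(RTT|SWAP)", name)
grouped_hit <- grepl("^(RTT|SWAP)", "SWAPKK")

print(c(anchored_first_only, swapped_order, grouped, grouped_hit))
# [1]  TRUE FALSE FALSE  TRUE
```

The same grouping idea should carry over to a pattern passed to str_starts(), for example "(RTT|SWAP)", since the prepended '^' then applies to the whole group.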

If you want to look at more than one pattern you should use a vector:

name <- "KKSWAP"
stringr::str_starts(name, c("RTT","SWAP"))
# [1] FALSE FALSE

If you want just one answer, you can combine it with any():

name <- "KKSWAP"
any(stringr::str_starts(name, c("RTT","SWAP")))
# [1] FALSE

The advantage of stringr::str_starts() is that it is vectorised over the pattern argument, but if you don't need that, grepl('^RTT|^SWAP', name), as suggested by TTS, is a good base R alternative.

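Alternatively, the base function startsWith(), suggested by jpsmith, is also vectorised over its prefix argument. Note that it compares the prefix literally rather than as a regular expression, so | is not an alternation operator there:

startsWith(name, c("RTT","SWAP"))
# [1] FALSE FALSE

startsWith(name, "RTT|SWAP")
# [1] FALSE

The second call returns FALSE only because name does not start with the literal text "RTT|SWAP"; it would also return FALSE for a string such as "SWAPKK".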
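A short check makes the literal matching of startsWith() visible. This is a sketch with made-up test strings ("SWAPKK" and "RTT|SWAP-01" are hypothetical, chosen only for illustration); startsWith() itself is the base R function available since R 3.3.0.

```r
prefixes <- c("RTT", "SWAP")

# The prefix is compared character by character; "|" is just another character
literal_pipe  <- startsWith("SWAPKK", "RTT|SWAP")       # FALSE: no alternation
literal_match <- startsWith("RTT|SWAP-01", "RTT|SWAP")  # TRUE: literal prefix

# To test several prefixes, pass a vector and reduce the result with any()
any_prefix_hit  <- any(startsWith("SWAPKK", prefixes))  # TRUE
any_prefix_miss <- any(startsWith("KKSWAP", prefixes))  # FALSE
```

So the vector form combined with any() is the one that answers "does the string start with either prefix?".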

CodePudding user response:

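I'm not familiar with the stringr version, but the base R function startsWith() returns FALSE for both of your calls. If you don't have to use stringr, this may be a solution. Be aware that startsWith() treats the prefix as a literal string, not as a regular expression, so "RTT|SWAP" is not read as "RTT or SWAP"; to test several prefixes, pass them as a vector such as c("RTT", "SWAP").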

startsWith(name, "RTT|SWAP")
startsWith(name, "SWAP|RTT")
startsWith(name, "KK")

# > startsWith(name, "RTT|SWAP")
# [1] FALSE
# > startsWith(name, "SWAP|RTT")
# [1] FALSE
# > startsWith(name, "KK")
# [1] TRUE

CodePudding user response:

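The help text describes str_starts() as follows: "Detect the presence or absence of a pattern at the beginning or end of a string." It does not say how an alternation inside the pattern is handled, which might be why it's not behaving quite as expected.

The pattern argument is documented as the "Pattern with which the string starts or ends."

We can add ^ to each alternative in the regex so that every branch is searched for only at the beginning of the string, which gives the expected result:

library(stringr)
name = 'KKSWAP'
str_starts(name, '^RTT|^SWAP')
# [1] FALSE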

I would prefer grepl in this instance because it seems less misleading.

grepl('^RTT|^SWAP', name)