How to count number of particular pattern in a string in R?

Time:04-02

I am trying to count the number of | characters in a string. My code below returns 32 instead of the expected 2. Why is this happening, and how do I get a call that returns 2? Thanks!

> levels
[1] "Completely|Partially|Not at all"
> str_count(levels, '|')
[1] 32

Also, how do I split the string on the | character? I would like the output to be a character vector of length 3: 'Completely', 'Partially', 'Not at all'.

CodePudding user response:

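In a regex, | is the alternation ("or") operator. An unescaped | on its own means "the empty string or the empty string", which matches at every position in the string: 31 characters plus the end, hence 32. To match a literal |, escape it with a double backslash: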

stringr::str_count("Completely|Partially|Not at all", "\\|")
# [1] 2

To show what | is normally used for, let's count the occurrences of el and al:

stringr::str_count("Completely|Partially|Not at all", "al")
# [1] 2
stringr::str_count("Completely|Partially|Not at all", "el")
# [1] 1
stringr::str_count("Completely|Partially|Not at all", "el|al")
# [1] 3

So whenever the literal | symbol is what you are looking for, it has to be escaped.
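Escaping with backslashes is not the only option. As a small sketch (the variable names x, n_escaped, n_class and n_fixed are just for illustration), a character class or stringr's fixed() modifier should give the same count, since both make the pipe lose its special meaning:

```r
library(stringr)

x <- "Completely|Partially|Not at all"

# Three ways to match a literal pipe:
n_escaped <- str_count(x, "\\|")       # backslash escape
n_class   <- str_count(x, "[|]")       # inside a character class, | is literal
n_fixed   <- str_count(x, fixed("|"))  # fixed() turns off regex interpretation

c(n_escaped, n_class, n_fixed)
# [1] 2 2 2
```

fixed() is arguably the most readable choice when the pattern is a plain string and no regex features are needed.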

To split the string by the | symbol, we can use strsplit (base R) or stringr::str_split:

strsplit("Completely|Partially|Not at all", "\\|")
# [[1]]
# [1] "Completely" "Partially"  "Not at all"

The result is a list because the input may be a vector of several strings; each list element holds the pieces of one input string. This is easier to see with a vector of length 2:

vec <- c("Completely|Partially|Not at all", "something|else")
strsplit(vec, "\\|")
# [[1]]
# [1] "Completely" "Partially"  "Not at all"
# [[2]]
# [1] "something" "else"     
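For completeness, here is a sketch of the stringr::str_split variant mentioned above. It also returns a list, so for a single input string you can take the first element to get a plain character vector (x and parts are illustrative names):

```r
library(stringr)

x <- "Completely|Partially|Not at all"

# str_split() returns a list with one element per input string;
# [[1]] extracts the character vector for our single string.
parts <- str_split(x, fixed("|"))[[1]]

parts
# [1] "Completely" "Partially"  "Not at all"
length(parts)
# [1] 3
```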

CodePudding user response:

The pipe | character is a regex metacharacter and needs to be escaped:

levels <- "Completely|Partially|Not at all"
str_count(levels, '\\|')

Another general trick is to compare the length of the input with its length after all pipes have been removed; the difference is the number of pipes:

nchar(levels) - nchar(gsub("|", "", levels, fixed=TRUE))
[1] 2
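If you need this more than once, the same idea can be wrapped in a small base R function. This is only a sketch: count_char is a made-up helper name, not something base R or stringr provides, and it assumes the character to count is a single character.

```r
# count_char() is a hypothetical helper, not part of base R or stringr.
# It counts occurrences of the single character `ch` in each element of `x`
# by comparing string lengths before and after removing `ch`.
count_char <- function(x, ch) {
  stopifnot(is.character(ch), length(ch) == 1, nchar(ch) == 1)
  # fixed = TRUE treats `ch` literally, so "|" needs no escaping
  nchar(x) - nchar(gsub(ch, "", x, fixed = TRUE))
}

count_char(c("Completely|Partially|Not at all", "something|else", "none"), "|")
# [1] 2 1 0
```

Because nchar and gsub are both vectorized, the function works on a whole character vector at once.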

Addendum: to split the string, use strsplit and unlist the result to get a plain character vector:

unlist(strsplit(levels, "\\|"))

[1] "Completely" "Partially"  "Not at all"