I am trying to count the number of | characters in a string. This is my code, but it gives the incorrect answer of 32 instead of 2. Why is this happening, and how do I get a function that returns 2? Thanks!
> levels
[1] "Completely|Partially|Not at all"
> str_count(levels, '|')
[1] 32
Also, how do I separate the string by the | character? I would like the output to be a character vector of length 3: 'Completely', 'Partially', 'Not at all'.
CodePudding user response:
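The | is meaningful in regex as an "or"-like (alternation) operator. On its own it matches the empty string on either side of it, so it matches at every position in the 31-character string, which is why you get 32. Escape it with a double backslash to match a literal pipe.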
stringr::str_count("Completely|Partially|Not at all", "\\|")
# [1] 2
To show what | is normally used for, let's count the occurrences of el and al:
stringr::str_count("Completely|Partially|Not at all", "al")
# [1] 2
stringr::str_count("Completely|Partially|Not at all", "el")
# [1] 1
stringr::str_count("Completely|Partially|Not at all", "el|al")
# [1] 3
To match the literal | symbol, you need to escape it.
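Escaping is not the only way to get a literal match. As a minimal sketch, assuming stringr is installed (the variable name x is only for illustration), the pattern can be wrapped in fixed() or placed inside a character class:

```r
library(stringr)

x <- "Completely|Partially|Not at all"

# fixed() makes stringr treat the pattern as a literal string, not a regex
n_fixed <- str_count(x, fixed("|"))
n_fixed
# [1] 2

# Inside a character class, | has no special meaning
n_class <- str_count(x, "[|]")
n_class
# [1] 2
```

Either form avoids having to remember the double backslash.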
To split the string by the | symbol, we can use strsplit (base R) or stringr::str_split:
strsplit("Completely|Partially|Not at all", "\\|")
# [[1]]
# [1] "Completely" "Partially" "Not at all"
The result is a list because the input may be a vector of several strings, and each list element holds the pieces of one input string. This is clearer with a two-element vector:
vec <- c("Completely|Partially|Not at all", "something|else")
strsplit(vec, "\\|")
# [[1]]
# [1] "Completely" "Partially" "Not at all"
# [[2]]
# [1] "something" "else"
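For a single input string, the list wrapper can be dropped by taking its first element. A small sketch in base R, using fixed = TRUE so that no escaping is needed (x and parts are illustrative names, not from the question):

```r
x <- "Completely|Partially|Not at all"

# fixed = TRUE makes strsplit treat "|" as a literal separator;
# [[1]] extracts the character vector for the first (and only) input string
parts <- strsplit(x, "|", fixed = TRUE)[[1]]

length(parts)
# [1] 3
parts[3]
# [1] "Not at all"
```

With more than one input string, keep the list and index it per element instead.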
CodePudding user response:
The pipe | character is a regex metacharacter and needs to be escaped:
levels <- "Completely|Partially|Not at all"
str_count(levels, '\\|')
Another general trick is to compare the length of the input with the length of the same string after all pipes have been removed; the difference is the number of pipes:
nchar(levels) - nchar(gsub("|", "", levels, fixed=TRUE))
[1] 2
Addendum: To split the string, use strsplit and unlist the result to get a plain character vector:
unlist(strsplit(levels, "\\|"))
[1] "Completely" "Partially" "Not at all"