I have a column with values such as this:
df <- structure(list(col1 = c(" | | | | | | | |", "| | | | | | | | | | | | | | |",
"| | | | | | | | | | | | | | | ", "stop|", "stop| | ",
"stop | go")), class = "data.frame", row.names = c(NA, -6L))
I want to be able to remove all occurrences of | when they show up consecutively, or when they show up separated by spaces, as in | | or | | |.
Currently, I'm trying to enumerate all the variations of the pipes, but they seem kind of random. I was wondering if there's a way to make sure my pattern covers the following cases:
- When there is more than one | consecutively
- When there is more than one | consecutively, separated by a number of spaces (e.g., | | or | | |)
- When | is at the end of the line (e.g., \\|$)
I would, however, keep the pipe in stop | go.
Here's the code that I'm working with right now, but it removes the pipe in stop | go.
df$col1 <- gsub('[\\| ]{2,}|[\\|$]', '', df$col1)
I want to remove all the | symbols except for the one in stop | go.
CodePudding user response:
Maybe this works
trimws(trimws(gsub('(\\|\\s+){2,}', "", df$col1),
whitespace = "\\s+\\|"), whitespace = "\\|")
-output
[1] "" "" "" "stop" "stop" "stop | go"
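To see what each layer of the nested call contributes, here is a minimal sketch that runs the steps one at a time. It assumes the quantifier after \\s is + (one or more whitespace characters) and R 3.6.0 or later, where trimws accepts a whitespace argument. The names step1, step2, and step3 are only for illustration.

```r
# Sample data from the question
df <- data.frame(col1 = c(" | | | | | | | |",
                          "| | | | | | | | | | | | | | |",
                          "| | | | | | | | | | | | | | | ",
                          "stop|", "stop| | ", "stop | go"),
                 stringsAsFactors = FALSE)

# Step 1: drop runs of two or more "pipe + whitespace" groups
step1 <- gsub("(\\|\\s+){2,}", "", df$col1)

# Step 2: strip a leftover "whitespace + pipe" from either end
step2 <- trimws(step1, whitespace = "\\s+\\|")

# Step 3: strip any remaining bare pipes from either end
step3 <- trimws(step2, whitespace = "\\|")

print(step1)
print(step2)
print(step3)
```

The single pipe in stop | go survives because step 1 needs at least two pipe groups in a row, and the two trimws calls only touch the ends of the string.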
CodePudding user response:
You could do:
gsub('\\|\\s*\\||\\|\\s*$', '', df$col1)
#> [1] " " " "
#> [3] " " "stop"
#> [5] "stop " "stop | go"
Add a simple trimws if you don't want the spaces this leaves behind, as in akrun's answer:
trimws(gsub('\\|\\s*\\||\\|\\s*$', '', df$col1))
#> [1] "" "" "" "stop" "stop"
#> [6] "stop | go"
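The pattern is an alternation of two branches, and it may help to run each branch separately. This is only an illustrative sketch on a shortened vector x (not the full data frame); the variable names are made up for the example.

```r
x <- c("stop|", "stop| | ", "stop | go")

# Branch 1 alone: a pipe, optional whitespace, then another pipe
pairs_only <- gsub("\\|\\s*\\|", "", x)

# Branch 2 alone: a pipe followed only by optional whitespace up to the end
trailing_only <- gsub("\\|\\s*$", "", x)

# Both branches combined, then trimmed
combined <- trimws(gsub("\\|\\s*\\||\\|\\s*$", "", x))

print(pairs_only)
print(trailing_only)
print(combined)
```

Neither branch matches the pipe in stop | go, since it is not followed by a second pipe and is not at the end of the string.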
CodePudding user response:
Another regex strategy is to remove every | that is not followed by a whitespace character and a word character:
trimws(gsub("\\|(?!\\s\\w)", "", df$col1, perl = TRUE))
Output:
[1] "" "" "" "stop" "stop" "stop | go"
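As a rough check of what the negative lookahead keeps and drops, here is a small sketch on a few made-up strings (the vector x is hypothetical, not from the question). The (?!...) syntax needs perl = TRUE, because R's default regex engine does not support lookaheads.

```r
# Hypothetical inputs to probe the lookahead
x <- c("stop | go", "stop |go", "stop| | ", "a | b | c")

# A pipe is kept only when exactly one whitespace character and then
# a word character (letter, digit, or underscore) follow it
result <- trimws(gsub("\\|(?!\\s\\w)", "", x, perl = TRUE))

print(result)
```

Note that the pipe in "stop |go" is removed, because the character right after it is a letter rather than whitespace. If such inputs can occur, the lookahead would need adjusting.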