I am trying to figure out the best way to remove parentheses and everything inside them using str_detect and regex. I've found answers that help remove parentheses containing text or numbers, but I am having trouble when the contents include a $ and possibly a comma in the value.
I have tried the regex \([^()]*\), which seems to work on https://regex101.com/, but when I run it on the sample data below no changes are applied to the column.
Any pointers would be appreciated!
Sample Data:
Example <- data.frame(Column1 = c(
"Pineapple ($1,000)",
"($50,000) Roger",
"($1,000)",
"First ($100), Second ($1,000)"))
Output <- Example %>%
mutate(Column1 = str_replace(Column1, "\([^()]*\)", ""))
I managed to get the desired output using gsub, but I am still wondering what the tidyverse approach would be.
Example$Column1 <- gsub("\\([^()]*\\)", "", Example$Column1)
CodePudding user response:
It's not clear to me how you want to deal with entries that contain more than one number. That aside, a generally more convenient option is to use readr::parse_number rather than stringr::str_detect/stringr::str_remove. parse_number takes care of additional text, units and thousands separators.
If you want to keep only the first number (in cases where there is more than one number per entry), you can do
library(tidyverse)
Example %>% mutate(Column1 = parse_number(Column1))
# Column1
#1 1000
#2 50000
#3 1000
#4 100
Or if you want to keep both/multiple numbers, I suggest using separate_rows to separate entries based on a comma followed by a whitespace, before using readr::parse_number.
Example %>%
separate_rows(Column1, sep = ",\\s") %>%
mutate(Column1 = parse_number(Column1))
## A tibble: 5 × 1
# Column1
# <dbl>
#1 1000
#2 50000
#3 1000
#4 100
#5 1000
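As a quick check of the claim that parse_number ignores surrounding text, currency symbols and thousands separators, here is a minimal sketch on the sample strings as a plain character vector. It assumes readr is installed; the variable names x and first_numbers are only illustrative.

```r
library(readr)

# Sample strings from the question
x <- c("Pineapple ($1,000)", "($50,000) Roger", "($1,000)",
       "First ($100), Second ($1,000)")

# parse_number() returns the first number found in each string,
# dropping surrounding text and the "," grouping mark.
first_numbers <- parse_number(x)
print(first_numbers)
```

Note that the last entry yields only 100, which is why the separate_rows step is needed when every number should be kept.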
Update
To separate keys and values, here is an option; please see inline comments for explanations:
library(tidyverse)
Example %>%
# Separate multiple comma-separated entries into rows
separate_rows(Column1, sep = ",\\s") %>%
# Swap "(value) key" -> "key (value)"
mutate(Column1 = str_replace(
Column1, "^(\\(.+\\))\\s(\\w+)$", "\\2 \\1")) %>%
# Separate "key (value)" into columns
separate(Column1, c("key", "value"), sep = "\\s", fill = "left") %>%
# Parse number
mutate(value = parse_number(value))
## A tibble: 5 × 2
# key value
# <chr> <dbl>
#1 Pineapple 1000
#2 Roger 50000
#3 NA 1000
#4 First 100
#5 Second 1000
CodePudding user response:
You can use str_replace_all and double each backslash in the regex: inside an R string literal a backslash must itself be escaped, so \( is written as \\(.
library(tidyverse)
Example %>% mutate(Column1 = str_replace_all(Column1, "\\([^()]*\\)", ""))
# Column1
#1 Pineapple
#2 Roger
#3
#4 First , Second
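To illustrate the escaping point, here is a small base R sketch showing that the doubled-backslash string is the same pattern that was tested on regex101. It also uses a raw string, which is an alternative available in R 4.0.0 and later; the variable names are only illustrative.

```r
# Sample strings from the question
x <- c("Pineapple ($1,000)", "($50,000) Roger", "($1,000)",
       "First ($100), Second ($1,000)")

# Regular string literal: each backslash meant for the regex engine
# has to be doubled.
escaped_pattern <- "\\([^()]*\\)"

# Raw string literal (R >= 4.0.0): backslashes are taken literally,
# so the pattern can be pasted as written on regex101.
raw_pattern <- r"{\([^()]*\)}"

# Both literals produce the same character string.
print(identical(escaped_pattern, raw_pattern))

# Remove every parenthesised group, then trim leading/trailing spaces.
cleaned <- trimws(gsub(raw_pattern, "", x))
print(cleaned)
```

The space left before the comma in the last entry is not handled here; trimws only strips whitespace at the ends of each string.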
CodePudding user response:
If I've understood your requirements correctly, a very simple base R gsub will sort you out, including tidying up the stray spaces:
gsub(" ?\\([^()]*\\) ?", "", Example$Column1)
[1] "Pineapple" "Roger" "" "First, Second"
I'm not sure what you mean by a "tidyverse approach": these are just regular expressions, they are not specific to a package or even to R. If you prefer to use verbose wrappers you can use stringr::str_replace_all with the same exact pattern.
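For completeness, a minimal sketch of that stringr equivalent, assuming the stringr package is installed; the variable names x, pattern and cleaned are only illustrative.

```r
library(stringr)

# Sample strings from the question
x <- c("Pineapple ($1,000)", "($50,000) Roger", "($1,000)",
       "First ($100), Second ($1,000)")

# Same pattern as the gsub call: an optional space, a parenthesised
# group with no nested parentheses, and another optional space.
pattern <- " ?\\([^()]*\\) ?"

# str_replace_all() replaces every match, like gsub();
# str_replace() would replace only the first match in each string.
cleaned <- str_replace_all(x, pattern, "")
print(cleaned)
```

Inside a pipeline the same call would go in mutate, e.g. mutate(Column1 = str_replace_all(Column1, pattern, "")).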