I am processing strings in R which are supposed to contain zero or one pair of parentheses. If there are nested parentheses I need to delete the inner pair. Here is an example where I need to delete the parentheses around big bent nachos but not the other/outer parentheses.
test <- c(
"Record ID",
"What is the best food? (choice=Nachos)",
"What is the best food? (choice=Tacos (big bent nachos))",
"What is the best food? (choice=Chips with stuff)",
"Complete?"
)
I know I can kill all the parentheses with the stringr package using str_remove_all():
test |>
stringr::str_remove_all(stringr::fixed(")")) |>
stringr::str_remove_all(stringr::fixed("("))
but I don't have the regex skills to target only the inner parentheses. I found an SO post that is close, but it removes the outer parentheses and I can't untangle it to remove the inner ones instead.
CodePudding user response:
Here is a solution using gsub from base R. It is broken down into 2 steps for readability and debugging.
test <- c(
"Record ID",
"What is the best food? (choice=Nachos)",
"What is the best food? (choice=Tacos (big bent nachos))",
"What is the best food? (choice=Chips with stuff)",
"Complete?"
)
test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*) - first group: the first '(' followed by zero or more characters
# \\(     - then another '(' (the inner opening parenthesis)
# "\\1"   - replace the whole match with the first group, dropping the inner '('
test <- gsub("\\)(.*\\))", "\\1", test)
# similar to the first step: match the first ')' followed by a group that ends
# in another ')', then keep only the group, dropping the inner ')'
test
[1] "Record ID"
[2] "What is the best food? (choice=Nachos)"
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"
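For reuse, the two steps above could be wrapped in a small function. This is only a sketch: the name remove_inner_parens is hypothetical, and it assumes each string contains at most one nested pair of parentheses (two separate, non-nested pairs in one string would be merged).

```r
# Hypothetical helper wrapping the two gsub() steps from this answer.
# Assumption: each string has at most one nested pair of parentheses.
remove_inner_parens <- function(x) {
  # Step 1: keep everything from the first "(" up to the last "(",
  # but drop that last "(" itself.
  x <- gsub("(\\(.*)\\(", "\\1", x)
  # Step 2: drop the first ")" when another ")" follows later in the string.
  gsub("\\)(.*\\))", "\\1", x)
}

remove_inner_parens("What is the best food? (choice=Tacos (big bent nachos))")
# [1] "What is the best food? (choice=Tacos big bent nachos)"
```

Strings with zero or one pair of parentheses pass through unchanged, because both patterns need two parentheses of the same kind to match.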
CodePudding user response:
Assuming there is at most one nested pair of parentheses, we could use a gsub() approach. Note that this removes the nested parentheses together with their contents:
output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output
[1] "Record ID"
[2] "What is the best food? (choice=Nachos)"
[3] "What is the best food? (choice=Tacos)"
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"
Data:
test <- c(
"Record ID",
"What is the best food? (choice=Nachos)",
"What is the best food? (choice=Tacos (big bent nachos))",
"What is the best food? (choice=Chips with stuff)",
"Complete?"
)
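If the nested text should be kept rather than dropped, one possible variant uses negated character classes instead of lazy quantifiers. This is a sketch under the same assumption of at most one nested pair; the variable names x and kept are illustrative only.

```r
x <- c(
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))"
)

# [^()]* matches a run of characters containing no parentheses, so the
# pattern only matches "( ... ( ... ) ... )" and captures all three text runs.
# The replacement re-wraps the three runs in a single outer pair.
kept <- gsub("\\(([^()]*)\\(([^()]*)\\)([^()]*)\\)", "(\\1\\2\\3)", x)
kept
# [1] "What is the best food? (choice=Nachos)"
# [2] "What is the best food? (choice=Tacos big bent nachos)"
```

Because each captured run excludes parentheses, the pattern cannot match a string with only one pair, so those strings are left as they are.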
CodePudding user response:
Here you go.
test |>
  stringr::str_replace_all("(\\(.*)\\(", "\\1") |> # remove inner open brackets
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closed brackets
[1] "Record ID"
[2] "What is the best food? (choice=Nachos)"
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"
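Whichever approach is used, it may be worth checking that every cleaned string ends up with at most one pair of parentheses. A minimal base R sketch follows; count_char and cleaned are hypothetical names, and the sample vector is made up for illustration.

```r
# Hypothetical helper: count occurrences of a fixed character in each string.
count_char <- function(x, ch) {
  # gregexpr() returns -1 for no match; regmatches() turns that into an
  # empty vector, so lengths() gives 0 for strings without the character.
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}

cleaned <- c(
  "Record ID",
  "What is the best food? (choice=Tacos big bent nachos)"
)

# TRUE when no string has more than one "(" or more than one ")".
all(count_char(cleaned, "(") <= 1 & count_char(cleaned, ")") <= 1)
# [1] TRUE
```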