How can I split words before and after parentheses in R?

I'm trying to split a text variable that goes like this:

text = "name name name (1235-23-532)"

to something like this:

name = "name name name"
num = "1235-23-532"

I'm trying this code:

df_split <- df %>%
  separate(owners, 
       into = c("name", "num"), 
       sep = "(?<=[A-Za-z])(?=\\()"
  )

However, the number part comes out as NA. I'm confused about why it doesn't detect the parenthesis (I tried both ( and \( and neither works). Is there a good solution for this?

Also: some rows have two pairs of parentheses, like "name name name (name) (number)". Is there a good way to extract just the numbers?

Thank you very much.

CodePudding user response：

Here is one way to get your desired output:

library(tidyverse)

as_tibble(text) %>% 
  mutate(name = str_trim(gsub("[^a-zA-Z]", " ", value)),
         num = str_extract(value, '\\d+\\-\\d+\\-\\d+'), .keep="unused")

# A tibble: 1 x 2
  name           num        
  <chr>          <chr>      
1 name name name 1235-23-532

OR:

library(tidyverse)

as_tibble(text) %>% 
  separate(value, c("name", "num"), sep = ' \\(') %>% 
  mutate(num = str_remove(num, '\\)'))
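
As for why the original `sep` returned NA: the lookbehind `(?<=[A-Za-z])` requires a letter immediately before the `(`, but in the sample string a space comes before it, so the pattern never matches. For the rows with two pairs of parentheses, here is a minimal base R sketch (no packages needed). It assumes the number is always the last parenthesised group; the sample data frame and the name `num_pattern` are made up for illustration, while the column name `owners` is taken from the question.

```r
# Hypothetical sample data; only the column name `owners` comes from the question.
df <- data.frame(
  owners = c("name name name (1235-23-532)",
             "name name name (name) (1235-23-532)"),
  stringsAsFactors = FALSE
)

# A run of digits and hyphens inside a pair of parentheses at the end of the string.
# Assumption: the number is always the final parenthesised group.
num_pattern <- "\\(([0-9]+(-[0-9]+)*)\\)\\s*$"

# regexec() returns the full match plus capture groups; group 1 is the number.
m <- regexec(num_pattern, df$owners)
df$num <- vapply(
  regmatches(df$owners, m),
  function(x) if (length(x) > 1) x[2] else NA_character_,
  character(1)
)

# Drop the number group and trim the leftover whitespace to get the name.
df$name <- trimws(sub(num_pattern, "", df$owners))
```

With this pattern a group like `(name)` is kept as part of `name`, because it contains no digits; rows without a trailing number group get NA in `num`.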

CodePudding user response：

I don't have a way to prevent the "NA", but I do have a workaround I use when I run into this problem: I call fct_recode inside mutate to recode the "NA" level to the proper value.

For example

df %>%
  mutate(Column_Name = fct_recode(Column_Name, "new_name" = "NA"))

This works for me; it's not perfect, but it fixes the problem.
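
One caveat: as far as I know, this recodes a factor level literally named "NA", not genuinely missing values. If the column holds real NA values, a base R sketch along these lines may be closer to what is needed. The vector `num` is a made-up example, and "new_name" is just the placeholder label from the snippet above.

```r
# Hypothetical column with one genuinely missing value.
num <- c("1235-23-532", NA)

# Replace real NA values with a placeholder label, leaving other values untouched.
num_filled <- ifelse(is.na(num), "new_name", num)
```

This only relabels the missing values; it does not recover the number that the failed split dropped.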