How to extract a string between () and inside another string

I use R for text mining. I want to count some strings in a data frame; in the text they look like this:

"conducteur(trice)" , "conducteur.trice"
"administratif(ve)" , "administratif.ve" , "administrati.ve"
"agent(e)"

My code is:

data <- data %>% 
  mutate(Description = tolower(Description),
         ve.count = str_count(Description, "[i].ve[ ]"), 
         e.count = str_count(Description, "(e)"), 
         trice.count = str_count(Description, "(trice)"))

I want to count occurrences of: .ve / (ve) / (ive) / .e / (e) / .trice / (trice)

My code doesn't detect these patterns. Any help?
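A likely reason the patterns above misbehave is that str_count() treats its pattern as a regular expression: "." matches any character, and "(e)" is a capture group that simply matches a plain "e". The sketch below uses made-up sample strings (not the question's data) to compare a raw regex, an escaped regex, and stringr's fixed(), which matches literal text:

```r
library(stringr)

x <- "administratif(ve) and agent(e)"

# As a regex, "(e)" is a capture group around "e", so every "e" is counted
str_count(x, "(e)")        # returns 3
# Escaping the parentheses matches the literal text "(e)"
str_count(x, "\\(e\\)")    # returns 1
# fixed() disables regex interpretation altogether
str_count(x, fixed("(e)")) # returns 1

# "." is a wildcard: "i.ve" matches "ifve" even though there is no dot
str_detect("administratifve", "i.ve")   # returns TRUE
str_detect("administratifve", "i\\.ve") # returns FALSE
```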

Answer:

Does this help?

library(tidyverse)

data <- tibble(
  Description = c(
    "conducteur(trice) or conducteur.trice",
    "administratif(ve) , administratif.ve or administrati.ve",
    "agent(e)"
  )
)

data %>%
  mutate(
    # count "ve" inside the first pair of parentheses
    ve.count = Description %>% str_extract("[(][^()]+[)]") %>% str_count("ve")
  )
#> # A tibble: 3 × 2
#>   Description                                             ve.count
#>   <chr>                                                      <int>
#> 1 conducteur(trice) or conducteur.trice                          0
#> 2 administratif(ve) , administratif.ve or administrati.ve        1
#> 3 agent(e)                                                       0

Created on 2022-05-09 by the reprex package (v2.0.0)
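If the goal is only to count the literal forms listed in the question, one alternative (a sketch, not the method used in this answer) is stringr's fixed(), which turns off regex interpretation so "(" and "." need no escaping. count_suffix() is a hypothetical helper introduced for illustration; it adds the count of "(suffix)" to the count of ".suffix", using the same sample data as above:

```r
library(dplyr)
library(stringr)

data <- tibble(
  Description = c(
    "conducteur(trice) or conducteur.trice",
    "administratif(ve) , administratif.ve or administrati.ve",
    "agent(e)"
  )
)

# Hypothetical helper: counts literal "(suffix)" plus literal ".suffix"
count_suffix <- function(text, suffix) {
  str_count(text, fixed(paste0("(", suffix, ")"))) +
    str_count(text, fixed(paste0(".", suffix)))
}

counts <- data %>%
  mutate(
    ve.count    = count_suffix(Description, "ve"),
    e.count     = count_suffix(Description, "e"),
    trice.count = count_suffix(Description, "trice")
  )
counts
```

A caveat with literal matching: fixed(".e") counts any dot followed by "e" (for example ".exemple"), so very short suffixes may overcount in real text.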

Answer:

@danlooo, I tried wrapping each target string in "[\(\.]" and "[\) \.\,]" and it worked for me.

So the code becomes:

library(tidyverse)

data <- data %>% 
  mutate(Description = tolower(Description),
         # "(" or "." before the suffix, then ")", space, "." or "," after it
         ve.count = str_count(Description, "[\\(\\.]ve[\\) \\.\\,]"), 
         e.count = str_count(Description, "[\\(\\.]e[\\) \\.\\,]"), 
         trice.count = str_count(Description, "[\\(\\.]trice[\\) \\.\\,]"))
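The pattern in this answer requires a character after the suffix, so a form at the very end of a string (such as the final "administrati.ve" in the first answer's sample data) is not counted. A possible tweak, sketched here as an assumption rather than a tested drop-in replacement, is a lookahead (?=...) that accepts either one of those characters or the end of the string ($); stringr's default ICU regex engine supports lookaheads. count_form() is a hypothetical helper:

```r
library(stringr)

x <- "administratif(ve) , administratif.ve or administrati.ve"

# Original pattern: the trailing ")", space, "." or "," is required and consumed
str_count(x, "[\\(\\.]ve[\\) \\.\\,]") # returns 2

# Hypothetical helper: "(" or "." before the suffix, then a lookahead that
# accepts ")", space, "." or "," or the end of the string without consuming it
count_form <- function(text, suffix) {
  str_count(text, paste0("[\\(\\.]", suffix, "(?=[\\) \\.,]|$)"))
}

count_form(x, "ve")                 # returns 3
count_form("agent(e)", "e")         # returns 1
count_form("voir .exemple", "e")    # returns 0
```

Because the lookahead only checks the next character, a dot followed by a longer word (".exemple") is still rejected, while a suffix at the end of the text is now accepted.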