I have this data frame:
res <- data.frame(id=c(1,2,3,4,5,6), text=c("(21) Nº do Pedido: BR 10 2016 015202 0 A2","(21) Nº do Pedido: (21) Nº do Pedido Anterior:BR 20 2016 011446 8 U2 BR 10 2016 011446 2","(21) Nº do Pedido: BR 10 2016 007903 9 A2","(21) Nº do Pedido: PI 1001284-2 A2","(21) Nº do Pedido: MU 8102871-7 U2","(21) Nº do Pedido: BR 10 2022 004466 0 (21) Nº do Pedido Anterior:BR 20 2016 011446 8 U2 BR 10 2016 011446 2"))
res %>% ifelse(
stringr::str_subset(text, "^(21) Nº do Pedido: (21) Nº do Pedido Anterior:.*"),
stringr::str_replace_all(text,".*:(.*)\\s{1,}[A-Z]", "") %>% stringr::str_trim( ),
stringr::str_replace_all(text,"^\\(\\d{2,}\\).+:\\s(.+)\\(", "\\1") %>% stringr::str_trim( ))->res$text
Expected output:
id text
1 1 BR 10 2016 015202 0 A2
2 2 BR 20 2016 011446 8 U2
3 3 BR 10 2016 007903 9 A2
4 4 PI 1001284-2 A2
5 5 MU 8102871-7 U2
6 6 BR 10 2022 004466 0
Any idea how to solve this?
CodePudding user response:
Since your text column can contain more than one "BR", and you only want the first occurrence, I'll use ifelse with two different regexes: one for rows with several "BR" entries and one for rows with a single entry.
library(stringr)
library(dplyr)
res %>%
mutate(text = ifelse(str_count(text, "BR") > 1,
gsub("^.*?(BR.+(?= BR)).*$", "\\1", text, perl = TRUE),
gsub("^.*?(BR.+).*$", "\\1", text, perl = TRUE)))
id text
1 1 BR 10 2016 015202 0 A2
2 2 BR 20 2016 011446 8 U2
3 3 BR 10 2016 007903 9 A2
I've noticed that the part you want to extract follows a specific pattern, so we can probably capture it with a single regex.
res %>%
mutate(text = gsub("^.+?([A-Z]{2}\\s[0-9]{2}\\s[0-9]{4}\\s[0-9]{6}\\s[0-9]{1}\\s[A-Z][0-9]).*?$", "\\1", text))
id text
1 1 BR 10 2016 015202 0 A2
2 2 BR 20 2016 011446 8 U2
3 3 BR 10 2016 007903 9 A2
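The question's expected output also includes "PI" and "MU" numbers and one row without a trailing A2/U2 code, which the BR-specific patterns above do not target. As a rough sketch, assuming those are the only layouts that occur, the single-regex idea can be extended with stringr::str_extract, which returns only the first match per string. The name pedido_pattern and the alternation inside it are my own assumptions based on the six sample rows, not a general rule for these numbers.

```r
library(stringr)

# Same sample strings as in the question; stringsAsFactors = FALSE keeps text as character
res <- data.frame(
  id = 1:6,
  text = c(
    "(21) Nº do Pedido: BR 10 2016 015202 0 A2",
    "(21) Nº do Pedido: (21) Nº do Pedido Anterior:BR 20 2016 011446 8 U2 BR 10 2016 011446 2",
    "(21) Nº do Pedido: BR 10 2016 007903 9 A2",
    "(21) Nº do Pedido: PI 1001284-2 A2",
    "(21) Nº do Pedido: MU 8102871-7 U2",
    "(21) Nº do Pedido: BR 10 2022 004466 0 (21) Nº do Pedido Anterior:BR 20 2016 011446 8 U2 BR 10 2016 011446 2"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical pattern (an assumption drawn from the sample data):
#   two capital letters, a space, then either
#     "NN NNNN NNNNNN N"  (e.g. BR 10 2016 015202 0) or
#     "NNNNNNN-N"         (e.g. PI 1001284-2),
#   optionally followed by a kind code such as " A2" or " U2".
pedido_pattern <- "[A-Z]{2} (?:\\d{2} \\d{4} \\d{6} \\d|\\d{7}-\\d)(?: [A-Z]\\d)?"

# str_extract() keeps only the first match in each string
res$text <- str_extract(res$text, pedido_pattern)
res
```

On the six sample rows this should reproduce the expected output listed in the question. Strings in a layout the pattern does not cover would come back as NA, so it is worth checking for NA values on real data.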
CodePudding user response:
This is simple; just capture it like this:
\(21\) Nº do Pedido(?: Anterior)?: ?(.*)
or
(\(21\) Nº do Pedido(?: Anterior)?: ?(.*))
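For reference, here is a minimal sketch of how the first pattern could be applied from R, assuming stringr is used; the variable names x, label_pattern, and extracted are only illustrative. Inside an R string literal each backslash has to be doubled.

```r
library(stringr)

# Two sample strings taken from the question
x <- c(
  "(21) Nº do Pedido: BR 10 2016 015202 0 A2",
  "(21) Nº do Pedido: MU 8102871-7 U2"
)

# Same regex as above, with backslashes doubled for the R string literal
label_pattern <- "\\(21\\) Nº do Pedido(?: Anterior)?: ?(.*)"

# str_match() returns a matrix: column 1 is the full match, column 2 is capture group 1
extracted <- str_match(x, label_pattern)[, 2]
extracted
```

Because (.*) runs to the end of the string, rows that carry a second "(21) Nº do Pedido" label (ids 2 and 6 in the question) would still need extra trimming after this step.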