Copy everything after a string into a different column in R

I have some columns in a data frame that look like this:

df <- data.frame(act=c("DEC S/N, de 21/06/2006",
                       "DEC S/N, de 05/06/2006",
                       "DEC S/N, de 21/06/2006; MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012"), adj=NA)

I would like to copy everything after the first ; in column 'act' (here, MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012) into column 'adj', ideally removing the space that would otherwise be left at the start of the extracted string. All other cells, that is, those where the string in 'act' contains no ;, should stay NA in column 'adj'.

Answer:

Here I use ifelse() with grepl() to find the rows that contain ";", then a regex with a lookbehind to capture everything after the first "; " into the adj column. The other rows keep their NA.

library(dplyr)

df %>% mutate(adj = ifelse(grepl(";", act),
                           # skip lazily up to the first "; " and capture the rest
                           sub("^.+?(?<=;) (.+)$", "\\1", act, perl = TRUE),
                           adj))
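For comparison, the same grepl() test plus a regex replacement can be written in base R without dplyr. This is a minimal sketch that assumes 'act' holds character strings. It swaps the lookbehind for a plain "strip the prefix" pattern, and has_semicolon is only an illustrative local variable:

```r
df <- data.frame(act = c("DEC S/N, de 21/06/2006",
                         "DEC S/N, de 05/06/2006",
                         "DEC S/N, de 21/06/2006; MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012"),
                 adj = NA, stringsAsFactors = FALSE)

# TRUE for rows whose 'act' contains a literal ";"
has_semicolon <- grepl(";", df$act, fixed = TRUE)

# sub() replaces only the first match: "^[^;]*;\\s*" removes everything up to
# and including the first ";" plus any whitespace that follows it
df$adj[has_semicolon] <- sub("^[^;]*;\\s*", "", df$act[has_semicolon])

df$adj
```

Only the rows selected by has_semicolon are assigned, so the others keep their NA.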

Answer:

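Using str_match() from stringr, with \\s* after the ";" so that the leading space is dropped:

library(dplyr)
library(stringr)
df <- data.frame(act=c("DEC S/N, de 21/06/2006",
                       "DEC S/N, de 05/06/2006",
                       "DEC S/N, de 21/06/2006; MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012"), adj=NA)
# column 2 of the match matrix is the capture group; NA where there is no ";"
df %>% mutate(adj = str_match(act, "[^;]*;\\s*(.*)")[,2])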
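str_match() returns a character matrix. Column 1 holds the whole match, and each following column holds one capture group. Where the pattern does not match, the entire row is NA, which is why [,2] yields the text after the ";" or NA. A small check on two of the strings (the \\s* here is what drops the leading space):

```r
library(stringr)

act <- c("DEC S/N, de 05/06/2006",
         "DEC S/N, de 21/06/2006; MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012")

# "[^;]*;" consumes up to the first ";", "\\s*" drops the following space,
# and "(.*)" captures the rest of the string
m <- str_match(act, "[^;]*;\\s*(.*)")

m[, 1]  # whole match: NA for the first string, the full second string
m[, 2]  # capture group: NA, then "MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012"
```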

Answer:

Using stringr::str_extract() with a lookbehind, which keeps the "; " itself out of the result:

df$adj <- stringr::str_extract(df$act, '(?<=;\\s)(.*)')
df$adj
#[1] NA   NA    "MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012"
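The same lookbehind also works in base R, but only with perl = TRUE, because R's default regex engine (TRE) does not support lookarounds. Here is a minimal sketch using regexpr() and regmatches(). Since regmatches() returns only the matched pieces, the result is written back into a vector that starts out as all NA:

```r
act <- c("DEC S/N, de 21/06/2006",
         "DEC S/N, de 05/06/2006",
         "DEC S/N, de 21/06/2006; MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012")

# Start of the first match in each string; -1 where there is none
pos <- regexpr("(?<=;\\s).*", act, perl = TRUE)

adj <- rep(NA_character_, length(act))
# regmatches() returns only the matched substrings, in order
adj[pos > 0] <- regmatches(act, pos)
adj
```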

Answer:

Using extract() from tidyr (rows where the regex does not match get NA):

library(tidyr)

df %>%
  extract(act, 'adj', '; (.*)', remove = FALSE)

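Or use separate(), also from tidyr. It splits on "; ", merges any extra pieces into adj, and fills adj with NA where there is nothing to split: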

df %>%
  separate(act, c('act1', 'adj'), '; ', 
           extra = 'merge', fill = 'right', remove = FALSE)
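In tidyr 1.3.0 and later, separate() and extract() are superseded by the separate_wider_*() family. The sketch below does the same split with separate_wider_delim() and assumes tidyr >= 1.3.0. It drops the all-NA placeholder adj first so the new column cannot clash with it. act1 is just a throwaway name for the part before the first "; ":

```r
library(dplyr)
library(tidyr)  # separate_wider_delim() needs tidyr >= 1.3.0

df <- data.frame(act = c("DEC S/N, de 21/06/2006",
                         "DEC S/N, de 05/06/2006",
                         "DEC S/N, de 21/06/2006; MP 542, de 12/08/2011; LEI 12.678, de 25/06/2012"),
                 adj = NA)

out <- df %>%
  select(-adj) %>%                              # drop the all-NA placeholder
  separate_wider_delim(act, delim = "; ",
                       names = c("act1", "adj"),
                       too_many = "merge",      # keep everything after the first "; " together
                       too_few = "align_start", # rows without "; " get NA in adj
                       cols_remove = FALSE) %>% # keep the original act column
  select(-act1)

out$adj
```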