Extract string inside specific pattern [...] in R

I have a column, Industry, which looks like the example below. I would like to create a new column called Code that contains only the characters that appear in the [...]. Then I wish to create a new column, Industry_2, that contains all the characters but without the [...], so my results would look like this:

Industry	Code	Industry_2
Total, all industries		Total, all industries
Agriculture [11]	[11]	Agriculture
Manufacturing [31-33]	[31-33]	Manufacturing

CodePudding user response：

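Use tidyr::separate and split on the space that precedes the bracket. The look-ahead (?=\\[) checks for the [ without consuming it, so the [ stays in the second string. Rows without a bracket get NA in Code; separate warns about the missing piece unless you pass fill = "right".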

df <- data.frame(Industry = c("Total, all industries", "Agriculture [11]",
                        "Manufacturing [31-33]"))

library(tidyr)
separate(df, Industry,
         sep = " (?=\\[)", 
         into = c("Industry_2", "Code"),
         remove = FALSE)

output

               Industry            Industry_2    Code
1 Total, all industries Total, all industries    <NA>
2      Agriculture [11]           Agriculture    [11]
3 Manufacturing [31-33]         Manufacturing [31-33]
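
To see what the look-ahead changes, here is a small base R sketch that splits one of the sample strings with and without it. The variable names x, plain and lookahead are only illustrative.

```r
x <- "Manufacturing [31-33]"

# Plain delimiter: the "[" is part of the match, so it is dropped.
plain <- strsplit(x, " \\[")[[1]]
plain
# "Manufacturing" "31-33]"

# Look-ahead: only the space is matched, so the "[" is kept.
lookahead <- strsplit(x, " (?=\\[)", perl = TRUE)[[1]]
lookahead
# "Manufacturing" "[31-33]"
```

The look-ahead syntax needs perl = TRUE in base R functions such as strsplit.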

CodePudding user response：

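If the [...] is always at the end of the string, you could use sub in base R. Check for the bracket first, because a pattern that does not match leaves the string unchanged, so rows without a code would otherwise be copied into Code whole:

 # Code: the bracketed part, or NA when the row has no [...]
 dataset$Code = ifelse(grepl("\\[", dataset$Industry), sub(".*(\\[.*\\])$", "\\1", dataset$Industry), NA)
 # Industry_2: drop the bracketed part and the space before it
 dataset$Industry_2 = sub("\\s*\\[.*$", "", dataset$Industry)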
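
As a variation on the base R approach, here is a minimal sketch using regexpr and regmatches. It assumes each row has at most one bracketed code; the vector names industry, code and industry_2 are illustrative and would be columns of your data frame in practice.

```r
industry <- c("Total, all industries", "Agriculture [11]",
              "Manufacturing [31-33]")

# Match "[", then anything that is not "]", then "]".
pattern <- "\\[[^]]*\\]"

# regexpr returns -1 for strings without a match.
m <- regexpr(pattern, industry)

# Start with NA everywhere and fill in only the rows that matched.
code <- rep(NA_character_, length(industry))
code[m != -1] <- regmatches(industry, m)

# Remove the bracketed part, then trim the leftover space.
industry_2 <- trimws(sub(pattern, "", industry))
```

Rows without a bracket end up with NA in code and an unchanged value in industry_2.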

CodePudding user response：

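library(data.table)
# tstrsplit returns the pieces in order (name, code); rev() swaps them to match the new column names.
# The native pipe |> needs R 4.1 or later.
df[c('code', "industry_2")] <- tstrsplit(df$Industry, " (?=\\[)", perl = TRUE) |> rev()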

#                Industry    code            industry_2
# 1 Total, all industries    <NA> Total, all industries
# 2      Agriculture [11]    [11]           Agriculture
# 3 Manufacturing [31-33] [31-33]         Manufacturing