I have a column, Industry, which looks like the one below. I would like to create a new column called Code that contains only the characters that appear inside the [...], and then a new column Industry_2 that contains all the other characters, without the [...]. My results should look like this:
Industry | Code | Industry_2 |
---|---|---|
Total, all industries | | Total, all industries |
Agriculture [11] | [11] | Agriculture |
Manufacturing [31-33] | [31-33] | Manufacturing |
Answer:
Use tidyr::separate() with a look-ahead in the delimiter regex, so that the [ is kept in the second string.
df <- data.frame(Industry = c("Total, all industries", "Agriculture [11]",
"Manufacturing [31-33]"))
library(tidyr)
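separate(df, Industry,
         sep = " (?=\\[)",                # split on the space just before "["; the look-ahead keeps "["
         into = c("Industry_2", "Code"),
         remove = FALSE,                  # keep the original Industry column
         fill = "right")                  # rows without [...] get NA in Code, without a warning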
output
Industry Industry_2 Code
1 Total, all industries Total, all industries <NA>
2 Agriculture [11] Agriculture [11]
3 Manufacturing [31-33] Manufacturing [31-33]
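The look-ahead is not specific to tidyr. Here is a minimal base R sketch of the same split, assuming the three example strings from the question (the names x, parts, Industry_2 and Code are only illustrative). The pattern consumes the space but only peeks at the [, so the bracket stays in the second piece, and a string with no [...] yields a single piece whose second element is NA.

```r
x <- c("Total, all industries", "Agriculture [11]", "Manufacturing [31-33]")

# Split on a space only when it is followed by "[" (perl = TRUE enables the look-ahead).
# The space is consumed by the match; the "[" is not, so it stays in the second piece.
parts <- strsplit(x, " (?=\\[)", perl = TRUE)

# First piece is the industry name; indexing a one-piece result at 2 gives NA.
Industry_2 <- vapply(parts, `[`, character(1), 1)
Code <- vapply(parts, `[`, character(1), 2)

data.frame(Industry = x, Industry_2, Code)
```

This is the same split that tidyr::separate() and data.table::tstrsplit() perform on the column.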
Answer:
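If the bracketed code always comes at the end of the string, base R regular expressions are enough:
# rows without [...] get NA in Code; the space before "[" is dropped from Industry_2
dataset$Code <- ifelse(grepl("\\[", dataset$Industry),
                       sub(".*(\\[.*\\])$", "\\1", dataset$Industry), NA)
dataset$Industry_2 <- sub(" *\\[.*$", "", dataset$Industry)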
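Another base R route, sketched under the same assumption that the code is always a trailing [...] (x, m, hit, Code and Industry_2 are illustrative names): regexpr() locates the bracket and regmatches() extracts it. Because regmatches() silently drops elements that did not match, the matches are written back into an NA-filled vector so the rows stay aligned.

```r
x <- c("Total, all industries", "Agriculture [11]", "Manufacturing [31-33]")

# Start position of a trailing "[...]" in each string, or -1 where there is none
m <- regexpr("\\[[^]]*\\]$", x)
hit <- as.vector(m) > 0  # plain logical, without regexpr()'s extra attributes

# regmatches() returns only the matched elements, so write them back
# into an NA-filled vector to keep rows without a code as NA
Code <- rep(NA_character_, length(x))
Code[hit] <- regmatches(x, m)

# Everything before the "[", with the separating space trimmed
Industry_2 <- trimws(ifelse(hit, substr(x, 1, m - 1), x))

data.frame(Industry = x, Code, Industry_2)
```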
Answer:
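library(data.table)
# tstrsplit() returns the pieces as a list: the name before " [" first, then the code.
# rev() swaps them to match the order of the new column names (|> needs R >= 4.1).
df[c("Code", "Industry_2")] <- tstrsplit(df$Industry, " (?=\\[)", perl = TRUE) |> rev()
#                Industry    Code            Industry_2
# 1 Total, all industries    <NA> Total, all industries
# 2      Agriculture [11]    [11]           Agriculture
# 3 Manufacturing [31-33] [31-33]         Manufacturing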