I have a dataset in which I need to split a column into two at the : symbol. However, I want to keep the : in the first column. How can I achieve that?
Here is the dataset:
dd <- data.frame(col1 = c("*MOT:0 .",
                          "*CHI:byebye .",
                          "*MOT:yeah byebye .",
                          "*CHI:0 [>] .",
                          "*MOT:<what are you gonna do now> [<] ?",
                          "*CHI:gonna do .",
                          "*MOT:<what's that [= block]> [>] ?"))
dd
col1
*MOT:0 .
*CHI:byebye .
*MOT:yeah byebye .
*CHI:0 [>] .
*MOT:<what are you gonna do now> [<] ?
*CHI:gonna do .
*MOT:<what's that [= block]> [>] ?
In the end, I want this:
col1   col2
*MOT:  0 .
*CHI:  byebye .
*MOT:  yeah byebye .
*CHI:  0 [>] .
*MOT:  <what are you gonna do now> [<] ?
*CHI:  gonna do .
*MOT:  <what's that [= block]> [>] ?
Any help will be greatly appreciated!
CodePudding user response:
You can use tidyr::separate with a lookbehind regex:
tidyr::separate(dd, col1, '(?<=:)', into = c('col1', 'col2'))
#> col1 col2
#> 1 *MOT: 0 .
#> 2 *CHI: byebye .
#> 3 *MOT: yeah byebye .
#> 4 *CHI: 0 [>] .
#> 5 *MOT: <what are you gonna do now> [<] ?
#> 6 *CHI: gonna do .
#> 7 *MOT: <what's that [= block]> [>] ?
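To see why the lookbehind keeps the colon in the first column, here is a minimal base R sketch of the same idea. The two sample strings are copied from the question; using regexpr with substr is an illustrative stand-in, not what tidyr does internally.

```r
# Two sample rows in the same shape as the question's data
x <- c("*MOT:0 .", "*CHI:byebye .")

# (?<=:) is a zero-width lookbehind: it matches the empty position right
# after a colon, so the colon itself is not consumed by the split point.
pos <- as.integer(regexpr("(?<=:)", x, perl = TRUE))

# Everything before the matched position goes to col1 (colon included),
# everything from the matched position onward goes to col2.
res <- data.frame(
  col1 = substr(x, 1, pos - 1),
  col2 = substr(x, pos, nchar(x)),
  stringsAsFactors = FALSE
)
res
```

Because the match has zero length, nothing is removed from the string; it is only cut in two.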
CodePudding user response:
With extract:
tidyr::extract(dd, col1, "(.*\\:)(.*)", into = c("col1", "col2"))
col1 col2
1 *MOT: 0 .
2 *CHI: byebye .
3 *MOT: yeah byebye .
4 *CHI: 0 [>] .
5 *MOT: <what are you gonna do now> [<] ?
6 *CHI: gonna do .
7 *MOT: <what's that [= block]> [>] ?
Note that extract is superseded in favor of separate_wider_regex (available in tidyr 1.3.0 and later):
tidyr::separate_wider_regex(dd, col1, c(col1 = ".*\\:", col2 = ".*"))
Or in base R with strcapture:
strcapture("(.*\\:)(.*)", dd$col1, proto = data.frame(col1 = "", col2 = ""))
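One caveat with the patterns in this answer: the leading .* is greedy, so if an utterance itself contains a colon, the split lands on the last colon rather than the first. The sketch below compares the greedy pattern with a first-colon variant. The second input string is hypothetical and not part of the question's data.

```r
# Hypothetical input: the second utterance itself contains a colon
x <- c("*MOT:0 .", "*CHI:look: a dog .")
proto <- data.frame(col1 = "", col2 = "", stringsAsFactors = FALSE)

# Greedy: (.*\\:) runs up to the LAST colon in each string
greedy <- strcapture("(.*\\:)(.*)", x, proto = proto)

# [^:]* cannot cross a colon, so the split happens at the FIRST colon
first <- strcapture("^([^:]*:)(.*)$", x, proto = proto)
first
```

For the data in the question, which has exactly one colon per row, both patterns give the same result.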
CodePudding user response:
A base R approach using strsplit with lapply:
setNames(data.frame(do.call(rbind,
  lapply(strsplit(dd$col1, ":"), function(x)
    c(paste0(x[1], ":"), x[2])))), c("col1", "col2"))
col1 col2
1 *MOT: 0 .
2 *CHI: byebye .
3 *MOT: yeah byebye .
4 *CHI: 0 [>] .
5 *MOT: <what are you gonna do now> [<] ?
6 *CHI: gonna do .
7 *MOT: <what's that [= block]> [>] ?
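Note that strsplit breaks at every colon, and only x[1] and x[2] are kept above, so any text after a second colon would be dropped. If that could occur in the data, one possible variation is to split at the first colon only, using regmatches with invert = TRUE. The second input string here is hypothetical.

```r
# Hypothetical input: the second utterance itself contains a colon
x <- c("*MOT:0 .", "*CHI:look: a dog .")

# regexpr() finds only the first ":" in each string; with invert = TRUE,
# regmatches() returns the pieces before and after that single match.
parts <- regmatches(x, regexpr(":", x, fixed = TRUE), invert = TRUE)

res <- data.frame(
  # Re-attach the colon that the split removed from the speaker label
  col1 = paste0(vapply(parts, function(p) p[1], ""), ":"),
  col2 = vapply(parts, function(p) p[2], ""),
  stringsAsFactors = FALSE
)
res
```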
CodePudding user response:
Using stringr::str_split with dplyr:
library(dplyr)
stringr::str_split(dd$col1, "(?<=:)", simplify = TRUE) %>%
  as.data.frame() %>%
  rename(col1 = V1, col2 = V2)
col1 col2
1 *MOT: 0 .
2 *CHI: byebye .
3 *MOT: yeah byebye .
4 *CHI: 0 [>] .
5 *MOT: <what are you gonna do now> [<] ?
6 *CHI: gonna do .
7 *MOT: <what's that [= block]> [>] ?
CodePudding user response:
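Using base R with read.table. This assumes the text itself contains no commas, since a comma is inserted after the first : and used as the field separator:
# quote = "" keeps the apostrophe in "what's" from being read as a quote character
read.table(text = sub(":", ":,", dd$col1),
           header = FALSE, sep = ",", quote = "", comment.char = "",
           col.names = c("col1", "col2"))
Output:
col1 col2
1 *MOT: 0 .
2 *CHI: byebye .
3 *MOT: yeah byebye .
4 *CHI: 0 [>] .
5 *MOT: <what are you gonna do now> [<] ?
6 *CHI: gonna do .
7 *MOT: <what's that [= block]> [>] ?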