I have this dummy input data frame (the input data frame cannot be changed):
data <- data.frame(c('c(a1, a10)'),
c('c(b1, b10)'),
stringsAsFactors = FALSE)
colnames(data) <- c('A', 'B')
head(data)
# A B
#1 c(a1, a10) c(b1, b10)
I would like to turn this data frame into the one below. A tidyverse (dplyr/tidyr) approach would be helpful.
# A B
#1 a1 b1
#2 a10 b10
CodePudding user response:
You can remove the extra text around the values and then use separate_rows.
library(dplyr)
library(tidyr)
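data %>%
  # strip the leading "c(" and the trailing ")" from every column
  mutate(across(everything(), ~ gsub('c\\(|\\)', '', .))) %>%
  # split each cell on a comma plus optional whitespace, one value per row
  separate_rows(A:B, sep = ',\\s*')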
# A B
# <chr> <chr>
#1 a1 b1
#2 a10 b10
CodePudding user response:
Use the separate_rows function from the tidyr library:
library(tidyr)
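# strip "c(" and ")" from each row, then rebuild a data frame
as.data.frame(t(apply(data, 1, function(x) gsub("c\\(|\\)", "", x)))) %>%
  # split every column on a comma plus optional whitespace
  separate_rows(colnames(.), sep = ',\\s*')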
Output:
# A tibble: 2 x 2
A B
<chr> <chr>
1 a1 b1
2 a10 b10
This only requires the tidyr library; it doesn't need the dplyr package.
Explanation:
It applies to every row a function that removes the c( at the beginning and the ) at the end, using a regex in gsub. It then applies tidyr::separate_rows to all columns (selected with colnames(.)) and specifies the separator as the regex ,\\s*, which matches a comma followed by optional whitespace.
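To make the two steps concrete, here is a minimal base R sketch that applies them to a single cell of the input. It is only an illustration: strsplit stands in for what separate_rows does per cell, and the names x, stripped, and parts are made up for this example.

```r
# One cell of the input, as a plain string
x <- "c(a1, a10)"

# Step 1: drop the leading "c(" and the trailing ")"
stripped <- gsub("c\\(|\\)", "", x)

# Step 2: split on a comma followed by optional whitespace,
# the same pattern that is passed to separate_rows(sep = ...)
parts <- strsplit(stripped, ",\\s*")[[1]]

print(stripped)
print(parts)
```

separate_rows performs the same kind of split on each selected column and then places each piece on its own row.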
CodePudding user response:
You can first extract the relevant strings with str_extract_all, using lookbehind and lookahead, and then separate_rows:
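library(dplyr)   # provides mutate() and across()
library(tidyr)
library(stringr)
data %>%
  # keep the token right after "(" and the token right before ")"
  mutate(across(c(1:2),
                ~str_extract_all(., "(?<=\\()[^,\\s]+|[^,\\s]+(?=\\))"))) %>%
  separate_rows(., sep = ",\\s*") %>%
  # expand the list columns into one row per extracted value
  unnest(cols = c(1:2))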
# A tibble: 2 x 2
A B
<chr> <chr>
1 a1 b1
2 a10 b10
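To see what the lookbehind/lookahead pattern picks out, the following sketch runs it on single strings. It uses base R's regmatches and gregexpr with perl = TRUE as a stand-in for str_extract_all, so it is an approximation of the approach above rather than the same call. The names pattern, x, y, matches, and matches3 are illustrative.

```r
# Token right after "(" OR token right before ")"
pattern <- "(?<=\\()[^,\\s]+|[^,\\s]+(?=\\))"

# Two-element case, as in the question
x <- "c(a1, a10)"
matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(matches)

# Three-element case: the middle value is neither preceded by "("
# nor followed by ")", so the pattern does not capture it
y <- "c(a1, a5, a10)"
matches3 <- regmatches(y, gregexpr(pattern, y, perl = TRUE))[[1]]
print(matches3)
```

Because the pattern anchors on the parentheses, it fits cells with exactly two values, as in the example data. For cells with more values, the gsub plus separate_rows approaches are likely the safer choice.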