How to separate multiple-character values inside row?

Time:10-12

I have this dummy input dataframe. (Input dataframe cannot be changed)

data <- data.frame(c('c(a1, a10)'),
                   c('c(b1, b10)'),
                   stringsAsFactors = FALSE)

colnames(data) <- c('A', 'B')
head(data)
#           A          B
#1 c(a1, a10) c(b1, b10)

I would like to turn it into the dataframe below, with one value per row. A tidyverse (dplyr/tidyr) approach would be helpful.

#    A            B
#1  a1           b1
#2 a10          b10

CodePudding user response:

You can strip the surrounding c( and ) from each value with gsub and then split the values into rows with separate_rows.

library(dplyr)
library(tidyr)

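data %>%
  # strip the leading "c(" and the trailing ")" from every column
  mutate(across(everything(), ~ gsub('c\\(|\\)', '', .x))) %>%
  # split each value on a comma plus optional whitespace, one piece per row
  separate_rows(A:B, sep = ',\\s*')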

#  A     B    
#  <chr> <chr>
#1 a1    b1   
#2 a10   b10  

CodePudding user response:

Use the tidyr library's function named separate_rows:

library(tidyr)
as.data.frame(t(apply(data, 1, function(x) gsub("c\\(|\\)", "", x)))) %>% 
  separate_rows(colnames(.), sep=',\\s*')

Output:

# A tibble: 2 x 2
  A     B    
  <chr> <chr>
1 a1    b1   
2 a10   b10

This needs only the tidyr package, which also provides the %>% pipe; dplyr is not required.

Explanation:

apply runs a function on every row that uses gsub with a regex to remove the c( at the beginning and the ) at the end of each value; t() restores the row-by-column layout before as.data.frame converts the result back to a dataframe. tidyr::separate_rows is then applied to all columns (selected with colnames(.)), with the separator given as the regex ,\\s*, which matches a comma followed by optional whitespace.
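To see what each regex does on its own, here is a minimal base-R sketch on a single string. The variable names are made up for illustration, and the anchored pattern at the end is an optional stricter variant, not part of the answer above.

```r
# One cell from the input dataframe
x <- "c(a1, a10)"

# Step 1: delete every "c(" and every ")" from the string
stripped <- gsub("c\\(|\\)", "", x)

# Step 2: split on a comma followed by zero or more whitespace characters
parts <- strsplit(stripped, ",\\s*")[[1]]

# Because \\s* also matches zero characters, a value without the space
# after the comma splits the same way
parts_no_space <- strsplit("a1,a10", ",\\s*")[[1]]

# Assumption: a stricter variant that only strips "c(" at the very start
# and ")" at the very end of the string
stripped_anchored <- gsub("^c\\(|\\)$", "", x)

print(stripped)
print(parts)
```

The anchored variant may be the safer choice if a value could itself contain a closing parenthesis or the letters c( somewhere in the middle.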

CodePudding user response:

You can first extract the relevant strings with str_extract_all, using lookbehind and lookahead, and then apply separate_rows and unnest:

library(tidyr)
library(stringr)
data %>%
  mutate(across(c(1:2), 
                ~str_extract_all(., "(?<=\\()[^,\\s]+|[^,\\s]+(?=\\))"))) %>% 
  separate_rows(., sep = ",\\s*") %>%
  unnest(cols = c(1:2))
# A tibble: 2 x 2
  A     B    
  <chr> <chr>
1 a1    b1   
2 a10   b10 
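The lookaround pattern can be checked without any packages. The sketch below uses base R's gregexpr and regmatches with perl = TRUE as a stand-in for str_extract_all (an assumption made to keep it dependency-free; extract_ends is a hypothetical helper name). It also suggests that the pattern keeps only the item right after ( and the item right before ), so a value with three or more items would lose the middle ones.

```r
# Same pattern as in the answer: a run of non-comma, non-space characters
# that either follows "(" or is followed by ")"
pattern <- "(?<=\\()[^,\\s]+|[^,\\s]+(?=\\))"

# Hypothetical helper: return all matches of the pattern in one string
extract_ends <- function(x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

# Two items: both are captured
two_items <- extract_ends("c(a1, a10)")

# Three items: only the first and the last are captured, "a5" is dropped
three_items <- extract_ends("c(a1, a5, a10)")

print(two_items)
print(three_items)
```

For inputs that may hold more than two values per cell, the gsub plus separate_rows approach from the earlier answers avoids this limitation.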