Split ordered strings across two variables

Time:06-15

I am trying to use strsplit to split ordered strings held in two variables into separate rows of a dataset. The items in each string are separated by a comma and a space (", "), but I am getting a bit confused and haven't found any similar questions on Stack Overflow.

I'm not sure I am explaining myself well, so here is some example data:

df <- data.frame(suburb = c("yellow, blue", "orange, yellow", "blue", "green, yellow"), postcode = c("a9,  b9", "c9, a9", "b9", "d9, a9"))

What I would ideally like to end up with is something like:

suburb postcode
yellow a9
blue   b9
orange c9
yellow a9
blue   b9
green  d9
yellow a9

CodePudding user response:

tidyr::separate_rows(df, suburb, postcode)
# # A tibble: 7 × 2
#   suburb postcode
#   <chr>  <chr>   
# 1 yellow a9      
# 2 blue   b9      
# 3 orange c9      
# 4 yellow a9      
# 5 blue   b9      
# 6 green  d9      
# 7 yellow a9  

CodePudding user response:

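In base R, you can use strsplit and unlist, then combine the results into a data frame:

cbind.data.frame(
    # split on a comma plus any following whitespace, so "a9,  b9" gives "a9" and "b9"
    suburb = unlist(strsplit(df$suburb, ",\\s*")), 
    postcode = unlist(strsplit(df$postcode, ",\\s*"))
)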
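
The approach above relies on both columns splitting into the same number of pieces in every row; otherwise the suburbs and postcodes would silently drift out of step. A minimal sketch of a guard for that, assuming a hypothetical helper named split_paired (the name and its arguments are illustrative, not from any package):

```r
# Hypothetical helper: split paired, comma-separated columns into rows,
# stopping if the columns do not line up row by row.
split_paired <- function(data, cols, sep = ",\\s*") {
  # Split each requested column on a comma plus any following whitespace
  pieces <- lapply(data[cols], function(x) strsplit(as.character(x), sep))

  # Every column must yield the same number of pieces in each original row
  n <- lengths(pieces[[1]])
  for (p in pieces[-1]) {
    if (!identical(lengths(p), n)) {
      stop("columns do not split into the same number of pieces per row")
    }
  }

  # Flatten each column and trim any remaining surrounding whitespace
  as.data.frame(lapply(pieces, function(p) trimws(unlist(p))),
                stringsAsFactors = FALSE)
}

df <- data.frame(
  suburb = c("yellow, blue", "orange, yellow", "blue", "green, yellow"),
  postcode = c("a9,  b9", "c9, a9", "b9", "d9, a9")
)

result <- split_paired(df, c("suburb", "postcode"))
print(result)
```

If a row had, say, two suburbs but only one postcode, this sketch would raise an error instead of returning misaligned pairs.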

CodePudding user response:

df <-
  data.frame(
    suburb = c("yellow, blue", "orange, yellow", "blue", "green, yellow"),
    postcode = c("a9,  b9", "c9, a9", "b9", "d9, a9")
  )

library(data.table)
# split on a comma plus any following whitespace so no leading spaces remain
setDT(df)[, lapply(.SD, function(x) unlist(strsplit(x, split = ",\\s*")))] 
#>    suburb postcode
#> 1: yellow       a9
#> 2:   blue       b9
#> 3: orange       c9
#> 4: yellow       a9
#> 5:   blue       b9
#> 6:  green       d9
#> 7: yellow       a9

Created on 2022-06-15 by the reprex package (v2.0.1)
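
Because right-aligned printing can hide leading spaces, it may be worth checking the split values directly. A small base R sketch, using one postcode string from the question (the variable names are just for illustration):

```r
x <- "a9,  b9"

# Splitting on the comma alone keeps the spaces that follow it
literal <- strsplit(x, ",")[[1]]

# Treating the comma plus any whitespace as the separator drops them
regex <- strsplit(x, ",\\s*")[[1]]

print(nchar(literal))   # the second piece is longer than "b9"
print(regex)
print(trimws(literal))  # trimming afterwards gives the same values
```

Either changing the separator or trimming afterwards should work; the regex separator simply avoids a second pass over the data.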

CodePudding user response:

You can use separate_rows:

library(tidyr)
library(dplyr)  # needed for mutate() and across()
df %>%
  # split values into separate rows:
  separate_rows(c(suburb, postcode), sep = ",") %>%
  # clean up trailing and leading spaces:
  mutate(across(everything(), trimws))
# A tibble: 7 × 2
  suburb postcode
  <chr>  <chr>   
1 yellow a9      
2 blue   b9      
3 orange c9      
4 yellow a9      
5 blue   b9      
6 green  d9      
7 yellow a9

CodePudding user response:

Another possible solution:

library(tidyverse)

map_dfr(df, ~ str_split(.x, "\\s*,\\s*") %>%  unlist)

#> # A tibble: 7 × 2
#>   suburb postcode
#>   <chr>  <chr>   
#> 1 yellow a9      
#> 2 blue   b9      
#> 3 orange c9      
#> 4 yellow a9      
#> 5 blue   b9      
#> 6 green  d9      
#> 7 yellow a9