I am trying to use strsplit to split ordered strings across two variables into rows in a dataset. Each ordered string is comma-separated, but I am getting a bit confused and haven't found any similar questions on SO.
Not sure if I am explaining myself right, so see below example data:
df <- data.frame(suburb = c("yellow, blue", "orange, yellow", "blue", "green, yellow"), postcode = c("a9, b9", "c9, a9", "b9", "d9, a9"))
What I would ideally like to get to is something like:
suburb postcode
yellow a9
blue b9
orange c9
yellow a9
blue b9
green d9
yellow a9
CodePudding user response:
tidyr::separate_rows(df, suburb, postcode)
# # A tibble: 7 × 2
# suburb postcode
# <chr> <chr>
# 1 yellow a9
# 2 blue b9
# 3 orange c9
# 4 yellow a9
# 5 blue b9
# 6 green d9
# 7 yellow a9
CodePudding user response:
In base R, you can use strsplit and unlist, then convert to a data frame:
cbind.data.frame(
suburb = unlist(strsplit(df$suburb, ", ")),
postcode = unlist(strsplit(df$postcode, ", "))
)
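One caveat: the two columns are split independently here, so this relies on every row holding the same number of items in suburb and postcode. A minimal sketch of how that assumption could be checked before building the result, using the example data from the question (the names parts_suburb, parts_postcode and long are only illustrative, not from the original answer):

```r
df <- data.frame(
  suburb = c("yellow, blue", "orange, yellow", "blue", "green, yellow"),
  postcode = c("a9, b9", "c9, a9", "b9", "d9, a9"),
  stringsAsFactors = FALSE  # strsplit() needs character input, not factors
)

# Split each column into a list with one character vector per original row;
# the pattern also swallows any whitespace after the comma
parts_suburb   <- strsplit(df$suburb, ",\\s*")
parts_postcode <- strsplit(df$postcode, ",\\s*")

# Stop early if any row has a different number of suburbs and postcodes
stopifnot(lengths(parts_suburb) == lengths(parts_postcode))

# Flatten both lists in the same order so the pairs stay aligned
long <- data.frame(
  suburb = unlist(parts_suburb),
  postcode = unlist(parts_postcode),
  stringsAsFactors = FALSE
)
```

If the check fails, the flattened columns would either have different lengths or silently pair the wrong suburb with the wrong postcode, so it seems safer to fail loudly first.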
CodePudding user response:
df <-
data.frame(
suburb = c("yellow, blue", "orange, yellow", "blue", "green, yellow"),
postcode = c("a9, b9", "c9, a9", "b9", "d9, a9")
)
library(data.table)
# split on a comma plus any following whitespace so values have no leading space
setDT(df)[, lapply(.SD, function(x) unlist(strsplit(x, split = ",\\s*")))]
#> suburb postcode
#> 1: yellow a9
#> 2: blue b9
#> 3: orange c9
#> 4: yellow a9
#> 5: blue b9
#> 6: green d9
#> 7: yellow a9
Created on 2022-06-15 by the reprex package (v2.0.1)
CodePudding user response:
You can use separate_rows:
library(tidyr)
library(dplyr)  # mutate() and across() come from dplyr
df %>%
  # split values into separate rows:
  separate_rows(c(suburb, postcode), sep = ",") %>%
  # clean up trailing and leading spaces:
  mutate(across(everything(), trimws))
# A tibble: 7 × 2
suburb postcode
<chr> <chr>
1 yellow a9
2 blue b9
3 orange c9
4 yellow a9
5 blue b9
6 green d9
7 yellow a9
CodePudding user response:
Another possible solution:
library(tidyverse)
map_dfr(df, ~ str_split(.x, "\\s*,\\s*") %>% unlist)
#> # A tibble: 7 × 2
#> suburb postcode
#> <chr> <chr>
#> 1 yellow a9
#> 2 blue b9
#> 3 orange c9
#> 4 yellow a9
#> 5 blue b9
#> 6 green d9
#> 7 yellow a9