I inherited a data set where the data (individuals' sizes) is listed as comma separated values within cells, e.g.:
Date | Species1 | Species2 |
---|---|---|
073022 | 22,44 | 88,84,84 |
072822 | 35,67,19 | 78,73 |
What I need is to pull those sizes out while preserving the metadata, e.g.:
Date | Species | Size |
---|---|---|
073022 | Species1 | 22 |
073022 | Species1 | 44 |
073022 | Species2 | 88 |
073022 | Species2 | 84 |
and so on...
Any ideas on how to accomplish this?
I haven't tried much to solve this. I know the answer likely includes strsplit, unnest, and or separate but I am having trouble figuring out how to apply them.
CodePudding user response:
An option would use pivot, str_split, separate, and unnest from the tidyverse.
library(dplyr)
df <- tibble::tribble(
~date, ~species1, ~species2,
073022, "22,44", "88,84,84",
072822, "35,67,19", "78,73"
)
df %>%
tidyr::pivot_longer(c(species1, species2)) %>%
mutate(value = stringr::str_split(value, ",")) %>%
tidyr::unnest(value)
#> # A tibble: 10 × 3
#> date name value
#> <dbl> <chr> <chr>
#> 1 73022 species1 22
#> 2 73022 species1 44
#> 3 73022 species2 88
#> 4 73022 species2 84
#> 5 73022 species2 84
#> 6 72822 species1 35
#> 7 72822 species1 67
#> 8 72822 species1 19
#> 9 72822 species2 78
#> 10 72822 species2 73
Created on 2022-10-25 with reprex v2.0.2
CodePudding user response:
Here is a variation based on ejn88 script but using tidyr's separate_rows() function for more compact code.
library(dplyr)
library(tidyr)
df <- tibble::tribble(
~date, ~species1, ~species2,
073022, "22,44", "88,84,84",
072822, "35,67,19", "78,73"
)
df %>%
pivot_longer(c(species1, species2)) %>%
separate_rows(value, sep=",")