Home > other >  In R, how would I tidy CSV data spread among multiple rows and columns into a three column format?
In R, how would I tidy CSV data spread among multiple rows and columns into a three column format?

Time:10-26

I inherited a data set where the data (individuals' sizes) is listed as comma separated values within cells, e.g.:

Date Species1 Species2
073022 22,44 88,84,84
072822 35,67,19 78,73

What I need is to pull those sizes out while preserving the metadata, e.g.:

Date Species Size
073022 Species1 22
073022 Species1 44
073022 Species2 88
073022 Species2 84

and so on...

Any ideas on how to accomplish this?

I haven't tried much to solve this. I know the answer likely includes strsplit, unnest, and or separate but I am having trouble figuring out how to apply them.

CodePudding user response:

An option would use pivot, str_split, separate, and unnest from the tidyverse.

library(dplyr)

df <- tibble::tribble(
  ~date, ~species1, ~species2,
  073022, "22,44", "88,84,84",
  072822, "35,67,19", "78,73"
)

df %>%
  tidyr::pivot_longer(c(species1, species2)) %>%
  mutate(value = stringr::str_split(value, ",")) %>%
  tidyr::unnest(value)
#> # A tibble: 10 × 3
#>     date name     value
#>    <dbl> <chr>    <chr>
#>  1 73022 species1 22   
#>  2 73022 species1 44   
#>  3 73022 species2 88   
#>  4 73022 species2 84   
#>  5 73022 species2 84   
#>  6 72822 species1 35   
#>  7 72822 species1 67   
#>  8 72822 species1 19   
#>  9 72822 species2 78   
#> 10 72822 species2 73

Created on 2022-10-25 with reprex v2.0.2

CodePudding user response:

Here is a variation based on ejn88 script but using tidyr's separate_rows() function for more compact code.

library(dplyr)
library(tidyr)

df <- tibble::tribble(
   ~date, ~species1, ~species2,
   073022, "22,44", "88,84,84",
   072822, "35,67,19", "78,73"
)

df %>%
   pivot_longer(c(species1, species2)) %>%
   separate_rows(value, sep=",")
  • Related