Home > Net >  Split column data to create new rows
Split column data to create new rows

Time:04-21

I have a large dataset of equivalent structure to the below.

input.data <- data.frame(Continent = rep("Asia", 5), 
                   Country = rep("India", 5), 
                   Country.lat = rep(20.5937, 5),
                   Country.long = rep(78.9629, 5),
                   State = rep("Punjab", 5),
                   Region = rep("Ludhiana", 5),
                   Animal.group = rep("Mammal", 5),
                   Species.guild = c("Carnivore", "Omnivore\nOmnivore", "Omnivore\nOmnivore\nCarnivore", "Omnivore\nOmnivore\nCarnivore", "Herbivore\nOmnivore\nCarnivore\nCarnivore"),
                   Family = c("Canidae", "Muridae\nMuridae", "Muridae\nMuridae\nCanidae", "Muridae\nMuridae\nCanidae", "Leporidae\nMuridae\nCanidae\nCanidae"),
                   Scientific.name = c("Canis familiaris", "Mus musculus\nRattus rattus", "Mus musculus\nRattus rattus\nCanis familiaris", "Mus musculus\nRattus rattus\nCanis familiaris", "Oryctolagus cuniculus\nRattus rattus\nCanis familiaris\nCanis familiaris"),
                   Common.name = c("Dog", "House mouse\nBlack rat", "House mouse\nBlack rat\nDog", "House mouse\nBlack rat\nDog", "European rabbit\nBlack rat\nDog\nDog"),
                   Num.species = c(1, 2, 3, 3, 4),
                   Bait.state = rep("Solid", 5),
                   Bait.deliv.broad = rep("Toxin", 5),
                   Bait.deliv.specific = "1080\n1080\n1080\n1080\n1080")

input.data has 5 observations, each of which represents a separate experiment. However, some (but not all) experiments have focused on more than one species and hence have multiple entries in a single cell. This makes it very difficult to summarise species related information, including input.data$Species.guild, input.data$Family, input.data$Scientific.name and input.data$Common.name. To be able to summarise this data I need to expend input.data so that the new data frame (output.data) includes only a single piece of information per cell but otherwise all other associated information on the same row remains the same.

Note that where multiple pieces of info exist in a single cell they are separated by a carriage return \n

The resulting data frame should look like the below (rows don't need to be in same order necessarily).

Assistance much appreciated

output.data <- data.frame(Continent = rep("Asia", 13), 
                         Country = rep("India", 13), 
                         Country.lat = rep(20.5937, 13),
                         Country.long = rep(78.9629, 13),
                         State = rep("Punjab", 13),
                         Region = rep("Ludhiana", 13),
                         Animal.group = rep("Mammal", 13),
                         Species.guild = c("Carnivore", "Omnivore", "Omnivore", "Omnivore", "Omnivore", "Carnivore", "Omnivore", "Omnivore", "Carnivore", "Herbivore", "Omnivore", "Carnivore", "Carnivore"),
                         Family = c("Canidae", "Muridae", "Muridae", "Muridae", "Muridae", "Canidae", "Muridae", "Muridae", "Canidae", "Leporidae", "Muridae", "Canidae", "Canidae"),
                         Scientific.name = c("Canis familiaris", "Mus musculus", "Rattus rattus", "Mus musculus", "Rattus rattus", "Canis familiaris", "Mus musculus", "Rattus rattus", "Canis familiaris", "Oryctolagus cuniculus", "Rattus rattus", "Canis familiaris", "Canis familiaris"),
                         Common.name = c("Dog", "House mouse", "Black rat", "House mouse", "Black rat", "Dog", "House mouse", "Black rat", "Dog", "European rabbit", "Black rat", "Dog", "Dog"),
                         Num.species = rep(1, 13),
                         Bait.state = rep("Solid", 13),
                         Bait.deliv.broad = rep("Toxin", 13),
                         Bait.deliv.specific = c("1080", "1080", "1080", "1080", "1080", "1080", "1080", "1080", "1080", "1080", "1080", "1080", "1080"))

CodePudding user response:

input.data %>%
 separate_rows(Species.guild, Family, Scientific.name, Common.name, sep = '\n')

# A tibble: 13 x 15
   Continent Country Country.lat Country.long State  Region   Animal.group
   <chr>     <chr>         <dbl>        <dbl> <chr>  <chr>    <chr>       
 1 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
 2 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
 3 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
 4 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
 5 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
 6 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
 7 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
 8 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
 9 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
10 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
11 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
12 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
13 Asia      India          20.6         79.0 Punjab Ludhiana Mammal      
#..with 8 more variables: Species.guild <chr>, Family <chr>,
# Scientific.name <chr>, Common.name <chr>, Num.species <dbl>, Bait.state <chr>,
# Bait.deliv.broad <chr>, Bait.deliv.specific <chr>

CodePudding user response:

Since the last column has additional values that make the separation uneven, you could use splitstackshape::cSplit and zoo::na.locf.

library(splitstackshape)
library(zoo)

  df <- cSplit(input.data, c(8:11, 15), "long", sep = "\n", drop = TRUE)
  na.locf(df[rowSums(is.na(df)) != ncol(df),])

Output

   Continent Country Country.lat Country.long  State   Region Animal.group Species.guild    Family       Scientific.name     Common.name Num.species Bait.state Bait.deliv.broad Bait.deliv.specific
 1:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           1      Solid            Toxin                1080
 2:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           1      Solid            Toxin                1080
 3:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           1      Solid            Toxin                1080
 4:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           1      Solid            Toxin                1080
 5:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           1      Solid            Toxin                1080
 6:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae          Mus musculus     House mouse           2      Solid            Toxin                1080
 7:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae         Rattus rattus       Black rat           2      Solid            Toxin                1080
 8:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae         Rattus rattus       Black rat           2      Solid            Toxin                1080
 9:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae         Rattus rattus       Black rat           2      Solid            Toxin                1080
10:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae         Rattus rattus       Black rat           2      Solid            Toxin                1080
11:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae          Mus musculus     House mouse           3      Solid            Toxin                1080
12:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae         Rattus rattus       Black rat           3      Solid            Toxin                1080
13:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           3      Solid            Toxin                1080
14:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           3      Solid            Toxin                1080
15:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           3      Solid            Toxin                1080
16:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae          Mus musculus     House mouse           3      Solid            Toxin                1080
17:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae         Rattus rattus       Black rat           3      Solid            Toxin                1080
18:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           3      Solid            Toxin                1080
19:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           3      Solid            Toxin                1080
20:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           3      Solid            Toxin                1080
21:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Herbivore Leporidae Oryctolagus cuniculus European rabbit           4      Solid            Toxin                1080
22:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal      Omnivore   Muridae         Rattus rattus       Black rat           4      Solid            Toxin                1080
23:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           4      Solid            Toxin                1080
24:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           4      Solid            Toxin                1080
25:      Asia   India     20.5937      78.9629 Punjab Ludhiana       Mammal     Carnivore   Canidae      Canis familiaris             Dog           4      Solid            Toxin                1080

Alternatively, if you only need the same amount of values, as is represented in other columns, then as mentioned in the comments you can use separate_rows from tidyverse. But we can trim the amount of values in Bait.deliv.specific according to another column like Species.guild, then we can separate all the needed columns at the same time.

library(tidyverse)

input.data %>% 
  mutate(x = str_count(Species.guild, "\n")) %>% 
  rowwise %>% 
  mutate(Bait.deliv.specific = paste0(strsplit(Bait.deliv.specific, "\n")[[1]][1:(x 1)], collapse = "\n")) %>% 
  ungroup %>% 
  separate_rows(., c(8:11, 15), sep = "\n") 

Output

# A tibble: 13 × 15
   Continent Country Country.lat Country.long State  Region   Animal.group Species.guild Family    Scientific.name       Common.name     Num.species Bait.state Bait.deliv.broad Bait.deliv.specific
   <chr>     <chr>         <dbl>        <dbl> <chr>  <chr>    <chr>        <chr>         <chr>     <chr>                 <chr>                 <dbl> <chr>      <chr>            <chr>              
 1 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Carnivore     Canidae   Canis familiaris      Dog                       1 Solid      Toxin            1080               
 2 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Omnivore      Muridae   Mus musculus          House mouse               2 Solid      Toxin            1080               
 3 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Omnivore      Muridae   Rattus rattus         Black rat                 2 Solid      Toxin            1080               
 4 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Omnivore      Muridae   Mus musculus          House mouse               3 Solid      Toxin            1080               
 5 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Omnivore      Muridae   Rattus rattus         Black rat                 3 Solid      Toxin            1080               
 6 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Carnivore     Canidae   Canis familiaris      Dog                       3 Solid      Toxin            1080               
 7 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Omnivore      Muridae   Mus musculus          House mouse               3 Solid      Toxin            1080               
 8 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Omnivore      Muridae   Rattus rattus         Black rat                 3 Solid      Toxin            1080               
 9 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Carnivore     Canidae   Canis familiaris      Dog                       3 Solid      Toxin            1080               
10 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Herbivore     Leporidae Oryctolagus cuniculus European rabbit           4 Solid      Toxin            1080               
11 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Omnivore      Muridae   Rattus rattus         Black rat                 4 Solid      Toxin            1080               
12 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Carnivore     Canidae   Canis familiaris      Dog                       4 Solid      Toxin            1080               
13 Asia      India          20.6         79.0 Punjab Ludhiana Mammal       Carnivore     Canidae   Canis familiaris      Dog                       4 Solid      Toxin            1080      
  •  Tags:  
  • r
  • Related