Home > Enterprise >  Splitting string in cell and duplicate Row
Splitting string in cell and duplicate Row

Time:06-13

I'm trying to split out the contents of a cell string, and then duplicate all of the corresponding cells in the row into a duplicate row. In looking at other similar questions I'm finding a partial answer, but my results are dropping some of the data, that I still want to keep. Sample table (although the real one has many more columns and rows)

Client <- LETTERS[seq(1:6)]
Package <- c("1", "1 & 2", "2", "1 & 2","1 & 2", "")
Scope <- c("full", "rapid", "partial", "full", "full", "")
Assignee <- randomNames(6)
df <- data.frame(Client, Package, Scope, Assignee)


 Client Package   Scope           Assignee
1      A       1    full     Young, Devante
2      B   1 & 2  rapid     Catholic, Kori
3      C       2 partial      Maxwell, Luke
4      D   1 & 2    full  Vanzant, Giovanni
5      E   1 & 2    full    el-Sadiq, Aaida
6      F                 el-Yousef, Muneera

In this table I need to split out the data under "Package" and duplicate the info into a row. If I do the following:

df %>%
  mutate(Package = strsplit(as.character(Package), " & ")) %>%
  unnest(Package) 

I get the Package content split out and the rows duplicated. As shown below in my results, the problem is that the data from "Client F" is omitted from the results. Like the example table, there will be rows in my real data that has not "package data" and I don't want to lose that information.

Client Package Scope   Assignee         
  <chr>  <chr>   <chr>   <chr>            
1 A      1       full    Young, Devante   
2 B      1       rapid  Catholic, Kori   
3 B      2       rapid  Catholic, Kori   
4 C      2       partial Maxwell, Luke    
5 D      1       full    Vanzant, Giovanni
6 D      2       full    Vanzant, Giovanni
7 E      1       full    el-Sadiq, Aaida  
8 E      2       full    el-Sadiq, Aaida  

Any suggestions would be much appreciated.

CodePudding user response:

Does this approach work for you using tidyr's separate_rows:

library(tidyverse)

df <- tribble(
  ~Client, ~Package, ~Scope, ~Assignee,
  "A", "1", "full", "Young, Devante",
  "B", "1 & 2", "rapid", "Catholic, Kori",
  "C", "2", "partial", "Maxwell, Luke",
  "D", "1 & 2", "full", "Vanzant, Giovanni",
  "E", "1 & 2", "full", "el-Sadiq, Aaida",
  "F", NA, NA, "el-Yousef, Muneera"
)

df |> separate_rows(Package)

#> # A tibble: 9 × 4
#>   Client Package Scope   Assignee          
#>   <chr>  <chr>   <chr>   <chr>             
#> 1 A      1       full    Young, Devante    
#> 2 B      1       rapid   Catholic, Kori    
#> 3 B      2       rapid   Catholic, Kori    
#> 4 C      2       partial Maxwell, Luke     
#> 5 D      1       full    Vanzant, Giovanni 
#> 6 D      2       full    Vanzant, Giovanni 
#> 7 E      1       full    el-Sadiq, Aaida   
#> 8 E      2       full    el-Sadiq, Aaida   
#> 9 F      <NA>    <NA>    el-Yousef, Muneera

Created on 2022-06-13 by the reprex package (v2.0.1)

  • Related