Home > Enterprise >  dplyr::mutate -- split up string into multiple rows
dplyr::mutate -- split up string into multiple rows

Time:07-01

I have a data frame like this:

df <- data.frame(quadrant = rep(2:3, each = 3, times = 1),
                 ICD = c("S06|D11|A41|O34|O48",
                         "R55|A08|K40",
                         "N23|F13|K80|F19",
                         "R13|C18",
                         "E13|F19",
                         "D11|A41|N23|K80"))

I want to split up the string variable, generate a new row for each ICD while repeating the quadrant identifier.

Does somebody know a way how to handle this via dplyr::mutate?

CodePudding user response:

You can use the following code:

df <- data.frame(quadrant = rep(2:3, each = 3, times = 1),
                 ICD = c("S06|D11|A41|O34|O48",
                         "R55|A08|K40",
                         "N23|F13|K80|F19",
                         "R13|C18",
                         "E13|F19",
                         "D11|A41|N23|K80"))

library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.1.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.1.2
df %>% 
  dplyr::mutate(ICD = strsplit(as.character(ICD), "|", fixed = TRUE)) %>%
  unnest(ICD)
#> # A tibble: 20 × 2
#>    quadrant ICD  
#>       <int> <chr>
#>  1        2 S06  
#>  2        2 D11  
#>  3        2 A41  
#>  4        2 O34  
#>  5        2 O48  
#>  6        2 R55  
#>  7        2 A08  
#>  8        2 K40  
#>  9        2 N23  
#> 10        2 F13  
#> 11        2 K80  
#> 12        2 F19  
#> 13        3 R13  
#> 14        3 C18  
#> 15        3 E13  
#> 16        3 F19  
#> 17        3 D11  
#> 18        3 A41  
#> 19        3 N23  
#> 20        3 K80

Created on 2022-06-30 by the reprex package (v2.0.1)

CodePudding user response:

I would use separate_rows from tidyr:

library(tidyr)

df |>
  separate_rows(ICD, sep = "\\|")
  • Related