Using the separate function within a for loop in R-CodePudding

I have a dataframe of patient IDs and their various medications. Their medications are each listed in the format "name,code,class,dose" in one string separated by commas. Code for a sample dataframe below:

id <- c(1,2,3)
med1 <- c("Penicillin,1,Antibiotic,5mg","Fluoxetine,5,Antidepressant,20mg","Olanzapine,9,Antipsychotic,2mg")
med2 <- c("Aspirin,4,Blood thinner,81mg",NA,"Lisinopril,3,Antihypertensive,10mg")
med3 <- c(NA,NA,"Amlodipine,2,Antihypertensive,5mg")
meds <- data.frame(id, med1, med2, med3)
meds

However, I want to separate these out by each category, so that it would look like:

meds2 <- meds %>% separate(med1, c('med_name1', 'med_code1', "med_class1", "med_dose1"), sep=",") %>% separate(med2, c('med_name2', 'med_code2', "med_class2", "med_dose2"), sep=",") %>% separate(med3, c('med_name3', 'med_code3', "med_class3", "med_dose3"), sep=",")
meds2

I am able to do this with the code I included, but I have to manually write out the separate function for every single medication. In my actual dataset, some patients have over 100 medications, so I would not be able to do this. I tried to write a for loop using the separate function but cannot get it to work:

meds <- for(i in 1:3){
  separate(meds2[i 1], c(paste0("med_name", i), paste0("med_code", i), paste0("med_class", i), paste0("med_dose", i), sep=","))
}

Does anyone know where I might be going wrong or know of another approach to achieve this? Thanks so much!

CodePudding user response：

An easier option would be to split the columns by , and then use unnest_wider

library(dplyr)
library(tidyr)
nc <- ncol(meds)-1
nm1 <- paste0("med_", rep(c("name", "code", "class", "dose"),
      nc), rep(seq_len(nc), each = 4))

meds %>% 
  mutate(across(starts_with('med'),
       ~strsplit(as.character(.x), ",\\s*"))) %>%
  unnest_wider(where(is.list), names_sep = "") %>%
  setNames(c("id", nm1)) %>%
  type.convert(as.is = TRUE)

-output

# A tibble: 3 × 13
     id med_name1  med_code1 med_class1     med_dose1 med_name2  med_code2 med_class2       med_dose2 med_name3  med_code3 med_class3       med_dose3
  <int> <chr>          <int> <chr>          <chr>     <chr>          <int> <chr>            <chr>     <chr>          <int> <chr>            <chr>    
1     1 Penicillin         1 Antibiotic     5mg       Aspirin            4 Blood thinner    81mg      <NA>              NA <NA>             <NA>     
2     2 Fluoxetine         5 Antidepressant 20mg      <NA>              NA <NA>             <NA>      <NA>              NA <NA>             <NA>     
3     3 Olanzapine         9 Antipsychotic  2mg       Lisinopril         3 Antihypertensive 10mg      Amlodipine         2 Antihypertensive 5mg

Or using read.csv within mutate

meds %>%
  mutate(across(starts_with('med'),
  ~ read.csv(text = .x, header = FALSE, 
   col.names = c("name", "code", "class", "dose")))) %>% 
  unnest_wider(where(is.data.frame), names_sep = "_")

-output

# A tibble: 3 × 13
     id med1_name  med1_code med1_class     med1_dose med2_name  med2_code med2_class         med2_dose med3_name  med3_code med3_class         med3_dose
  <dbl> <chr>          <int> <chr>          <chr>     <chr>          <int> <chr>              <chr>     <chr>          <int> <chr>              <chr>    
1     1 Penicillin         1 Antibiotic     5mg       Aspirin            4 "Blood thinner"    "81mg"    <NA>              NA ""                 ""       
2     2 Fluoxetine         5 Antidepressant 20mg      <NA>              NA ""                 ""        <NA>              NA ""                 ""       
3     3 Olanzapine         9 Antipsychotic  2mg       Lisinopril         3 "Antihypertensive" "10mg"    Amlodipine         2 "Antihypertensive" "5mg"

Or we could do this easily in base R with read.csv

lst1 <- lapply(meds[-1], function(x) 
   read.csv(text = x, header = FALSE, col.names =
    c("name", "code", "class", "dose"), fill = TRUE))
 cbind(meds[1], do.call(cbind, unname(Map(function(x, y)
   setNames(x, paste0(y, "_", names(x))), lst1, names(lst1)))))

-output

 id  med1_name med1_code     med1_class med1_dose  med2_name med2_code       med2_class med2_dose  med3_name med3_code       med3_class med3_dose
1  1 Penicillin         1     Antibiotic       5mg    Aspirin         4    Blood thinner      81mg       <NA>        NA                           
2  2 Fluoxetine         5 Antidepressant      20mg       <NA>        NA                                  <NA>        NA                           
3  3 Olanzapine         9  Antipsychotic       2mg Lisinopril         3 Antihypertensive      10mg Amlodipine         2 Antihypertensive       5mg

CodePudding user response：

We could pivot long, separate into columns, and pivot wide again incorporating those:

meds %>%
  pivot_longer(-id) %>%
  separate(value, c("med_name", "med_code", "med_class", "med_dose"), sep = ",") %>%
  mutate(name = readr::parse_number(name)) %>% # "med1" -> "1"
  pivot_wider(names_from = name, names_sep = "", names_vary = "slowest",
              values_from = starts_with("med"))

Result

# A tibble: 3 × 13
     id med_name1  med_code1 med_class1     med_dose1 med_name2  med_code2 med_class2       med_dose2 med_name3  med_code3 med_class3       med_dose3
  <dbl> <chr>      <chr>     <chr>          <chr>     <chr>      <chr>     <chr>            <chr>     <chr>      <chr>     <chr>            <chr>    
1     1 Penicillin 1         Antibiotic     5mg       Aspirin    4         Blood thinner    81mg      NA         NA        NA               NA       
2     2 Fluoxetine 5         Antidepressant 20mg      NA         NA        NA               NA        NA         NA        NA               NA       
3     3 Olanzapine 9         Antipsychotic  2mg       Lisinopril 3         Antihypertensive 10mg      Amlodipine 2         Antihypertensive 5mg