I have a dataframe of patient IDs and their various medications. Their medications are each listed in the format "name,code,class,dose" in one string separated by commas. Code for a sample dataframe below:
id <- c(1,2,3)
med1 <- c("Penicillin,1,Antibiotic,5mg","Fluoxetine,5,Antidepressant,20mg","Olanzapine,9,Antipsychotic,2mg")
med2 <- c("Aspirin,4,Blood thinner,81mg",NA,"Lisinopril,3,Antihypertensive,10mg")
med3 <- c(NA,NA,"Amlodipine,2,Antihypertensive,5mg")
meds <- data.frame(id, med1, med2, med3)
meds
However, I want to separate these out by each category, so that it would look like:
meds2 <- meds %>% separate(med1, c('med_name1', 'med_code1', "med_class1", "med_dose1"), sep=",") %>% separate(med2, c('med_name2', 'med_code2', "med_class2", "med_dose2"), sep=",") %>% separate(med3, c('med_name3', 'med_code3', "med_class3", "med_dose3"), sep=",")
meds2
I am able to do this with the code I included, but I have to manually write out the separate function for every single medication. In my actual dataset, some patients have over 100 medications, so I would not be able to do this. I tried to write a for loop using the separate function but cannot get it to work:
meds <- for(i in 1:3){
separate(meds2[i 1], c(paste0("med_name", i), paste0("med_code", i), paste0("med_class", i), paste0("med_dose", i), sep=","))
}
Does anyone know where I might be going wrong or know of another approach to achieve this? Thanks so much!
CodePudding user response:
An easier option would be to split the columns by ,
and then use unnest_wider
library(dplyr)
library(tidyr)
nc <- ncol(meds)-1
nm1 <- paste0("med_", rep(c("name", "code", "class", "dose"),
nc), rep(seq_len(nc), each = 4))
meds %>%
mutate(across(starts_with('med'),
~strsplit(as.character(.x), ",\\s*"))) %>%
unnest_wider(where(is.list), names_sep = "") %>%
setNames(c("id", nm1)) %>%
type.convert(as.is = TRUE)
-output
# A tibble: 3 × 13
id med_name1 med_code1 med_class1 med_dose1 med_name2 med_code2 med_class2 med_dose2 med_name3 med_code3 med_class3 med_dose3
<int> <chr> <int> <chr> <chr> <chr> <int> <chr> <chr> <chr> <int> <chr> <chr>
1 1 Penicillin 1 Antibiotic 5mg Aspirin 4 Blood thinner 81mg <NA> NA <NA> <NA>
2 2 Fluoxetine 5 Antidepressant 20mg <NA> NA <NA> <NA> <NA> NA <NA> <NA>
3 3 Olanzapine 9 Antipsychotic 2mg Lisinopril 3 Antihypertensive 10mg Amlodipine 2 Antihypertensive 5mg
Or using read.csv
within mutate
meds %>%
mutate(across(starts_with('med'),
~ read.csv(text = .x, header = FALSE,
col.names = c("name", "code", "class", "dose")))) %>%
unnest_wider(where(is.data.frame), names_sep = "_")
-output
# A tibble: 3 × 13
id med1_name med1_code med1_class med1_dose med2_name med2_code med2_class med2_dose med3_name med3_code med3_class med3_dose
<dbl> <chr> <int> <chr> <chr> <chr> <int> <chr> <chr> <chr> <int> <chr> <chr>
1 1 Penicillin 1 Antibiotic 5mg Aspirin 4 "Blood thinner" "81mg" <NA> NA "" ""
2 2 Fluoxetine 5 Antidepressant 20mg <NA> NA "" "" <NA> NA "" ""
3 3 Olanzapine 9 Antipsychotic 2mg Lisinopril 3 "Antihypertensive" "10mg" Amlodipine 2 "Antihypertensive" "5mg"
Or we could do this easily in base R
with read.csv
lst1 <- lapply(meds[-1], function(x)
read.csv(text = x, header = FALSE, col.names =
c("name", "code", "class", "dose"), fill = TRUE))
cbind(meds[1], do.call(cbind, unname(Map(function(x, y)
setNames(x, paste0(y, "_", names(x))), lst1, names(lst1)))))
-output
id med1_name med1_code med1_class med1_dose med2_name med2_code med2_class med2_dose med3_name med3_code med3_class med3_dose
1 1 Penicillin 1 Antibiotic 5mg Aspirin 4 Blood thinner 81mg <NA> NA
2 2 Fluoxetine 5 Antidepressant 20mg <NA> NA <NA> NA
3 3 Olanzapine 9 Antipsychotic 2mg Lisinopril 3 Antihypertensive 10mg Amlodipine 2 Antihypertensive 5mg
CodePudding user response:
We could pivot long, separate into columns, and pivot wide again incorporating those:
meds %>%
pivot_longer(-id) %>%
separate(value, c("med_name", "med_code", "med_class", "med_dose"), sep = ",") %>%
mutate(name = readr::parse_number(name)) %>% # "med1" -> "1"
pivot_wider(names_from = name, names_sep = "", names_vary = "slowest",
values_from = starts_with("med"))
Result
# A tibble: 3 × 13
id med_name1 med_code1 med_class1 med_dose1 med_name2 med_code2 med_class2 med_dose2 med_name3 med_code3 med_class3 med_dose3
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 Penicillin 1 Antibiotic 5mg Aspirin 4 Blood thinner 81mg NA NA NA NA
2 2 Fluoxetine 5 Antidepressant 20mg NA NA NA NA NA NA NA NA
3 3 Olanzapine 9 Antipsychotic 2mg Lisinopril 3 Antihypertensive 10mg Amlodipine 2 Antihypertensive 5mg