Home > front end >  Wrangling Patient-Drug Administration Data By Outcome
Wrangling Patient-Drug Administration Data By Outcome

Time:10-10

I am hoping to get some help with this problem. I have a theoretical data frame including a list of drugs, administration dates, patients, and an outcome that is logical (headache_cured: true or false). I am hoping to modify in order to generate a new column that counts the number of unique patients who are taking a specific drug (which should remain unique rows, e.g., one drug per row) who are either cured or not of their headache.

For example,

patient_id <- c("a","a","b","c","d","d","d")
drug <- c("tylenol", "tylenol", "tylenol", "advil", "aspirin", "aspirin", "motrin")
date <- c("1/1/2000", "1/2/2000", "1/3/2000", "1/4/2000", "1/5/2000", "1/6/2000", "1/7/2000")
headache_cured <- c(T,T,T,T,F,F,F)

df <- data.frame(patient_id,date,drug,headache_cured)

Output:

patient_id date drug headache_cured
a 1/1/2000 tylenol TRUE
a 1/2/2000 tylenol TRUE
b 1/3/2000 tylenol TRUE
c 1/4/2000 advil TRUE
d 1/5/2000 aspirin FALSE
d 1/6/2000 aspirin FALSE
d 1/7/2000 motrin FALSE

I would like to create a new data frame built like this:

drug n_patients_cured n_patients_not_cured
tylenol 2 0
aspirin 0 2
motrin 0 1
advil 1 0

I am sure this is trivial for most here, but I would really appreciate any help with this problem.

I have tried using code such as this below to solve the problem but cannot seem to figure this one out and would greatly appreciate a helping hand to a novice R user. Thank you!

df %>%
mutate(n_patients_cured = ifelse(headache_cured == T, n_distinct(patient_id),0) %>%
mutate(n_patients_not_cured = ifelse(headache_cured == F, n_distinct(patient_id),0)

CodePudding user response:

Here is a tidyverse solution.
Count by the rows by drug and headache cured then reshape the data to wide format. The initial mutate is meant to make the final column names agree with the expected output.

patient_id <- c("a","a","b","c","d","d","d")
drug <- c("tylenol", "tylenol", "tylenol", "advil", "aspirin", "aspirin", "motrin")
date <- c("1/1/2000", "1/2/2000", "1/3/2000", "1/4/2000", "1/5/2000", "1/6/2000", "1/7/2000")
headache_cured <- c(T,T,T,T,F,F,F)

df <- data.frame(patient_id,date,drug,headache_cured)

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

df %>%
  mutate(headache_cured = ifelse(headache_cured, "cured", "not_cured")) %>%
  count(drug, headache_cured) %>%
  pivot_wider(
    id_cols = drug,
    names_from = headache_cured,
    names_glue = "n_patients_{headache_cured}",
    values_from = n,
    values_fill = 0L
  )
#> # A tibble: 4 × 3
#>   drug    n_patients_cured n_patients_not_cured
#>   <chr>              <int>                <int>
#> 1 advil                  1                    0
#> 2 aspirin                0                    2
#> 3 motrin                 0                    1
#> 4 tylenol                3                    0

Created on 2022-10-09 with reprex v2.0.2

  •  Tags:  
  • r
  • Related