I am hoping to get some help with this problem. I have a theoretical data frame including a list of drugs, administration dates, patients, and an outcome that is logical (headache_cured: true or false). I am hoping to modify in order to generate a new column that counts the number of unique patients who are taking a specific drug (which should remain unique rows, e.g., one drug per row) who are either cured or not of their headache.
For example,
patient_id <- c("a","a","b","c","d","d","d")
drug <- c("tylenol", "tylenol", "tylenol", "advil", "aspirin", "aspirin", "motrin")
date <- c("1/1/2000", "1/2/2000", "1/3/2000", "1/4/2000", "1/5/2000", "1/6/2000", "1/7/2000")
headache_cured <- c(T,T,T,T,F,F,F)
df <- data.frame(patient_id,date,drug,headache_cured)
Output:
patient_id | date | drug | headache_cured |
---|---|---|---|
a | 1/1/2000 | tylenol | TRUE |
a | 1/2/2000 | tylenol | TRUE |
b | 1/3/2000 | tylenol | TRUE |
c | 1/4/2000 | advil | TRUE |
d | 1/5/2000 | aspirin | FALSE |
d | 1/6/2000 | aspirin | FALSE |
d | 1/7/2000 | motrin | FALSE |
I would like to create a new data frame built like this:
drug | n_patients_cured | n_patients_not_cured |
---|---|---|
tylenol | 2 | 0 |
aspirin | 0 | 2 |
motrin | 0 | 1 |
advil | 1 | 0 |
I am sure this is trivial for most here, but I would really appreciate any help with this problem.
I have tried using code such as this below to solve the problem but cannot seem to figure this one out and would greatly appreciate a helping hand to a novice R user. Thank you!
df %>%
mutate(n_patients_cured = ifelse(headache_cured == T, n_distinct(patient_id),0) %>%
mutate(n_patients_not_cured = ifelse(headache_cured == F, n_distinct(patient_id),0)
CodePudding user response:
Here is a tidyverse
solution.
Count by the rows by drug and headache cured then reshape the data to wide format. The initial mutate
is meant to make the final column names agree with the expected output.
patient_id <- c("a","a","b","c","d","d","d")
drug <- c("tylenol", "tylenol", "tylenol", "advil", "aspirin", "aspirin", "motrin")
date <- c("1/1/2000", "1/2/2000", "1/3/2000", "1/4/2000", "1/5/2000", "1/6/2000", "1/7/2000")
headache_cured <- c(T,T,T,T,F,F,F)
df <- data.frame(patient_id,date,drug,headache_cured)
suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
})
df %>%
mutate(headache_cured = ifelse(headache_cured, "cured", "not_cured")) %>%
count(drug, headache_cured) %>%
pivot_wider(
id_cols = drug,
names_from = headache_cured,
names_glue = "n_patients_{headache_cured}",
values_from = n,
values_fill = 0L
)
#> # A tibble: 4 × 3
#> drug n_patients_cured n_patients_not_cured
#> <chr> <int> <int>
#> 1 advil 1 0
#> 2 aspirin 0 2
#> 3 motrin 0 1
#> 4 tylenol 3 0
Created on 2022-10-09 with reprex v2.0.2