I have some patients who receive different treatments at different times. I want to change the treatment they have received into a binary variable that takes the value of 1 if the patient has received the drug at least once and the value of 0 if they have never received it.
I managed to do this but in a tedious way, which could be difficult with dozens of different types of drugs.
I would like to optimize my code which mainly avoids creating all the binary variables related to the drug one by one.
id<-c(rep("A",7),rep("B",4))
medoc<-c("par","mor","mor","par","sed","sed",
"sed","cur","sed","cur","sed")
mydata<-data.frame(id,medoc)
mydata2<-mydata%>%group_by(id)%>%
mutate(medoc_str=paste(unique(medoc),collapse = " "))%>%
distinct(id,.keep_all = TRUE)
mydata2$par<-NA
mydata2$mor<-NA
mydata2$sed<-NA
mydata2$cur<-NA
mydata2$par<-ifelse(
grepl("par",mydata2$medoc_str)==TRUE,1,0
)
mydata2$mor<-ifelse(
grepl("mor",mydata2$medoc_str)==TRUE,1,0
)
mydata2$sed<-ifelse(
grepl("sed",mydata2$medoc_str)==TRUE,1,0
)
mydata2$cur<-ifelse(
grepl("cur",mydata2$medoc_str)==TRUE,1,0
)
CodePudding user response:
If I understand it, you want to dummify your variables.
We can do it with tidyr::pivot_wider
too, but I really like to use specific libraries to do it very easily. I like the fastDummies
package:
library(fastDummies)
dummy_cols(mydata, select_columns = 'medoc')
id medoc medoc_cur medoc_mor medoc_par medoc_sed
1 1 par 0 0 1 0
2 1 mor 0 1 0 0
3 1 mor 0 1 0 0
4 1 par 0 0 1 0
5 1 sed 0 0 0 1
6 1 sed 0 0 0 1
7 1 sed 0 0 0 1
8 2 cur 1 0 0 0
9 2 sed 0 0 0 1
10 2 cur 1 0 0 0
11 2 sed 0 0 0 1
And here is an answer with pivot_wider
:
library(tidyr)
library(dplyr)
mydata %>% mutate(index = row_number()) %>%
pivot_wider(names_from = medoc,
values_from = medoc,
values_fn = \(x) !is.na(x),
values_fill = 0)
CodePudding user response:
A solution similar to @Guedes's but with different values_fn
:
library(dplyr)
library(tidyr)
mydata %>%
mutate(row = row_number()) %>%
pivot_wider(names_from = medoc, values_from = medoc,
values_fn = function(x) 1, values_fill = 0) %>%
select(-row)
# A tibble: 11 x 5
id par mor sed cur
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 0 0
2 1 0 1 0 0
3 1 0 1 0 0
4 1 1 0 0 0
5 1 0 0 1 0
6 1 0 0 1 0
7 1 0 0 1 0
8 2 0 0 0 1
9 2 0 0 1 0
10 2 0 0 0 1
11 2 0 0 1 0
CodePudding user response:
Using mydata in the Note at the end and assuming that you want one row for each id with binary columns indicating which values of medoc are present (1) or absent (0) we can use table like this. (If you would like counts instead of presence/absence then omit the pmin.)
pmin(table(mydata), 1)
## medoc
## id cur mor par sed
## A 0 1 1 1
## B 1 0 0 1
or as a data frame and adding medoc_str
library(dplyr)
library(tibble)
mydata %>%
table %>%
pmin(1) %>%
as.data.frame.matrix %>%
rownames_to_column(var = "id") %>%
group_by(id) %>%
mutate(medoc_str = paste(names(cur_data())[c_across() == 1], collapse = " ")) %>%
ungroup
## # A tibble: 2 x 6
## id cur mor par sed medoc_str
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 A 0 1 1 1 mor par sed
## 2 B 1 0 0 1 cur sed
Note
Modified mydata to use better id's.
id <- c(rep("A", 7), rep("B", 4))
medoc <- c("par", "mor", "mor", "par", "sed", "sed",
"sed", "cur", "sed", "cur", "sed")
mydata <- data.frame(id, medoc)
mydata looks like this.
> mydata
id medoc
1 A par
2 A mor
3 A mor
4 A par
5 A sed
6 A sed
7 A sed
8 B cur
9 B sed
10 B cur
11 B sed