Home > OS >  Using data.table to create multiple columns with different condition
Using data.table to create multiple columns with different condition

Time:09-07

I would like to create the below data:

           TRT01P numSubjectsFrequentEvents  numSubjectsSeriousEvents
1:    Drug a 25mg                          6                        3
2:    Drug a 75mg                          3                       NA
3:        Placebo                          5                        1

where numSubjectsFrequentEvents are unique subject counts with at least one AEDECOD, and where numSubjectsSeriousEvents are unique subject counts with at least one AEDECOD with AESER = "Y"

the input data are as below:

              USUBJID         TRT01P                              AEDECOD  AESER
  1: xxxxxxxxxxx010001 Drug a 75mg  Diabetes mellitus inadequate control     N
  2: xxxxxxxxxxx010001 Drug a 75mg                 Hypercholesterolaemia     N
  3: xxxxxxxxxxx010001 Drug a 75mg               Injection site reaction     N
  4: xxxxxxxxxxx010001 Drug a 75mg                          Hypertension     N
  5: xxxxxxxxxxx010001 Drug a 75mg                 Rotator cuff syndrome     N
 ---                                                                            
125: xxxxxxxxxxx210003 Drug a 25mg                             Dysphonia     N
126: xxxxxxxxxxx210003 Drug a 25mg                             Skin mass     N
127: xxxxxxxxxxx210004 Drug a 75mg           Injection site inflammation     N
128: xxxxxxxxxxx210004 Drug a 75mg           Injection site inflammation     N
129: xxxxxxxxxxx210004 Drug a 75mg           Injection site inflammation     N

I know I could create two data.table and merge them, but I am looking for a way to make the data in one data.table step.

ae.freq <- adae[, .(numSubjectsFrequentEvents = uniqueN(USUBJID)), by = "TRT01P"]
ae.sae <- adae[AESER == "Y", .(numSubjectsSeriousEvents = uniqueN(USUBJID)), by = "TRT01P"]
freq <- merge(ae.freq, ae.sae, all.x = TRUE)

In real life, we might have a lot of columns with different conditions, so it would be great if we could avoid creating a lot of data.table and merging them.

Thank you for the help

CodePudding user response:

A possible solution

adae[, .(numSubjectsFrequentEvents = uniqueN(USUBJID),
         numSubjectsSeriousEvents = .SD[AESER == "Y",uniqueN(USUBJID)]), by = "TRT01P"]   

CodePudding user response:

library(dplyr)

df <- left_join(

df1 <-raw %>% 
  group_by(TRT01P,AEDECOD) %>% 
  summarize(count=length(AEDECOD)) %>% 
  mutate(numSubjectsFrequentEvents=ifelse(count>0,1,0)) %>% 
  group_by(TRT01P) %>% 
  summarize(numSubjectsFrequentEvents=sum(numSubjectsFrequentEvents))
,
df2 <-raw %>% 
  group_by(TRT01P,AEDECOD,AESER) %>% 
  summarize(count=length(AEDECOD)) %>%
  filter(AESER=="Y") %>% 
  mutate(numSubjectsSeriousEvents=ifelse(count>0,1,0)) %>% 
  group_by(TRT01P) %>% 
  summarize(numSubjectsSeriousEvents=sum(numSubjectsSeriousEvents)) 
,
by=c("TRT01P"))
  • Related