How can I identify and extract duplicates from a data frame?

My objective is to check whether a patient is using two drugs on the same date. In the example below, patient 1 is using drug A and drug B on the same date, and I want to extract such cases with code.

df <- data.frame(id = c(1,1,1,2,2,2),    
                 date = c("2020-02-01","2020-02-01","2020-03-02","2019-10-02","2019-10-18","2019-10-26"),    
                 drug_type = c("A","B","A","A","A","B"))      
df$date <- as.factor(df$date)
df$drug_type <- as.factor(df$drug_type)

To do this, I first converted date and drug type to factor variables. Then I used the following code:

df %>%  
  mutate(lev_actdate = as.factor(actdate))%>%        
  filter(nlevels(drug_type)>1 & nlevels(date) < nrow(date))

But it failed. I assumed that if a patient is using two drugs on the same date, the number of levels in the date column would be less than the number of rows. However, I don't know how to express this in code.

Additionally, something else puzzles me:

If I use nlevels(df$date), the correct result is returned, but when I use df %>% nlevels(date), an error is returned:

"Error in nlevels(., df$date) : unused argument (df$date)"

Could you please tell me why this occurs and how I can fix it? Thank you for your time.

CodePudding user response：

Do you need something like this?

library(dplyr)

df %>% 
  group_by(date) %>% 
  distinct() %>% 
  summarise(drug_type_sum = toString(drug_type))

  date       drug_type_sum
  <fct>      <chr>        
1 2019-10-02 A            
2 2019-10-18 A            
3 2019-10-26 B            
4 2020-02-01 A, B         
5 2020-03-02 A
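
Note that the summary above groups by date only, so two different patients who happen to share a date would be merged into one row. A minimal sketch of a per-patient variant, assuming the same sample data as in the question (the names per_patient and n_drugs are illustrative, not from the original answer):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  date = c("2020-02-01", "2020-02-01", "2020-03-02",
           "2019-10-02", "2019-10-18", "2019-10-26"),
  drug_type = c("A", "B", "A", "A", "A", "B")
)

# Drop exact repeats, then list the drugs per patient and date
per_patient <- df %>%
  distinct(id, date, drug_type) %>%
  group_by(id, date) %>%
  summarise(
    drug_type_sum = toString(as.character(drug_type)),
    n_drugs = n(),  # number of distinct drugs on that date
    .groups = "drop"
  )

# Keep only the patient/date pairs with more than one drug
per_patient %>% filter(n_drugs > 1)
```

Grouping by both id and date keeps each patient's records separate, which seems closer to what the question asks for.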

CodePudding user response：

You could use something like

library(dplyr) 

df %>%
  group_by(id, date) %>%
  filter(n_distinct(drug_type) >= 2)
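
As a rough illustration of how this filter behaves on the sample data from the question, the sketch below keeps the matching rows and then collapses them to one row per patient and date. The names same_day, same_day_summary and drugs are hypothetical and only used for this example:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  date = c("2020-02-01", "2020-02-01", "2020-03-02",
           "2019-10-02", "2019-10-18", "2019-10-26"),
  drug_type = c("A", "B", "A", "A", "A", "B")
)

# Rows where one patient has two or more distinct drugs on one date
same_day <- df %>%
  group_by(id, date) %>%
  filter(n_distinct(drug_type) >= 2) %>%
  ungroup()

# One row per (id, date) pair, with the drugs collapsed into a string
same_day_summary <- same_day %>%
  group_by(id, date) %>%
  summarise(
    drugs = toString(sort(unique(as.character(drug_type)))),
    .groups = "drop"
  )
```

With this data, only the two rows for patient 1 on 2020-02-01 should remain, since that is the only patient/date pair with both drug A and drug B.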

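df %>% nlevels(date) is the same as nlevels(df, date), which is not the same as nlevels(df$date). Note that df %>% nlevels(.$date) does not help either: when the dot appears only inside a nested call, the pipe still inserts df as the first argument, so the call becomes nlevels(df, df$date) and fails with the same "unused argument" error shown in the question. Instead, wrap the call in braces, df %>% {nlevels(.$date)}, or pipe the column itself, df$date %>% nlevels().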
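
The difference between these forms can be checked with a small sketch. It assumes the magrittr pipe (the one dplyr re-exports) and a reduced version of the question's data; the variable names are illustrative only:

```r
library(magrittr)

# Reduced sample: three rows, two distinct dates
df <- data.frame(date = factor(c("2020-02-01", "2020-02-01", "2020-03-02")))

# Direct call on the column
n_direct <- nlevels(df$date)

# Braces stop the pipe from inserting df as the first argument
n_braces <- df %>% {nlevels(.$date)}

# Piping the column itself also works
n_column <- df$date %>% nlevels()

# Without braces, df is still passed first, so nlevels() gets two arguments
res <- tryCatch(
  df %>% nlevels(.$date),
  error = function(e) "error"
)
```

The first three forms should agree with each other, while the unbraced form ends up in the error handler.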