R dataframe Removing duplicates / choosing which duplicate to remove

Time:12-03

I have a dataframe that has duplicates based on their identifying ID, but some of the columns are different. I'd like to keep the rows (or the duplicates) that have the extra bit of info. The structure of the df is as such.

id <- c("3235453", "3235453", "21354315", "21354315", "2121421")
Plan_name <- c("angers", "strasbourg", "Benzema", "angers", "montpellier")
service_line <- c("", "AMRS", "", "Therapy", "")
treatment <- c("", "MH", "", "MH", "")
df <- data.frame(id, Plan_name, treatment, service_line)

As you can see, the id column has duplicates, but I'd like to keep the second duplicate, where there is more info in treatment and service_line.

I have tried using

df[duplicated(df[,c(1,3)]),]

but it doesn't work as an empty df is returned. Any suggestions?

CodePudding user response:

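Maybe you want something like this: first replace all blanks with NA, then arrange by treatment within each id (arrange() puts NA last, so the row with a value comes first), and finally slice() the first row from each group:

library(dplyr)
df %>%
  # turn empty strings into NA in every column except id and Plan_name
  mutate(across(-c(id, Plan_name), ~ ifelse(. == "", NA, .))) %>%
  group_by(id) %>%
  # NA sorts last, so the filled-in row ends up first in each group
  arrange(treatment, .by_group = TRUE) %>%
  slice(1)

-output

  id       Plan_name   treatment service_line
  <chr>    <chr>       <chr>     <chr>       
1 2121421  montpellier NA        NA          
2 21354315 angers      MH        Therapy     
3 3235453  strasbourg  MH        AMRS        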
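
For comparison, here is a rough base R sketch of the same idea. It assumes "more info" means more non-empty values in treatment and service_line; the names filled, ordered and result are only illustrative, and stringsAsFactors = FALSE is added so the columns stay character on older R versions:

```r
# Rebuild the example data from the question
id <- c("3235453", "3235453", "21354315", "21354315", "2121421")
Plan_name <- c("angers", "strasbourg", "Benzema", "angers", "montpellier")
service_line <- c("", "AMRS", "", "Therapy", "")
treatment <- c("", "MH", "", "MH", "")
df <- data.frame(id, Plan_name, treatment, service_line,
                 stringsAsFactors = FALSE)

# Count the non-empty fields in each row (assumed measure of "more info")
filled <- rowSums(df[, c("treatment", "service_line")] != "")

# Sort so the most complete row comes first within each id ...
ordered <- df[order(df$id, -filled), ]

# ... then keep only the first row seen for each id
result <- ordered[!duplicated(ordered$id), ]
result
```

Unlike sorting on a single column, this ranks rows by how many fields are filled, so a row with only one of the two columns populated would still beat a completely blank one.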

CodePudding user response:

Try with

library(dplyr)
df %>%
   filter(if_all(treatment:service_line, ~ .x != ""))

-output

        id  Plan_name treatment service_line
1  3235453 strasbourg        MH         AMRS
2 21354315     angers        MH      Therapy

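If we also need to keep ids that are not duplicated, even when their fields are blank

df %>%
   group_by(id) %>%
   # keep a row if its id occurs only once, or if both fields are filled in
   filter(n() == 1 | if_all(treatment:service_line, ~ .x != "")) %>%
   ungroup()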

-output

# A tibble: 3 × 4
  id       Plan_name   treatment service_line
  <chr>    <chr>       <chr>     <chr>       
1 3235453  strasbourg  "MH"      "AMRS"      
2 21354315 angers      "MH"      "Therapy"   
3 2121421  montpellier ""        ""          
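
As for why the original attempt returned an empty data frame: duplicated() on columns 1 and 3 looks for repeated (id, treatment) pairs, and no such pair repeats in the sample data. The sketch below illustrates this; the variable names dup_pair, dup_id and dup_any are only illustrative:

```r
# Rebuild the example data from the question
id <- c("3235453", "3235453", "21354315", "21354315", "2121421")
Plan_name <- c("angers", "strasbourg", "Benzema", "angers", "montpellier")
service_line <- c("", "AMRS", "", "Therapy", "")
treatment <- c("", "MH", "", "MH", "")
df <- data.frame(id, Plan_name, treatment, service_line,
                 stringsAsFactors = FALSE)

# Every (id, treatment) pair is unique, so nothing is flagged
dup_pair <- duplicated(df[, c("id", "treatment")])

# On id alone, the second occurrence of each repeated id is flagged
dup_id <- duplicated(df$id)

# Flag every row whose id occurs more than once (first and later occurrences)
dup_any <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)
```

Here dup_pair is all FALSE, which is why df[dup_pair, ] has zero rows, while dup_id and dup_any pick out the repeated ids.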