Remove duplicates in a data frame except for a list of values (R)

Time:05-11

I would like to remove rows with duplicated values in one column of a data frame, but keep some elements that I previously stored in a list.

my_df <- data.frame(Municipality=c('a', 'b', 'c', 'd', 'a', 'e', 'd','f','g','b','a'),
                    state=c('ac', 'pe', 'pi', 'pi', 'ac', 'am', 'pi','sp','sp','pi','ac'),
                    date=c('2006', '2007', '2007', '2008', '2009', '2010', '2010','2011','2012','2013','2013'))
desired_df <- data.frame(Municipality=c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'b', 'a'),
                         state=c('ac', 'pe', 'pi', 'pi', 'am', 'sp', 'sp', 'pi', 'ac'),
                         date=c('2006', '2007', '2007', '2008', '2010', '2011', '2012', '2013', '2013'))

I tried to create a list of the municipalities to keep (b and a in 2013) and then remove the duplicates except those in that list. Something like this:

municipality_twice_keep <- as.data.frame(c("a", "b"))
desired_df2 <- my_df[!(my_df$Municipality %in% municipality_twice_keep & my_df$date == '2013'), ]

However, nothing changes. Could you please advise?
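A likely explanation, checked against the data above: municipality_twice_keep is a one-column data frame, not a character vector, and %in% does not look up the individual values inside that column, so the membership test is FALSE for every row and the mask keeps all 11 rows. Matching against the column itself fixes the lookup, but the negated mask then removes the two 2013 rows (the ones to keep) instead of the earlier duplicates. The sketch below reproduces both points; stringsAsFactors = FALSE is an addition so the columns are character vectors on R versions before 4.0 too.

```r
# Data from the question (stringsAsFactors = FALSE added for older R versions).
my_df <- data.frame(Municipality = c('a', 'b', 'c', 'd', 'a', 'e', 'd', 'f', 'g', 'b', 'a'),
                    state = c('ac', 'pe', 'pi', 'pi', 'ac', 'am', 'pi', 'sp', 'sp', 'pi', 'ac'),
                    date = c('2006', '2007', '2007', '2008', '2009', '2010', '2010',
                             '2011', '2012', '2013', '2013'),
                    stringsAsFactors = FALSE)

municipality_twice_keep <- as.data.frame(c("a", "b"))

# Testing against the data frame: no single name matches, so every element is
# FALSE and !(FALSE & ...) keeps all rows, i.e. "nothing changes".
in_df <- my_df$Municipality %in% municipality_twice_keep
print(any(in_df))

# Testing against the column (a plain character vector) matches as intended...
in_vec <- my_df$Municipality %in% municipality_twice_keep[[1]]
print(which(in_vec & my_df$date == "2013"))

# ...but negating that mask drops rows 10 and 11 (the 2013 rows that should
# stay), while the earlier duplicates in rows 5 and 7 survive.
attempt <- my_df[!(in_vec & my_df$date == "2013"), ]
print(rownames(attempt))
```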

Answer:

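You can drop a row when its Municipality has already appeared in an earlier row and its date is not 2013. duplicated() flags every repeat after the first occurrence, so combining it with the date condition keeps the 2013 rows:

my_df[!(duplicated(my_df$Municipality) & my_df$date != "2013"), ]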

   Municipality state date
1             a    ac 2006
2             b    pe 2007
3             c    pi 2007
4             d    pi 2008
6             e    am 2010
8             f    sp 2011
9             g    sp 2012
10            b    pi 2013
11            a    ac 2013
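If the set of rows to protect changes (another year, or only some municipalities), the same filter can be wrapped in a small function. This is a sketch, not part of base R or any package: drop_duplicates_except() is a hypothetical helper that drops every row whose key value already appeared in an earlier row, unless the row is flagged TRUE in a logical keep vector, and the script then compares the result with desired_df from the question. It assumes character columns (the default since R 4.0; stringsAsFactors = FALSE is passed explicitly for older versions).

```r
# Hypothetical helper (not from base R or a package): drop rows whose value in
# column `key` already appeared in an earlier row, unless `keep` is TRUE there.
drop_duplicates_except <- function(df, key, keep) {
  stopifnot(is.logical(keep), length(keep) == nrow(df))
  df[!(duplicated(df[[key]]) & !keep), , drop = FALSE]
}

my_df <- data.frame(Municipality = c('a', 'b', 'c', 'd', 'a', 'e', 'd', 'f', 'g', 'b', 'a'),
                    state = c('ac', 'pe', 'pi', 'pi', 'ac', 'am', 'pi', 'sp', 'sp', 'pi', 'ac'),
                    date = c('2006', '2007', '2007', '2008', '2009', '2010', '2010',
                             '2011', '2012', '2013', '2013'),
                    stringsAsFactors = FALSE)
desired_df <- data.frame(Municipality = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'b', 'a'),
                         state = c('ac', 'pe', 'pi', 'pi', 'am', 'sp', 'sp', 'pi', 'ac'),
                         date = c('2006', '2007', '2007', '2008', '2010', '2011', '2012',
                                  '2013', '2013'),
                         stringsAsFactors = FALSE)

# Protect every 2013 row. A municipality list can be combined the same way,
# e.g. my_df$date == "2013" & my_df$Municipality %in% c("a", "b").
result <- drop_duplicates_except(my_df, "Municipality", my_df$date == "2013")

# Subsetting keeps the original row names (1, 2, 3, 4, 6, 8, 9, 10, 11);
# reset them so the comparison with desired_df only looks at the data.
rownames(result) <- NULL
print(identical(result, desired_df))
```

Because duplicated() scans the whole column, a protected row still counts as an earlier occurrence: a later non-2013 row for the same municipality would be dropped. If protected rows should not count, compute duplicated() only over the unprotected rows instead.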