I would like to remove rows with duplicated values in a data frame column, but keep some of the duplicates, which I had previously stored in a list.
my_df <- data.frame(Municipality = c('a', 'b', 'c', 'd', 'a', 'e', 'd', 'f', 'g', 'b', 'a'),
                    state = c('ac', 'pe', 'pi', 'pi', 'ac', 'am', 'pi', 'sp', 'sp', 'pi', 'ac'),
                    date = c('2006', '2007', '2007', '2008', '2009', '2010', '2010', '2011', '2012', '2013', '2013'))
desired_df <- data.frame(Municipality = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'b', 'a'),
                         state = c('ac', 'pe', 'pi', 'pi', 'am', 'sp', 'sp', 'pi', 'ac'),
                         date = c('2006', '2007', '2007', '2008', '2010', '2011', '2012', '2013', '2013'))
I tried to create a list of the municipalities to keep (b and a in 2013) and then remove the duplicates except those in that list, something like this:
municipality_twice_keep<-as.data.frame(c("a", "b"))
desired_df2=my_df[!(my_df$Municipality %in% municipality_twice_keep & my_df$date=='2013'),]
However, nothing changes. Could you please advise?
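The attempt changes nothing because municipality_twice_keep is a one-column data frame, which is a list, so %in% treats the whole column as a single element and never matches an individual name such as "a". The condition is therefore FALSE on every row and the leading ! keeps them all. Even with a plain vector, that line would only drop the 2013 rows of a and b, not the earlier duplicates. A minimal sketch of the list-based idea, assuming the exceptions are exactly the listed municipalities in 2013 (the names keep_2013 and is_exception are illustrative):

```r
# Data from the question
my_df <- data.frame(Municipality = c('a', 'b', 'c', 'd', 'a', 'e', 'd', 'f', 'g', 'b', 'a'),
                    state = c('ac', 'pe', 'pi', 'pi', 'ac', 'am', 'pi', 'sp', 'sp', 'pi', 'ac'),
                    date = c('2006', '2007', '2007', '2008', '2009', '2010', '2010', '2011', '2012', '2013', '2013'))

# Plain character vector (not a data frame), so %in% matches element by element
keep_2013 <- c("a", "b")

# Rows allowed to repeat: a listed municipality in 2013 (assumed exception rule)
is_exception <- my_df$Municipality %in% keep_2013 & my_df$date == "2013"

# Keep each municipality's first occurrence, plus the exceptions even if repeated
desired_df2 <- my_df[!duplicated(my_df$Municipality) | is_exception, ]
print(desired_df2)
```

This keeps rows 1, 2, 3, 4, 6, 8, 9, 10 and 11, i.e. the same values as desired_df (only the row names differ).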
Answer:
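You can drop a row only when its Municipality has already appeared earlier and its date is not 2013, so every 2013 row is kept. Combine the two conditions on full-length vectors, so the logical index has one entry per row and nothing gets recycled:
my_df[!(duplicated(my_df$Municipality) & my_df$date != "2013"), ]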
Municipality state date
1 a ac 2006
2 b pe 2007
3 c pi 2007
4 d pi 2008
6 e am 2010
8 f sp 2011
9 g sp 2012
10 b pi 2013
11 a ac 2013
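The idea of deduplicating only among the rows whose date is not 2013 can also be written so that duplicated() runs on exactly that subset, with the result mapped back onto a logical index as long as the data frame. This is only a sketch: the helper dedupe_except and its arguments are hypothetical, not part of base R.

```r
# Data from the question
my_df <- data.frame(Municipality = c('a', 'b', 'c', 'd', 'a', 'e', 'd', 'f', 'g', 'b', 'a'),
                    state = c('ac', 'pe', 'pi', 'pi', 'ac', 'am', 'pi', 'sp', 'sp', 'pi', 'ac'),
                    date = c('2006', '2007', '2007', '2008', '2009', '2010', '2010', '2011', '2012', '2013', '2013'))

# Hypothetical helper: deduplicate on column `key`, but only among rows where
# `exempt` is FALSE; rows with exempt == TRUE are always kept.
dedupe_except <- function(df, key, exempt) {
  keep <- exempt
  # duplicated() sees only the non-exempt rows; assigning through keep[!exempt]
  # puts each result back on its own row, so the final index has nrow(df)
  # elements and no recycling happens
  keep[!exempt] <- !duplicated(df[[key]][!exempt])
  df[keep, , drop = FALSE]
}

res <- dedupe_except(my_df, "Municipality", my_df$date == "2013")
print(res)
```

With this data it returns the same nine rows as above. It would differ from calling duplicated() on the whole column only if a municipality first appeared in a 2013 row and again in a later non-2013 row.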