Sample dataset
df <- data.frame(co = c(11.5, 1.3, 7.8, 2.3, 2.3, 3.1, 5.7, 5.7, 9.3),
                 factor = c(NA, NA, NA, 3, NA, 5, NA, 6, 0.3),
                 condition = c(NA, NA, NA, 12.3, NA, 13.5, NA, 18.7, NA))
I want to remove rows that are duplicates with respect to the variable co.
library(dplyr)
df.2 <- distinct(df, co, .keep_all = TRUE)
I get the following result:
    co factor condition
1 11.5     NA        NA
2  1.3     NA        NA
3  7.8     NA        NA
4  2.3    3.0      12.3
5  3.1    5.0      13.5
6  5.7     NA        NA
7  9.3    0.3        NA
I would like the end result to be as follows:
    co factor condition
1 11.5     NA        NA
2  1.3     NA        NA
3  7.8     NA        NA
4  2.3    3.0      12.3
5  3.1    5.0      13.5
6  5.7    6.0      18.7
7  9.3    0.3        NA
Among rows that share the same value of co, I want to keep the one with the greater value of factor. In this case the other row for co = 5.7 has factor = NA, but that is a coincidence: if that row were co = 5.7, factor = 5.5, condition = 11.2, I would still want to get 5.7; 6; 18.7.
CodePudding user response:
You can first arrange your data so that the records with NA are at the end of the data frame, then do your distinct.
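To illustrate why sorting first helps, here is a minimal sketch on a toy data frame. The names toy, id and score are hypothetical and not from the question. It relies on two documented dplyr behaviours: arrange() always sorts missing values to the end, even inside desc(), and distinct() keeps the first row it meets for each key.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(id = c("a", "a", "b"), score = c(NA, 2, NA))

# Within each id, rows with NA in score are sorted last,
# with or without desc()
sorted <- toy %>% arrange(id, desc(score))

# distinct() keeps the first row per id, so "a" keeps score = 2;
# "b" has only an NA row, so that row is kept
deduped <- sorted %>% distinct(id, .keep_all = TRUE)
```

With this ordering, an NA row survives only when its key has no non-missing row at all.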
Edit: Since you've updated your question, I also updated my answer. You can use arrange(desc(factor)) to select rows with the highest value.
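library(dplyr)

df %>%
  # within each co, highest factor (then condition) first; NA always sorts last
  arrange(co, desc(factor), desc(condition)) %>%
  # keep only the first row for each value of co
  distinct(co, .keep_all = TRUE)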
    co factor condition
1  1.3     NA        NA
2  2.3    3.0      12.3
3  3.1    5.0      13.5
4  5.7    6.0      18.7
5  7.8     NA        NA
6  9.3    0.3        NA
7 11.5     NA        NA
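Note that this result is sorted by co, whereas the desired output in the question keeps the rows in their original order of appearance. If that order matters, one possible variation is sketched below. The helper column first_seen is a hypothetical name introduced only for this example: it records the position at which each co value first appears, and it replaces co as the primary sort key.

```r
library(dplyr)

df <- data.frame(co = c(11.5, 1.3, 7.8, 2.3, 2.3, 3.1, 5.7, 5.7, 9.3),
                 factor = c(NA, NA, NA, 3, NA, 5, NA, 6, 0.3),
                 condition = c(NA, NA, NA, 12.3, NA, 13.5, NA, 18.7, NA))

result <- df %>%
  # first_seen (hypothetical helper): index of the first appearance of each co
  mutate(first_seen = match(co, unique(co))) %>%
  # keep the original order of co values; within each, highest factor first
  arrange(first_seen, desc(factor), desc(condition)) %>%
  # keep only the first row for each value of co
  distinct(co, .keep_all = TRUE) %>%
  # drop the helper column again
  select(-first_seen)
```

This is only a sketch under the assumption that "original order" means the order in which each co value first occurs in df.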