In my data frame, each ID appears on several rows. I need to drop the uninformative "Not Done" values and keep the informative "Neg" value whenever it is available in any of the duplicated rows for that ID. Sorry, this is not easy to explain. My data frame is below:
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
                 Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done"))
An example of the expected output:
df_A <- data.frame(ID = c("A1", "A2", "A3"),
                   Variable1 = c("Neg", "Neg", "Not Done"),
                   Variable2 = c("Neg", "Neg", "Not Done"),
                   Variable3 = c("Neg", "Neg", "Not Done"))
As you can see, for A3 all the values are "Not Done", so that row should be kept only once.
CodePudding user response:
A dplyr solution with which.max():
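library(dplyr)
df %>%
  group_by(ID) %>%
  # For each column, take the first "Neg" in the group; which.max() returns 1
  # when no value is "Neg", so the first "Not Done" is kept instead
  summarise(across(everything(), ~ .x[which.max(.x == "Neg")])) %>%
  ungroup()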
# # A tibble: 3 × 4
#   ID    Variable1 Variable2 Variable3
#   <chr> <chr>     <chr>     <chr>
# 1 A1    Neg       Neg       Neg
# 2 A2    Neg       Neg       Neg
# 3 A3    Not Done  Not Done  Not Done
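To see why this works for groups such as A3, here is a minimal base R sketch of the indexing trick on its own. The helper name pick_neg is hypothetical and only used for illustration; it is not part of dplyr.

```r
# pick_neg is a hypothetical helper name, used only to isolate the trick
pick_neg <- function(x) x[which.max(x == "Neg")]

# x == "Neg" is a logical vector; which.max() returns the index of the first TRUE
with_neg <- pick_neg(c("Not Done", "Neg", "Not Done"))

# If nothing is TRUE, every element ties at FALSE and which.max() returns 1,
# so the first element of the group is returned
without_neg <- pick_neg(c("Not Done", "Not Done", "Not Done"))

print(with_neg)     # "Neg"
print(without_neg)  # "Not Done"
```

This assumes the columns contain no NA values; a group made up entirely of NA would give an empty result and would need separate handling.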
CodePudding user response:
If there are only Neg and Not Done values, I would convert them to TRUE and FALSE and use any() in aggregate().
aggregate(df[-1]=="Neg", df[1], any)
#  ID Variable1 Variable2 Variable3
#1 A1      TRUE      TRUE      TRUE
#2 A2      TRUE      TRUE      TRUE
#3 A3     FALSE     FALSE     FALSE
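If the original labels are wanted rather than TRUE and FALSE, the logical columns can be mapped back afterwards. The following is only a sketch under the same assumption that Neg and Not Done are the only two values; the name res is chosen here for illustration.

```r
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
                 Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done"))

# Same aggregate() call as above: TRUE where any row of the ID is "Neg"
res <- aggregate(df[-1] == "Neg", df[1], any)

# Map the logical columns back to the original labels (every column except ID)
res[-1] <- lapply(res[-1], function(x) ifelse(x, "Neg", "Not Done"))

print(res)
```

With the example data this gives one row per ID with the same values as the expected df_A.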
CodePudding user response:
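library(dplyr)
# Drop exact duplicate rows first, then build the grouping factor from the
# de-duplicated data so it has the same length as the columns
df <- distinct(df)
ID <- factor(df$ID)
# TRUE if "Neg" occurs anywhere in the vector
neg_find <- function(vector) {
  "Neg" %in% vector
}
# Apply neg_find to one column, separately for each ID
final_result_neg <- function(column) {
  tapply(column, ID, neg_find)
}
# Skip the ID column (df[-1]); the result is a logical matrix with one row per ID
df2 <- apply(df[-1], 2, final_result_neg)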