R format data frame duplicated ID and redundant information

Time:05-23

From my data frame, I need to drop the uninformative values labelled "Not Done" and keep the informative value "Neg" wherever it appears in one of the rows sharing a duplicated ID. Sorry, this is not easy to explain. Here is my data frame:

df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2","A2", "A3","A3", "A3"),
                 Variable1 = c("Neg", "Not Done","Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done",  "Neg",  "Not Done", "Neg",  "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done","Not Done","Neg","Not Done","Not Done","Neg","Not Done","Not Done","Not Done"))

An example of the expected output:

df_A <- data.frame(ID = c("A1", "A2", "A3"),
                 Variable1 = c("Neg", "Neg", "Not Done"),
                 Variable2 = c("Neg", "Neg", "Not Done"),
                 Variable3 = c("Neg","Neg","Not Done"))

As you can see, for A3 all the values are "Not Done", so that row only needs to be kept once.

CodePudding user response:

A dplyr solution with which.max():

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ .x[which.max(.x == "Neg")])) %>%
  ungroup()

# # A tibble: 3 × 4
#   ID    Variable1 Variable2 Variable3
#   <chr> <chr>     <chr>     <chr>
# 1 A1    Neg       Neg       Neg
# 2 A2    Neg       Neg       Neg      
# 3 A3    Not Done  Not Done  Not Done
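Why `which.max()` works here can be seen on a single column. The following is a minimal base R sketch; the vectors `with_neg` and `all_not_done` and the helper `pick_first_neg()` are made up for illustration. Comparing with "Neg" gives a logical vector, and `which.max()` returns the position of the first TRUE. When every element is FALSE it returns 1, so the first "Not Done" is kept.

```r
# Hypothetical single-group columns, for illustration only
with_neg     <- c("Not Done", "Neg", "Not Done")
all_not_done <- c("Not Done", "Not Done", "Not Done")

# Keep the first "Neg" if there is one, otherwise the first element
pick_first_neg <- function(x) x[which.max(x == "Neg")]

pick_first_neg(with_neg)      # "Neg"
pick_first_neg(all_not_done)  # "Not Done"
```

This assumes the columns contain no NA values; a group made up entirely of NA would give a zero-length result.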

CodePudding user response:

If the only values are "Neg" and "Not Done", I would convert them to TRUE and FALSE and use any() in aggregate().

aggregate(df[-1]=="Neg", df[1], any)
#  ID Variable1 Variable2 Variable3
#1 A1      TRUE      TRUE      TRUE
#2 A2      TRUE      TRUE      TRUE
#3 A3     FALSE     FALSE     FALSE
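If the original labels are wanted instead of TRUE and FALSE, the logical columns can be mapped back. This is a sketch under the same assumption that only "Neg" and "Not Done" occur; the name `res` is chosen here for illustration.

```r
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
                 Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done"))

# One row per ID, TRUE where any row of that ID has "Neg"
res <- aggregate(df[-1] == "Neg", df[1], any)

# Map the logical columns back to the original labels
res[-1] <- lapply(res[-1], function(x) ifelse(x, "Neg", "Not Done"))
res
```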

CodePudding user response:

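library(dplyr)

df$ID <- factor(df$ID)
df <- distinct(df)
# Take the grouping factor after de-duplication so its length matches nrow(df)
ID <- df$ID

# TRUE if the vector contains at least one "Neg"
neg_find <- function(vector) {
  "Neg" %in% vector
}

# Apply neg_find() to one column, grouped by ID
final_result_neg <- function(column) {
  tapply(column, ID, neg_find)
}

# Skip the ID column; the result is a logical matrix with one row per ID
df2 <- apply(df[-1], 2, final_result_neg)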