How can I remove duplicate values if I have a dataframe

As an example:

I have a set of computers on which I run tests (sometimes multiple times per computer) to determine whether they are working when I run the test. Say the table looks like this:

Computer  Working?
A         0
A         1
B         1
B         1
B         1
C         0
C         0
D         0
D         0
D         0
D         1
E         0
E         1

I have this table as a dataframe named WorkingComputerDf. I would like to remove duplicates for any computer that is not working (value 0). If a computer is tested multiple times, I want to drop the rows where it is not working, keeping just one such row if it never worked. So, taking the table above, I'd like the end result to be:

Computer  Working?
A         1
B         1
B         1
B         1
C         0
D         1
E         1

Basically, I want to keep duplicates where the computer is working. But where the same computer both works and doesn't work, I'd like to drop all the rows where it doesn't work, and where the same computer repeatedly fails, I'd like to keep only one of those rows.

Is there an easy way to do this with unique() or some other method? Could I use tidyverse, or would it be simplest to write an if/else statement?

CodePudding user response:

@akrun's answer works like a charm. However (partly based on your follow-up question), a more readable -- but also more verbose! -- solution would be:

library(tidyverse)

df %>% 
  group_by(computer) %>% 
  # sort working rows first within each computer
  arrange(desc(working), .by_group = TRUE) %>% 
  mutate(nrow = row_number()) %>% 
  ungroup() %>% 
  # keep every working row; otherwise keep only the first (failing) row
  filter(working == 1 | nrow == 1) %>% 
  select(computer, working) %>% 
  arrange(computer)
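If you'd rather skip the helper columns, the same rule can be expressed in a single grouped filter. Here is a self-contained sketch that rebuilds the example data; note it assumes lowercase column names `computer` and `working` for convenience, rather than the question's `Computer` / `Working?`:

```r
library(dplyr)

# Rebuild the example data from the question (lowercase names assumed here)
df <- tibble::tibble(
  computer = c("A", "A", "B", "B", "B", "C", "C", "D", "D", "D", "D", "E", "E"),
  working  = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1)
)

result <- df %>%
  group_by(computer) %>%
  # keep every working row; if a computer never worked, keep just one failing row
  filter(working == 1 | (max(working) == 0 & row_number() == 1)) %>%
  ungroup()

result
```

This keeps all of B's working rows, collapses C's repeated failures to one row, and drops the failing rows for A, D, and E because each of them has at least one working test.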