I have a data frame that has a column with unique names and then another column that has factors/char (can do either). I want to be able to filter out duplicate names and IF one of the duplicates has a certain value for a factor I want to keep that one, if there are two duplicates with the same factor, then I don't care which one it keeps.
Name Status
1. John A
2. John B
3. Sally A
4. Alex A
5. Sarah B
6. Joe A
7. Joe A
8. Sue B
9. Sue B
I want to keep the duplicate if the factor/char is set to B. If there are two As or two Bs I don't care which one it keeps.
This is the result I want:
1. John B
2. Sally A
3. Alex A
4. Sarah B
5. Joe A
6. Sue B
I have tried the following but it's still keeping both A and B for John:
Name <- c("John","John","Sally","Alex", "Sarah", "Joe", "Joe", "Sue", "Sue")
Status <- c('A', 'B', 'A', 'A', 'B', 'A', 'A', 'B', 'B')
df_reddit <- data.frame(Name,Status)
df_reddit[, 'Status'] <- as.factor(df_reddit[, 'Status'])
df_reddit$Status <- factor(df_reddit$Status, levels = c("A", "B"))
df_reddit <- df_reddit[order(df_reddit$Status),]
df_reddit[!duplicated(df_reddit[,c('Name')]),]
Any help is appreciated! Would a loop or something be better for this?
CodePudding user response:
A pure dplyr
approach may look like so:
library(dplyr)
df_reddit |>
add_count(Name, wt = Status == "B") |>
filter(Status == "B" | n == 0) |>
distinct(Name, Status)
#> Name Status
#> 1 John B
#> 2 Sally A
#> 3 Alex A
#> 4 Sarah B
#> 5 Joe A
#> 6 Sue B
CodePudding user response:
Here's one option using dplyr
and data.table
:
library(dplyr)
library(data.table)
df_reddit %>%
group_by(Name) %>%
mutate(rle = data.table::rleid(Status)) %>%
filter(rle == max(rle)) %>%
select(-rle) %>%
unique() %>%
ungroup()
# A tibble: 6 × 2
Name Status
<chr> <chr>
1 John B
2 Sally A
3 Alex A
4 Sarah B
5 Joe A
6 Sue B