I have a dataset similar to this:
> dput(df1)
structure(list(Person_Id = c(123L, 123L, 123L, 123L, 123L,
567L, 567L, 567L, 888L, 888L),
Result = c("Yes", "Yes", "Yes", "Yes", "Yes",
"No", "No", "No", "No", "Yes")),
class = "data.frame", row.names = c(NA, -10L))
As you can see, the Result
values for Person_id
123 and 567 are consistent throughout the entire dataframe. However, the Result
values for person 888 differs.
I want to filter this dataset:
- If the result differs for a given
Person_id
, I would like them to be removed. - Keep only one row for a given
Person_id
IF theResult
is consistent throughout the dataframe.
The output dataframe should look like:
> dput(df2)
structure(list(Person_Id = c(123L, 567L),
Result = c("Yes", "No")),
class = "data.frame", row.names = c(NA, -2L))
CodePudding user response:
A base R approach using aggregate
na.omit(aggregate(Result ~ Person_Id, df1, function(x)
ifelse(length(unique(x)) == 1, unique(x), NA)))
Person_Id Result
1 123 Yes
2 567 No
CodePudding user response:
Here's a dplyr
solution:
library(dplyr)
df1 %>%
group_by(Person_Id) %>%
filter(n_distinct(Result) == 1) %>%
unique %>% ungroup
#> # A tibble: 2 x 2
#> Person_Id Result
#> <int> <chr>
#> 1 123 Yes
#> 2 567 No