Suppose this is my dataset:
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
How do I get rid of the Category=Pre rows for Ids that are duplicated? My expected final dataset would be:
Id Weight Category
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
5 11.8 Pre
CodePudding user response:
You can arrange the data and then use distinct().
library(dplyr)
df %>% arrange(Id, Category) %>% distinct(Id, .keep_all = TRUE)
# Id Weight Category
#1 1 12.1 Post
#2 2 11.3 Post
#3 3 12.9 Pre
#4 4 10.3 Post
#5 5 11.8 Pre
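This works because 'Pre' > 'Post' in alphabetical order: after arrange(), the Post row comes first within each Id, and distinct() keeps only the first row per Id.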
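If you would rather not depend on alphabetical order, here is a minimal base R sketch that makes the preference explicit. The `priority` vector is an illustrative name of my own, and it assumes Post should win over Pre, as in the expected output.

```r
# Sample data from the question.
dat <- read.table(header = TRUE, text = "
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
")

# Assumed preference order: categories listed earlier are kept first.
priority <- c("Post", "Pre")

# Sort by Id, then by the position of Category in `priority`.
ord <- order(dat$Id, match(dat$Category, priority))
sorted <- dat[ord, ]

# Keep only the first (highest-priority) row of each Id.
result <- sorted[!duplicated(sorted$Id), ]
rownames(result) <- NULL
result
```

Changing `priority` is then enough to switch which category is preferred, without renaming the categories.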
CodePudding user response:
Using by(), split dat by Id, select the Post row wherever an Id has two rows, then rbind the results.
do.call(rbind, by(dat, dat$Id, function(x)
if (nrow(x) == 2) x[x$Category == 'Post', ] else x))
# Id Weight Category
# 1 1 12.1 Post
# 2 2 11.3 Post
# 3 3 12.9 Pre
# 4 4 10.3 Post
# 5 5 11.8 Pre
Data:
dat <- read.table(header = TRUE, text = '
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
')
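Note that the by() call only special-cases Ids with exactly two rows. As a hedged alternative, this vectorized base R sketch drops every Pre row whose Id occurs more than once, whatever the group size. The names `dup_ids` and `keep` are illustrative only.

```r
# Sample data from the question.
dat <- read.table(header = TRUE, text = "
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
")

# Ids that appear more than once.
dup_ids <- dat$Id[duplicated(dat$Id)]

# Drop a row only if it is a Pre row AND its Id is duplicated.
keep <- !(dat$Category == "Pre" & dat$Id %in% dup_ids)
result <- dat[keep, ]
rownames(result) <- NULL
result
```

One caveat: if an Id had several Pre rows and no Post row, this rule would remove all of them, so check that case against your real data first.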
CodePudding user response:
We could use filter() after grouping and arranging, keeping the rows that match first(Category), since Post comes before Pre:
library(dplyr)
df %>%
  group_by(Id) %>%
  arrange(Id, Category) %>%
  filter(Category == first(Category))
output:
Id Weight Category
<int> <dbl> <chr>
1 1 12.1 Post
2 2 11.3 Post
3 3 12.9 Pre
4 4 10.3 Post
5 5 11.8 Pre
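The same rule can also be stated directly as a grouped condition, without sorting first. A minimal sketch, assuming the data frame is called df as above and that an Id with a single row should always be kept:

```r
library(dplyr)

# Sample data from the question.
df <- read.table(header = TRUE, text = "
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
")

result <- df %>%
  group_by(Id) %>%
  # Keep a row if its Id is unique, or if it is the Post row.
  filter(n() == 1 | Category == "Post") %>%
  ungroup()

result
```

Here n() is the number of rows in the current Id group, so the Pre row is dropped only when it shares an Id with another row.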