Deleting a set of rows by group

In the 10-row data frame below, each sighting is either a whale or a vessel, and the sightings are grouped by ScanID.

Using the dplyr library, I am trying to find a way to remove every scan that contains no whales; in this case, that would be scans 2 and 5.

I think group_by would be useful, but I am not sure how to proceed from there.

whales <- data.frame(rubbing.beach = c('whale', 'vessel', 'vessel', 'vessel', 'whale', 'whale', 'whale', 'vessel', 'vessel', 'whale'), 
ScanID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6))

X	Target	ScanID
1	whale	1
2	vessel	1
3	vessel	2
4	vessel	2
5	whale	3
6	whale	3
7	whale	4
8	vessel	4
9	vessel	5
10	whale	6

The desired output is:

X	Target	ScanID
1	whale	1
2	vessel	1
3	whale	3
4	whale	3
5	whale	4
6	vessel	4
7	whale	6

CodePudding user response：

group_by is indeed needed so that each ScanID is considered separately, and filter specifies which rows to keep. Here it keeps every row of any group whose Target column contains "whale":

whales = read.table(text =
'X  Target  ScanID
1   whale   1
2   vessel  1
3   vessel  2
4   vessel  2
5   whale   3
6   whale   3
7   whale   4
8   vessel  4
9   vessel  5
10  whale   6', header = TRUE)

library(dplyr)
whales %>%
  group_by(ScanID) %>%
  filter("whale" %in% Target)
# # A tibble: 7 × 3
# # Groups:   ScanID [4]
#       X Target ScanID
#   <int> <chr>   <int>
# 1     1 whale       1
# 2     2 vessel      1
# 3     5 whale       3
# 4     6 whale       3
# 5     7 whale       4
# 6     8 vessel      4
# 7    10 whale       6
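
As a minimal sketch of a variant (assuming the columns are named X, Target and ScanID as in the table in the question), any(Target == "whale") expresses the same per-group condition, and ungroup() drops the grouping afterwards so that later steps operate on the whole data frame again:

```r
library(dplyr)

# Same data as the table in the question (column names assumed from that table)
whales <- data.frame(
  X = 1:10,
  Target = c("whale", "vessel", "vessel", "vessel", "whale",
             "whale", "whale", "vessel", "vessel", "whale"),
  ScanID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6)
)

with_whales <- whales %>%
  group_by(ScanID) %>%
  filter(any(Target == "whale")) %>%  # TRUE for a group if any row is a whale
  ungroup()                           # remove the grouping from the result

unique(with_whales$ScanID)
# [1] 1 3 4 6
```

Because the condition evaluates to a single TRUE or FALSE per group, filter keeps or drops each scan as a whole.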

CodePudding user response：

You can also do this without group_by: extract every ScanID that has at least one row with rubbing.beach == "whale", then use that set in subset.

subset(whales, ScanID %in% unique(ScanID[rubbing.beach == "whale"]))

#  rubbing.beach ScanID
#1         whale      1
#2        vessel      1
#3         whale      3
#4         whale      3
#5         whale      4
#6        vessel      4
#7         whale      6

In dplyr, the same condition works with filter:

library(dplyr)

whales %>% filter(ScanID %in% unique(ScanID[rubbing.beach == "whale"]))
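
To see why no grouping is needed, the one-liner can be split into its steps. This is only an illustrative sketch in base R; the intermediate names whale_scans, dropped_scans and kept are hypothetical and not part of the original answer:

```r
# Data frame as defined in the question (column rubbing.beach holds the target)
whales <- data.frame(
  rubbing.beach = c("whale", "vessel", "vessel", "vessel", "whale",
                    "whale", "whale", "vessel", "vessel", "whale"),
  ScanID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6)
)

# Step 1: ScanIDs that contain at least one whale sighting
whale_scans <- unique(whales$ScanID[whales$rubbing.beach == "whale"])

# Step 2: ScanIDs with no whale at all (the scans that get removed)
dropped_scans <- setdiff(unique(whales$ScanID), whale_scans)

# Step 3: keep only the rows whose ScanID is in whale_scans
kept <- whales[whales$ScanID %in% whale_scans, ]

whale_scans    # [1] 1 3 4 6
dropped_scans  # [1] 2 5
nrow(kept)     # [1] 7
```

The membership test with %in% is applied row by row, so a vessel row survives whenever its ScanID appears in whale_scans.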