Suppose I have a dataset that looks like this:
Person Year From To
Peter 2001 Apple Microsoft
Peter 2006 Microsoft IBM
Peter 2010 IBM Facebook
Peter 2016 Facebook Apple
Kate 2003 Microsoft Google
Jimmy 2001 Samsung IBM
Jimmy 2004 IBM Google
Jimmy 2009 Google Facebook
I want to filter by person and keep only people who worked at IBM at some point (IBM appears in either the From or the To column). Furthermore, for those people I only want to keep the records before they move away from IBM (that is, before "IBM" first appears in the From column). The result should look like this:
Person Year From To
Peter 2001 Apple Microsoft
Peter 2006 Microsoft IBM
Jimmy 2001 Samsung IBM
Answer:
A possible solution with dplyr:
library(dplyr)

df %>%
  group_by(Person) %>%
  # keep each move to IBM and the row immediately before it
  filter(To == "IBM" | lead(To) == "IBM") %>%
  ungroup()
# A tibble: 3 x 4
Person Year From To
<chr> <int> <chr> <chr>
1 Peter 2001 Apple Microsoft
2 Peter 2006 Microsoft IBM
3 Jimmy 2001 Samsung IBM
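The filter above keeps the row that moves to IBM plus only the single row before it, which happens to be enough for the sample data. If someone could have several job changes before joining IBM, a variant along the following lines may be closer to the stated rule. This is a minimal sketch, not part of the original answer: it assumes the rows are already sorted by Year within each Person and that From and To contain no missing values. The name result is just an illustrative variable.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  Person = c("Peter", "Peter", "Peter", "Peter", "Kate", "Jimmy", "Jimmy", "Jimmy"),
  Year   = c(2001L, 2006L, 2010L, 2016L, 2003L, 2001L, 2004L, 2009L),
  From   = c("Apple", "Microsoft", "IBM", "Facebook", "Microsoft", "Samsung", "IBM", "Google"),
  To     = c("Microsoft", "IBM", "Facebook", "Apple", "Google", "IBM", "Google", "Facebook")
)

# Assumption: rows are in chronological order within each person
result <- df %>%
  group_by(Person) %>%
  # keep only people with IBM in either column at some point
  filter(any(From == "IBM" | To == "IBM")) %>%
  # drop every row from the first move away from IBM onward
  filter(cumsum(From == "IBM") == 0) %>%
  ungroup()

print(result)
```

On the sample data this selects the same three rows as the answer above. The cumsum() counter stays at 0 until "IBM" first shows up in From, so every earlier row is kept regardless of how many there are.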
Data
df <- structure(list(Person = c("Peter", "Peter", "Peter", "Peter",
"Kate", "Jimmy", "Jimmy", "Jimmy"), Year = c(2001L, 2006L, 2010L,
2016L, 2003L, 2001L, 2004L, 2009L), From = c("Apple", "Microsoft",
"IBM", "Facebook", "Microsoft", "Samsung", "IBM", "Google"),
To = c("Microsoft", "IBM", "Facebook", "Apple", "Google",
"IBM", "Google", "Facebook")), class = "data.frame", row.names = c(NA, -8L))