R Dataframe By-Value Filter

Time:04-22

Suppose I have a dataset that looks like this:

  Person Year   From       To
  Peter  2001   Apple      Microsoft
  Peter  2006   Microsoft  IBM
  Peter  2010   IBM        Facebook
  Peter  2016   Facebook   Apple
  Kate   2003   Microsoft  Google
  Jimmy  2001   Samsung    IBM
  Jimmy  2004   IBM        Google 
  Jimmy  2009   Google     Facebook

I want to filter by person and keep only people who worked at IBM at some point (IBM appears in either the From or the To column). Furthermore, I only want to keep the records from before a person moves away from IBM (that is, before "IBM" first appears in the From column). So the result should look like this:

  Person Year   From       To
  Peter  2001   Apple      Microsoft
  Peter  2006   Microsoft  IBM
  Jimmy  2001   Samsung    IBM

Answer:

A possible solution with dplyr:

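library(dplyr)

df %>%
  group_by(Person) %>%
  # keep the row where the person moves to IBM and the row just before it;
  # lead(To) is NA on each person's last row, and filter() drops NA rows
  filter(To == "IBM" | lead(To) == "IBM") %>%
  ungroup()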

# A tibble: 3 x 4
  Person  Year From      To
  <chr>  <int> <chr>     <chr>
1 Peter   2001 Apple     Microsoft
2 Peter   2006 Microsoft IBM
3 Jimmy   2001 Samsung   IBM
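
Note that the filter above keeps the row where To is "IBM" plus only the single row directly before it, so a person with a longer history before joining IBM would lose their earlier rows. A minimal base R sketch of the rule as stated in the question (drop people who never touch IBM, then drop every row from the first From == "IBM" onward) might look like the following. It assumes the rows are already in chronological order within each person; the names `touches_ibm`, `ever_ibm`, `left_ibm` and `result` are made up for this illustration.

```r
# Same sample data as in the question, rebuilt here so the sketch runs on its own.
df <- data.frame(
  Person = c("Peter", "Peter", "Peter", "Peter", "Kate", "Jimmy", "Jimmy", "Jimmy"),
  Year   = c(2001L, 2006L, 2010L, 2016L, 2003L, 2001L, 2004L, 2009L),
  From   = c("Apple", "Microsoft", "IBM", "Facebook", "Microsoft", "Samsung", "IBM", "Google"),
  To     = c("Microsoft", "IBM", "Facebook", "Apple", "Google", "IBM", "Google", "Facebook"),
  stringsAsFactors = FALSE
)

# 1 where a row mentions IBM in either column, 0 otherwise.
touches_ibm <- as.integer(df$From == "IBM" | df$To == "IBM")

# Per person: TRUE on every row if that person has at least one IBM row.
ever_ibm <- ave(touches_ibm, df$Person, FUN = max) > 0

# Per person: running count of From == "IBM"; once it is above zero,
# the person has started moving away from IBM.
# Assumption: rows are already sorted by Year within each person.
left_ibm <- ave(as.integer(df$From == "IBM"), df$Person, FUN = cumsum) > 0

# Keep IBM people, but only their rows before the first From == "IBM".
result <- df[ever_ibm & !left_ibm, ]
rownames(result) <- NULL
result
```

On the sample data this returns the same three rows as the dplyr answer (Peter 2001, Peter 2006, Jimmy 2001). The difference would only show up for people with more than one job before IBM, or with more than one stint at IBM.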

Data

df <- structure(list(Person = c("Peter", "Peter", "Peter", "Peter", 
"Kate", "Jimmy", "Jimmy", "Jimmy"), Year = c(2001L, 2006L, 2010L,
2016L, 2003L, 2001L, 2004L, 2009L), From = c("Apple", "Microsoft",
"IBM", "Facebook", "Microsoft", "Samsung", "IBM", "Google"),
To = c("Microsoft", "IBM", "Facebook", "Apple", "Google",
"IBM", "Google", "Facebook")), class = "data.frame", row.names = c(NA, -8L))