I have this dataset in R:
name = c("john", "john", "john", "john", "john", "alex", "alex", "alex", "alex", "alex" )
year = c(2010, 2011, 2012, 2015, 2017, 2014, 2015, 2016, 2017, 2018)
my_data = data.frame(name, year)
> my_data
name year
1 john 2010
2 john 2011
3 john 2012
4 john 2015
5 john 2017
6 alex 2014
7 alex 2015
8 alex 2016
9 alex 2017
10 alex 2018
For each person in this dataset - I want to select all rows until the first non-consecutive year appears.
As an example, the final output would look something like this:
# DESIRED OUTPUT
> my_data
name year
1 john 2010
2 john 2011
3 john 2012
6 alex 2014
7 alex 2015
8 alex 2016
9 alex 2017
10 alex 2018
To do this, I thought of the following approach:
> agg <- aggregate(year ~ name, my_data, c)
> agg
name year.1 year.2 year.3 year.4 year.5
1 alex 2014 2015 2016 2017 2018
2 john 2010 2011 2012 2015 2017
library(stringr)
agg = data.frame(as.matrix(agg))
agg$all_years = paste(agg$year.1 ,agg$year.2, agg$year.3, agg$year.4, agg$year.5)
agg$y_2010 = str_detect(agg$all_years, "2010")
agg$y_2011 = str_detect(agg$all_years, "2011")
agg$y_2012 = str_detect(agg$all_years, "2012")
agg$y_2013 = str_detect(agg$all_years, "2013")
agg$y_2014 = str_detect(agg$all_years, "2014")
agg$y_2015 = str_detect(agg$all_years, "2015")
agg$y_2016 = str_detect(agg$all_years, "2016")
agg$y_2017 = str_detect(agg$all_years, "2017")
agg$y_2018 = str_detect(agg$all_years, "2018")
agg$y_2019 = str_detect(agg$all_years, "2019")
name year.1 year.2 year.3 year.4 year.5 all_years y_2010 y_2011 y_2012 y_2013 y_2014 y_2015 y_2016 y_2017 y_2018 y_2019
1 alex 2014 2015 2016 2017 2018 2014 2015 2016 2017 2018 FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
2 john 2010 2011 2012 2015 2017 2010 2011 2012 2015 2017 TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
Now, the idea would be for each row - find out the first time when a "TRUE" is followed by a "FALSE" - and then I would try to find some way to accomplish my task.
But I am not sure how to proceed from here.
Can someone please show me how to do this?
Thanks!
CodePudding user response:
library(dplyr)
my_data %>%
group_by(name) %>%
filter(c(1,diff(year)) == 1)
# A tibble: 8 x 2
# Groups: name [2]
name year
<chr> <dbl>
1 john 2010
2 john 2011
3 john 2012
4 alex 2014
5 alex 2015
6 alex 2016
7 alex 2017
8 alex 2018