I am trying to group a table after three different things at once. I have problems with the last step. The first ten rows of my dataset (band_list) looks like the following:
BandName Year BandType
1 Agamemnon 2020 0
2 Ajax 2010 0
3 Ajax 2011 0
4 Ajax 2012 0
5 Ajax 2013 0
6 Ajax 2014 0
7 Ajax-Egerton 2016 1
8 Aldo 2016 0
9 Aldo 2017 0
10 Aldo-Knottboy 2018 1
First, I want to group my dataset by name and band type. For this, I know that I can use the following code.
df <- band_list %>% group_by(`BandName(Stallion-Tag)`, BandType)
Second, I want to group by year. Meaning, that I want to group the band with the same name and band type as well as consecutive years as one group and count the total amount of occurences. For example, Ajax (rows 2-6) occurs five times with the same name and type and also within consecutive years (2010-2014), so I want to get the result of 5.
occurence_band <- band_list %>% count(`BandName(Stallion-Tag)`, BandType)
I do get this with my code, but I have some data where a year is missing in between but the name and type stays the same. For example,
BandName Year BandType
40 Arno 2014 0
41 Arno 2015 0
42 Arno 2017 0
43 Arno 2018 0
44 Arno 2020 0
Using the same codes above R would give me the count of 5 for Arno but I want to have three different groups for Arno here: Arno (2014-2015) with count two, Arno (2017-2018) also with count two and Arno (2020) with count one. I need it seperated like this because it is important for further analysis I have to do. I thought that writing my own function might solve my problem but I only wrote my own function once, so I have now ideas how to tell R to look for consequtive years and count it (and I did not found anything that worked). Any help would be really appreciated.
CodePudding user response:
Create a Runs
variable with a logical cumsum
trick and include this variable in the grouping variables. Then count the subgroups.
df1 <- read.table(text = "
BandName Year BandType
40 Arno 2014 0
41 Arno 2015 0
42 Arno 2017 0
43 Arno 2018 0
44 Arno 2020 0
", header = TRUE)
suppressPackageStartupMessages(library(dplyr))
df1 %>%
mutate(Runs = cumsum(c(1L, diff(Year)) > 1L)) %>%
group_by(BandName, BandType, Runs) %>%
summarize(Count = n(), .groups = "drop") %>%
select(-Runs)
#> # A tibble: 3 x 3
#> BandName BandType Count
#> <chr> <int> <int>
#> 1 Arno 0 2
#> 2 Arno 0 2
#> 3 Arno 0 1
Created on 2022-03-13 by the reprex package (v2.0.1)
This also works with df2
below.
df2 <- read.table(text = "
BandName Year BandType
1 Agamemnon 2020 0
2 Ajax 2010 0
3 Ajax 2011 0
4 Ajax 2012 0
5 Ajax 2013 0
6 Ajax 2014 0
7 Ajax-Egerton 2016 1
8 Aldo 2016 0
9 Aldo 2017 0
10 Aldo-Knottboy 2018 1
", header = TRUE)
df2 %>%
mutate(Runs = cumsum(c(1L, diff(Year)) > 1L)) %>%
group_by(BandName, BandType, Runs) %>%
summarize(Count = n(), .groups = "drop") %>%
select(-Runs)
#> # A tibble: 5 x 3
#> BandName BandType Count
#> <chr> <int> <int>
#> 1 Agamemnon 0 1
#> 2 Ajax 0 5
#> 3 Ajax-Egerton 1 1
#> 4 Aldo 0 2
#> 5 Aldo-Knottboy 1 1
Created on 2022-03-13 by the reprex package (v2.0.1)
CodePudding user response:
Here using the data.table package:
library(data.table)
dat <- as.data.table(test)
result <- dat[, .SD[, .(count= .N), Year], .(BandName, BandType)]