I have a dataset with the name of companies, the city's code, the year when they received a license to install an antenna there, and the number of antennas installed in that city in that year.
Name Cod year_l count
OI 5300108 1999 18
OI 5300108 2000 1
OI 5300108 2001 1
OI 5300108 2006 2
OI 5300108 2007 1
OI 5300108 2008 1
OI 5300108 2011 5
OI 5300108 2017 8
OI 5300108 2020 1
I am trying to expand this dataset so I can have a panel by year with the total number of antennas that the company has in a given city.
Name Cod year antennas
OI 5300108 1999 18
OI 5300108 2000 19
OI 5300108 2001 20
OI 5300108 2002 20
OI 5300108 2003 20
OI 5300108 2004 20
OI 5300108 2005 20
OI 5300108 2006 22
OI 5300108 2007 23
OI 5300108 2008 24
OI 5300108 2009 24
OI 5300108 2010 24
OI 5300108 2011 29
OI 5300108 2012 29
OI 5300108 2013 29
OI 5300108 2014 29
OI 5300108 2015 29
OI 5300108 2016 29
OI 5300108 2017 37
OI 5300108 2018 37
OI 5300108 2019 37
OI 5300108 2020 38
I know that it would be something like this, but the problem is dealing with the year:
antenas %>% group_by(Name, Cod, year_l) %>% mutate(antennas = n_distinct(count))
CodePudding user response:
Using tidyr::complete
and cumsum
you could do:
Note: And thanks to the comment by @RuiBarradas we could use year_l = full_seq(year_l, 1)
in complete
instead of the clumsy year_l = seq(min(year_l, na.rm = TRUE), max(year_l, na.rm = TRUE))
.
library(dplyr)
library(tidyr)
antenas %>%
group_by(Name, Cod) %>%
tidyr::complete(year_l = seq(min(year_l, na.rm = TRUE), max(year_l, na.rm = TRUE)), fill = list(count = 0)) %>%
mutate(antennas = cumsum(count)) %>%
ungroup()
#> # A tibble: 22 × 5
#> Name Cod year_l count antennas
#> <chr> <int> <int> <int> <int>
#> 1 OI 5300108 1999 18 18
#> 2 OI 5300108 2000 1 19
#> 3 OI 5300108 2001 1 20
#> 4 OI 5300108 2002 0 20
#> 5 OI 5300108 2003 0 20
#> 6 OI 5300108 2004 0 20
#> 7 OI 5300108 2005 0 20
#> 8 OI 5300108 2006 2 22
#> 9 OI 5300108 2007 1 23
#> 10 OI 5300108 2008 1 24
#> # … with 12 more rows
DATA
antenas <- structure(list(Name = c(
"OI", "OI", "OI", "OI", "OI", "OI", "OI",
"OI", "OI"
), Cod = c(
5300108L, 5300108L, 5300108L, 5300108L,
5300108L, 5300108L, 5300108L, 5300108L, 5300108L
), year_l = c(
1999L,
2000L, 2001L, 2006L, 2007L, 2008L, 2011L, 2017L, 2020L
), count = c(
18L,
1L, 1L, 2L, 1L, 1L, 5L, 8L, 1L
)), class = "data.frame", row.names = c(
NA,
-9L
))