Home > database >  Sum by group and create a variable by year using R
Sum by group and create a variable by year using R

Time:10-19

I have a dataset with the name of companies, the city's code, the year when they received a license to install an antenna there, and the number of antennas installed in that city in that year.

Name     Cod   year_l count
OI     5300108  1999    18
OI     5300108  2000    1
OI     5300108  2001    1
OI     5300108  2006    2
OI     5300108  2007    1
OI     5300108  2008    1
OI     5300108  2011    5
OI     5300108  2017    8
OI     5300108  2020    1

I am trying to expand this dataset so I can have a panel by year with the total number of antennas that the company has in a given city.

   Name      Cod    year  antennas
    OI     5300108  1999    18
    OI     5300108  2000    19
    OI     5300108  2001    20
    OI     5300108  2002    20
    OI     5300108  2003    20
    OI     5300108  2004    20
    OI     5300108  2005    20
    OI     5300108  2006    22
    OI     5300108  2007    23
    OI     5300108  2008    24
    OI     5300108  2009    24
    OI     5300108  2010    24
    OI     5300108  2011    29
    OI     5300108  2012    29
    OI     5300108  2013    29
    OI     5300108  2014    29
    OI     5300108  2015    29
    OI     5300108  2016    29
    OI     5300108  2017    37
    OI     5300108  2018    37
    OI     5300108  2019    37
    OI     5300108  2020    38

I know that it would be something like this, but the problem is dealing with the year:

antenas %>% group_by(Name, Cod, year_l) %>% mutate(antennas = n_distinct(count))

CodePudding user response:

Using tidyr::complete and cumsum you could do:

Note: And thanks to the comment by @RuiBarradas we could use year_l = full_seq(year_l, 1) in complete instead of the clumsy year_l = seq(min(year_l, na.rm = TRUE), max(year_l, na.rm = TRUE)).

library(dplyr)
library(tidyr)

antenas %>%
  group_by(Name, Cod) %>%
  tidyr::complete(year_l = seq(min(year_l, na.rm = TRUE), max(year_l, na.rm = TRUE)), fill = list(count = 0)) %>%
  mutate(antennas = cumsum(count)) %>%
  ungroup()
#> # A tibble: 22 × 5
#>    Name      Cod year_l count antennas
#>    <chr>   <int>  <int> <int>    <int>
#>  1 OI    5300108   1999    18       18
#>  2 OI    5300108   2000     1       19
#>  3 OI    5300108   2001     1       20
#>  4 OI    5300108   2002     0       20
#>  5 OI    5300108   2003     0       20
#>  6 OI    5300108   2004     0       20
#>  7 OI    5300108   2005     0       20
#>  8 OI    5300108   2006     2       22
#>  9 OI    5300108   2007     1       23
#> 10 OI    5300108   2008     1       24
#> # … with 12 more rows

DATA

antenas <- structure(list(Name = c(
  "OI", "OI", "OI", "OI", "OI", "OI", "OI",
  "OI", "OI"
), Cod = c(
  5300108L, 5300108L, 5300108L, 5300108L,
  5300108L, 5300108L, 5300108L, 5300108L, 5300108L
), year_l = c(
  1999L,
  2000L, 2001L, 2006L, 2007L, 2008L, 2011L, 2017L, 2020L
), count = c(
  18L,
  1L, 1L, 2L, 1L, 1L, 5L, 8L, 1L
)), class = "data.frame", row.names = c(
  NA,
  -9L
))
  • Related