Mathematical operation of a variable at t with respect to t-1 in panel data (R)

I have balanced panel data in which each firm ID (cnpjcei) is recorded every year, together with the total number of persons employed by that firm (empreg). My goal is to compute the change in the number of employees between year t and year t-1 for every year in the data set, i.e. empreg(t) - empreg(t-1).

# A tibble: 386,763 x 3
     ano cnpjcei       empreg
   <dbl> <chr>          <dbl>
 1  2006 1000786001505     10
 2  2007 1000786001505     12
 3  2008 1000786001505     16
 4  2009 1000786001505     19
 5  2010 1000786001505      7
 6  2011 1000786001505      7
 7  2012 1000786001505      7
 8  2013 1000786001505      7
 9  2014 1000786001505      8
10  2015 1000786001505      9
# ... with 386,753 more rows

The result I am looking for is something like this:


# A tibble: 386,763 x 4
     ano cnpjcei       empreg    variation_empreg
   <dbl> <chr>          <dbl>            <dbl>
 1  2006 1000786001505     10           NA
 2  2007 1000786001505     12            2
 3  2008 1000786001505     16            4
 4  2009 1000786001505     19            3
 5  2010 1000786001505      7           -12
 6  2011 1000786001505      7            0
 7  2012 1000786001505      7            0
 8  2013 1000786001505      7            0
 9  2014 1000786001505      8            1
10  2015 1000786001505      9            1
# ... with 386,753 more rows

Does anyone have any ideas? Thanks :)

User response:

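You can use diff, which returns the differences between consecutive elements; prepending NA keeps the result the same length as the column. The example below contains a single firm. With several firms, add group_by(cnpjcei) before mutate so that no difference is taken across two different firms: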

library(dplyr)

df %>% mutate(variation_empreg = c(NA, diff(empreg)))
#>     ano       cnpjcei empreg variation_empreg
#> 1  2006 1000786001505     10               NA
#> 2  2007 1000786001505     12                2
#> 3  2008 1000786001505     16                4
#> 4  2009 1000786001505     19                3
#> 5  2010 1000786001505      7              -12
#> 6  2011 1000786001505      7                0
#> 7  2012 1000786001505      7                0
#> 8  2013 1000786001505      7                0
#> 9  2014 1000786001505      8                1
#> 10 2015 1000786001505      9                1
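
Because the full data set contains many firms, the difference should probably be computed within each cnpjcei, so that a firm's first year is not compared with the previous firm's last year. Below is a minimal sketch of that idea, not a definitive solution. The two-firm toy_df and its values are made up for illustration and do not come from the question; dplyr::lag is used here as an alternative to c(NA, diff(...)).

```r
library(dplyr)

# Hypothetical two-firm panel, invented for illustration only
toy_df <- data.frame(
  ano     = c(2006, 2007, 2008, 2006, 2007, 2008),
  cnpjcei = c("A", "A", "A", "B", "B", "B"),
  empreg  = c(10, 12, 16, 5, 4, 9)
)

result <- toy_df %>%
  arrange(cnpjcei, ano) %>%   # make sure years are in order within each firm
  group_by(cnpjcei) %>%       # restart the calculation for every firm
  mutate(variation_empreg = empreg - dplyr::lag(empreg)) %>%  # empreg(t) - empreg(t-1)
  ungroup()

# The first year of each firm is NA; the remaining values are 2, 4 (firm A) and -1, 5 (firm B)
result$variation_empreg
```

Writing dplyr::lag explicitly avoids accidentally picking up stats::lag, which behaves differently. If the panel has gaps in the years, this sketch would compare non-consecutive years, so it assumes a balanced panel as described in the question.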

Data

df <- structure(list(ano = 2006:2015, cnpjcei = c("1000786001505", 
"1000786001505", "1000786001505", "1000786001505", "1000786001505", 
"1000786001505", "1000786001505", "1000786001505", "1000786001505", 
"1000786001505"), empreg = c(10L, 12L, 16L, 19L, 7L, 7L, 7L, 
7L, 8L, 9L)), row.names = c("1", "2", "3", "4", "5", "6", "7", 
"8", "9", "10"), class = "data.frame")