I have balanced panel data in which each firm ID (cnpjcei) is recorded every year along with the total number of persons employed by that firm. My goal is to compute, for every year in the database, the change in the number of employees between year t and year t-1, i.e. empreg(t) - empreg(t-1).
# A tibble: 386,763 x 3
ano cnpjcei empreg
<dbl> <chr> <dbl>
1 2006 1000786001505 10
2 2007 1000786001505 12
3 2008 1000786001505 16
4 2009 1000786001505 19
5 2010 1000786001505 7
6 2011 1000786001505 7
7 2012 1000786001505 7
8 2013 1000786001505 7
9 2014 1000786001505 8
10 2015 1000786001505 9
# ... with 386,753 more rows
The desired output would look something like this:
# A tibble: 386,763 x 4
ano cnpjcei empreg variation_empreg
<dbl> <chr> <dbl> <dbl>
1 2006 1000786001505 10
2 2007 1000786001505 12 2
3 2008 1000786001505 16 4
4 2009 1000786001505 19 3
5 2010 1000786001505 7 -12
6 2011 1000786001505 7 0
7 2012 1000786001505 7 0
8 2013 1000786001505 7 0
9 2014 1000786001505 8 1
10 2015 1000786001505 9 1
# ... with 386,753 more rows
Does anyone have any ideas? Thanks :)
CodePudding user response:
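You can use diff(), which returns the differences between consecutive values. Prepend NA so that the result has the same length as the column. Note that the sample data contains a single firm; with several firms in the data, add group_by(cnpjcei) before mutate() so that differences are not taken across firm boundaries: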
library(dplyr)
df %>% mutate(variation_empreg = c(NA, diff(empreg)))
#> ano cnpjcei empreg variation_empreg
#> 1 2006 1000786001505 10 NA
#> 2 2007 1000786001505 12 2
#> 3 2008 1000786001505 16 4
#> 4 2009 1000786001505 19 3
#> 5 2010 1000786001505 7 -12
#> 6 2011 1000786001505 7 0
#> 7 2012 1000786001505 7 0
#> 8 2013 1000786001505 7 0
#> 9 2014 1000786001505 8 1
#> 10 2015 1000786001505 9 1
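Since the full panel contains many firms, the same idea can be applied per firm. The following is a minimal sketch, assuming the rows should be ordered by ano within each cnpjcei; the data frame name panel and the second firm ID are made up for illustration and are not from the question:

```r
library(dplyr)

# Hypothetical two-firm panel; the second ID and its values are invented
# purely to show what happens at the boundary between firms.
panel <- data.frame(
  ano     = rep(2006:2008, times = 2),
  cnpjcei = rep(c("1000786001505", "2000000000000"), each = 3),
  empreg  = c(10, 12, 16, 50, 45, 47)
)

result <- panel %>%
  arrange(cnpjcei, ano) %>%   # put years in order within each firm
  group_by(cnpjcei) %>%       # restart the calculation for every firm
  mutate(variation_empreg = c(NA, diff(empreg))) %>%
  ungroup()

result
```

With the grouping in place, the first year of every firm gets NA instead of a difference against the last year of the previous firm.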
Data
df <- structure(list(ano = 2006:2015, cnpjcei = c("1000786001505",
"1000786001505", "1000786001505", "1000786001505", "1000786001505",
"1000786001505", "1000786001505", "1000786001505", "1000786001505",
"1000786001505"), empreg = c(10L, 12L, 16L, 19L, 7L, 7L, 7L,
7L, 8L, 9L)), row.names = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10"), class = "data.frame")