Say I have the following data
:
date | name | rolename | firmname |
---|---|---|---|
2011-12-01 | John | helper | A |
2012-12-01 | John | helper | A |
2013-12-01 | John | helper | A |
2014-12-01 | John | helper | B |
2014-12-01 | John | senior manager | C |
2015-12-01 | John | helper | B |
2015-12-01 | John | senior manager | C |
2016-12-01 | John | senior manager | C |
2016-12-01 | John | senior manager | D |
2017-12-01 | John | helper | E |
2011-12-01 | Will | senior manager | A |
2012-12-01 | Will | senior manager | A |
2013-12-01 | Will | senior manager | Z |
I am trying to create a dummy variable for prior senior manager experience (dummy_sm_exp
). That is, dummy_sm_exp
equals 1 when the person has prior experience as a senior manager in some other firm, 0 otherwise. For instance, for the above data, a fifth column is created taking following values:
date | name | rolename | firmname | dummy_sm_exp |
---|---|---|---|---|
2011-12-01 | John | helper | A | 0 |
2012-12-01 | John | helper | A | 0 |
2013-12-01 | John | helper | A | 0 |
2014-12-01 | John | helper | B | 0 |
2014-12-01 | John | senior manager | C | 0 |
2015-12-01 | John | helper | B | 0 |
2015-12-01 | John | senior manager | C | 0 |
2016-12-01 | John | senior manager | C | 0 |
2016-12-01 | John | senior manager | D | 1 |
2017-12-01 | John | helper | E | 1 |
2011-12-01 | Will | senior manager | A | 0 |
2012-12-01 | Will | senior manager | A | 0 |
2013-12-01 | Will | senior manager | Z | 1 |
Note that the dummy becomes equal to 1 only when a person has prior senior manager work experience at another firm. Any hints? Thanks.
CodePudding user response:
I think this is the most robust way of testing whether each person has ever had management experience at a different company, i.e. by summarising by person and company before testing lagged experience:
library(tidyverse)
df <- read.table(
text = "date name rolename firmname
2011-12-01 John helper A
2012-12-01 John helper A
2013-12-01 John helper A
2014-12-01 John helper B
2014-12-01 John 'senior manager' C
2015-12-01 John helper B
2015-12-01 John 'senior manager' C
2016-12-01 John 'senior manager' C
2016-12-01 John 'senior manager' D
2017-12-01 John helper E
2011-12-01 Will 'senior manager' A
2012-12-01 Will 'senior manager' A
2013-12-01 Will 'senior manager' Z",
header = TRUE
) |>
as_tibble()
df |>
group_by(name, firmname) |>
summarise(manager = any(rolename == "senior manager")) |>
mutate(prev_man = as.integer(lag(cumany(manager), default = FALSE))) |>
left_join(df, by = c("name", "firmname")) |>
select(date, name, rolename, firmname, prev_man)
#> `summarise()` has grouped output by 'name'. You can override using the
#> `.groups` argument.
#> # A tibble: 13 × 5
#> # Groups: name [2]
#> date name rolename firmname prev_man
#> <chr> <chr> <chr> <chr> <int>
#> 1 2011-12-01 John helper A 0
#> 2 2012-12-01 John helper A 0
#> 3 2013-12-01 John helper A 0
#> 4 2014-12-01 John helper B 0
#> 5 2015-12-01 John helper B 0
#> 6 2014-12-01 John senior manager C 0
#> 7 2015-12-01 John senior manager C 0
#> 8 2016-12-01 John senior manager C 0
#> 9 2016-12-01 John senior manager D 1
#> 10 2017-12-01 John helper E 1
#> 11 2011-12-01 Will senior manager A 0
#> 12 2012-12-01 Will senior manager A 0
#> 13 2013-12-01 Will senior manager Z 1
This will not distinguish between different senior manager roles at the same company. I'm not sure what John is doing in 2014 and 2015 - is he in both a helper and manager role in both years, so should get 0 in 2015 and 2016 as he hasn't changed company, or do you mean by this that he's switched between roles and should get 1? Will has not changed company in 2012 so gets a 0, but gets a 1 for 2013.