I have a dataframe looks like below (the true data has many more people):
Year Person Office
2005 Lauren Boston
2007 Lauren Boston
2008 Lauren Chicago
2010 Lauren Boston
2012 Lauren New York
2013 Lauren New York
2007 Kate Atlanta
2009 Kate Atlanta
I want to calculate a office-person level measure (count) that captures the weighted number of offices that a person experienced before coming to the current one. The formula is (length of the previous experience 1/total previous years before current position)^2 (length of the previous experience 2/total previous years before current position)^2 ......
Below is the ideal output. For Lauren, Boston is her first office. Thus, her count for Boston is 0. For Lauren, Chicago is her second office and she spent 2008-2005=3 years before coming to the Chicago office. Thus, her count for Chicago is (3/3)^2=1. Then, Lauren came back to Boston for the second time. Here, she had two previous experience (Boston for 3 years and Chicago for 2 years (2010-2008)). Thus, her count for this second Boston experience is (3/5)^2 (2/5)^2=0.52.
Office Person Count
Boston Lauren 0
Chicago Lauren 1
Boston Lauren 0.52
New York Lauren 0.3469
Atlanta Kate 0
CodePudding user response:
You could use a similar approach like in this question:
library(dplyr)
df %>%
group_by(Person, Office, grp = cumsum(coalesce(Office != lag(Office), TRUE))) %>%
slice(1) %>%
arrange(Year) %>%
group_by(Person) %>%
mutate(
tmp_cnt = if_else(
row_number() == 1,
0,
(Year - lag(Year))^2
),
Count = if_else((Year - first(Year))^2 == 0, 0, cumsum(tmp_cnt) / (Year - first(Year))^2)
) %>%
select(-grp, -tmp_cnt) %>%
ungroup()
This returns
# A tibble: 5 x 4
Year Person Office Count
<dbl> <chr> <chr> <dbl>
1 2005 Lauren Boston 0
2 2007 Kate Atlanta 0
3 2008 Lauren Chicago 1
4 2010 Lauren Boston 0.52
5 2012 Lauren New_York 0.347