Home > Net >  R Dataframe Weighted Measure by Group
R Dataframe Weighted Measure by Group

Time:04-29

I have a dataframe looks like below (the true data has many more people):

Year   Person    Office
2005   Lauren    Boston 
2007   Lauren    Boston
2008   Lauren    Chicago 
2010   Lauren    Boston 
2012   Lauren    New York
2013   Lauren    New York 
2007   Kate      Atlanta
2009   Kate      Atlanta

I want to calculate a office-person level measure (count) that captures the weighted number of offices that a person experienced before coming to the current one. The formula is (length of the previous experience 1/total previous years before current position)^2 (length of the previous experience 2/total previous years before current position)^2 ......

Below is the ideal output. For Lauren, Boston is her first office. Thus, her count for Boston is 0. For Lauren, Chicago is her second office and she spent 2008-2005=3 years before coming to the Chicago office. Thus, her count for Chicago is (3/3)^2=1. Then, Lauren came back to Boston for the second time. Here, she had two previous experience (Boston for 3 years and Chicago for 2 years (2010-2008)). Thus, her count for this second Boston experience is (3/5)^2 (2/5)^2=0.52.

Office    Person  Count
Boston    Lauren  0
Chicago   Lauren  1
Boston    Lauren  0.52
New York  Lauren  0.3469
Atlanta   Kate    0

CodePudding user response:

You could use a similar approach like in this question:

library(dplyr)

df %>% 
  group_by(Person, Office, grp = cumsum(coalesce(Office != lag(Office), TRUE))) %>% 
  slice(1) %>% 
  arrange(Year) %>% 
  group_by(Person) %>% 
  mutate(
    tmp_cnt = if_else(
      row_number() == 1, 
      0, 
      (Year - lag(Year))^2 
      ),
    Count = if_else((Year - first(Year))^2 == 0, 0, cumsum(tmp_cnt) / (Year - first(Year))^2)
    ) %>% 
  select(-grp, -tmp_cnt) %>% 
  ungroup()

This returns

# A tibble: 5 x 4
   Year Person Office   Count
  <dbl> <chr>  <chr>    <dbl>
1  2005 Lauren Boston   0    
2  2007 Kate   Atlanta  0    
3  2008 Lauren Chicago  1    
4  2010 Lauren Boston   0.52 
5  2012 Lauren New_York 0.347
  • Related