Home > Net >  R Dataframe Regroup and Summarize
R Dataframe Regroup and Summarize

Time:04-28

I have a dataframe looks like below:

Year   Person   Office
2005   Peter    Boston 
2007   Peter    Boston
2008   Peter    Chicago 
2009   Peter    New York
2011   Peter    New York 
2003   Amy      Seattle 
2004   Amy      Boston 
2006   Amy      Chicago 
2007   Amy      Chicago

I want to calculate a office-person level normalized measure (count) that captures the number of offices that a person experienced before coming to the current one. The measure is normalized by the total amount of years before arriving at the current position. Below is the ideal output. For Peter, Boston is his first office and thus, his normalized measure count for Boston is 0. For Peter, Chicago is his second office and he spent 2008-2005=3 years before coming to the Chicago office. Thus, his normalized measure count for Chicago is 1/3.

Office    Person  Count
Boston    Peter   0
Boston    Amy     1
Chicago   Peter   1/3
Chicago   Amy     2/3
New York  Peter   1/2
Seattle   Amy     0

CodePudding user response:

You could use

library(dplyr)

df %>% 
  group_by(Person, Office) %>% 
  slice_min(Year) %>% 
  arrange(Year) %>% 
  add_count() %>% 
  group_by(Person) %>% 
  mutate(Count = if_else(cumsum(n) == 1, 0, (cumsum(n) - 1) / (Year - first(Year))),
         .keep = "unused") %>% 
  ungroup()

This returns

# A tibble: 6 x 3
  Person Office   Count
  <chr>  <chr>    <dbl>
1 Amy    Seattle  0    
2 Amy    Boston   1    
3 Peter  Boston   0    
4 Amy    Chicago  0.667
5 Peter  Chicago  0.333
6 Peter  New_York 0.5 

CodePudding user response:

library(tidyverse)

cities %>% 
  group_by(Person, Office) %>%
  filter(row_number() == 1) %>% 
  group_by(Person) %>% 
  mutate(x = row_number()-1, y = (Year - Year[1])) %>% 
  mutate(count = ifelse(is.nan(x / y), x, x/y))

#   Year Person Office         x     y  test
#  <int> <chr>  <chr>      <dbl> <int> <dbl>
# 1  2005 Peter  "Boston"       0     0 0    
# 2  2008 Peter  "Chicago"      1     3 0.333
# 3  2009 Peter  "New York"     2     4 0.5  
# 4  2003 Amy    "Seattle "     0     0 0    
# 5  2004 Amy    "Boston"       1     1 1    
# 6  2006 Amy    "Chicago"      2     3 0.667
  • Related