I have a dataframe looks like below:
Year Person Office
2005 Peter Boston
2007 Peter Boston
2008 Peter Chicago
2009 Peter New York
2011 Peter New York
2003 Amy Seattle
2004 Amy Boston
2006 Amy Chicago
2007 Amy Chicago
I want to calculate a office-person level normalized measure (count) that captures the number of offices that a person experienced before coming to the current one. The measure is normalized by the total amount of years before arriving at the current position. Below is the ideal output. For Peter, Boston is his first office and thus, his normalized measure count for Boston is 0. For Peter, Chicago is his second office and he spent 2008-2005=3 years before coming to the Chicago office. Thus, his normalized measure count for Chicago is 1/3.
Office Person Count
Boston Peter 0
Boston Amy 1
Chicago Peter 1/3
Chicago Amy 2/3
New York Peter 1/2
Seattle Amy 0
CodePudding user response:
You could use
library(dplyr)
df %>%
group_by(Person, Office) %>%
slice_min(Year) %>%
arrange(Year) %>%
add_count() %>%
group_by(Person) %>%
mutate(Count = if_else(cumsum(n) == 1, 0, (cumsum(n) - 1) / (Year - first(Year))),
.keep = "unused") %>%
ungroup()
This returns
# A tibble: 6 x 3
Person Office Count
<chr> <chr> <dbl>
1 Amy Seattle 0
2 Amy Boston 1
3 Peter Boston 0
4 Amy Chicago 0.667
5 Peter Chicago 0.333
6 Peter New_York 0.5
CodePudding user response:
library(tidyverse)
cities %>%
group_by(Person, Office) %>%
filter(row_number() == 1) %>%
group_by(Person) %>%
mutate(x = row_number()-1, y = (Year - Year[1])) %>%
mutate(count = ifelse(is.nan(x / y), x, x/y))
# Year Person Office x y test
# <int> <chr> <chr> <dbl> <int> <dbl>
# 1 2005 Peter "Boston" 0 0 0
# 2 2008 Peter "Chicago" 1 3 0.333
# 3 2009 Peter "New York" 2 4 0.5
# 4 2003 Amy "Seattle " 0 0 0
# 5 2004 Amy "Boston" 1 1 1
# 6 2006 Amy "Chicago" 2 3 0.667