Yearly percent change of group members in r

Time:09-22

I want to measure member attrition/growth for each group in R.

My data:

library(tibble)

year1 <- 
  tibble(people = c("Joe A", "Max X", "Sam M",  "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))

year2 <- 
  tibble(people = c("Joe A", "Sam M",  "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
  • Group 1 lost Max but gained Jane, who moved from group 2.
  • Group 2 lost Jane but gained Mike, Jen, and Mohamad.
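The moves listed above can be derived mechanically by joining the two years on people. A minimal sketch, assuming each name in people identifies exactly one person; the suffixes _y1/_y2, the table name moves, and the status labels are illustrative choices, not part of the original data:

```r
library(dplyr)
library(tibble)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"),
                group  = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"),
                group  = c(1, 1, 1, 2, 2, 2, 2))

# One row per person; group_y1 / group_y2 is NA when the person is absent that year
moves <- full_join(year1, year2, by = "people", suffix = c("_y1", "_y2")) |>
  mutate(status = case_when(
    is.na(group_y1)      ~ "new",    # only present in year 2
    is.na(group_y2)      ~ "left",   # only present in year 1
    group_y1 != group_y2 ~ "moved",  # present both years, different group
    TRUE                 ~ "stayed"
  ))

moves
```

Max X comes out as "left", Jane K as "moved", and Mike K, Jen G and Mohamad T as "new", which is exactly the list above.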

Is there a way to see how many people joined/left a group in each year and the percentage change from year to year?

Answer 1:

Maybe there are easier options, but you could do:

library(tidyverse)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))

year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))

# For each group, stack that group's year-1 and year-2 members and compare them
map(.x = unique(year1$group),
    .f = ~ year1 |> 
      filter(group == .x) |> 
      mutate(year = 1) |> 
      bind_rows(year2 |> 
                  filter(group == .x) |> 
                  mutate(year = 2)) |> 
      summarize(group = unique(group),
                joined     = length(setdiff(people[year == 2], people[year == 1])),
                left       = length(setdiff(people[year == 1], people[year == 2])),
                n_year1    = sum(year == 1),
                n_year2    = sum(year == 2),
                # relative change in group size: (new - old) / old
                pct_change = (n_year2 - n_year1) / n_year1)) |> 
  bind_rows()

# A tibble: 2 × 6
  group joined  left n_year1 n_year2 pct_change
  <dbl>  <int> <int>   <int>   <int>      <dbl>
1     1      1     1       3       3          0
2     2      3     1       2       4          1
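The loop above filters each group separately. As an alternative sketch (not the answerer's code), the same counts can be obtained without iterating, by joining the two years on people and classifying each person once. It assumes people is a unique identifier; moves, joined, left, sizes and result are hypothetical helper names:

```r
library(dplyr)
library(tibble)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"),
                group  = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"),
                group  = c(1, 1, 1, 2, 2, 2, 2))

# Pair each person's year-1 group with their year-2 group (NA = absent that year)
moves <- full_join(year1, year2, by = "people", suffix = c("_y1", "_y2"))

# Joined group g: in g in year 2 but not in g in year 1 (newcomer or transfer in)
joined <- moves |>
  filter(!is.na(group_y2), is.na(group_y1) | group_y1 != group_y2) |>
  count(group = group_y2, name = "joined")

# Left group g: in g in year 1 but not in g in year 2 (departure or transfer out)
left <- moves |>
  filter(!is.na(group_y1), is.na(group_y2) | group_y1 != group_y2) |>
  count(group = group_y1, name = "left")

# Group sizes per year
sizes <- full_join(count(year1, group, name = "n_year1"),
                   count(year2, group, name = "n_year2"),
                   by = "group")

result <- sizes |>
  left_join(joined, by = "group") |>
  left_join(left, by = "group") |>
  # a group nobody joined or left gets NA from the join; count it as 0
  mutate(across(c(joined, left), ~ coalesce(.x, 0L)),
         pct_change = (n_year2 - n_year1) / n_year1)

result
```

Transfers count on both sides: Jane K is counted as leaving group 2 and joining group 1, which matches the setdiff() logic of the loop.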

Answer 2:

Changed your code a bit, based on some assumptions:

library(tidyverse)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2), year = 1)

year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2), year = 2)
years <- year1 %>% bind_rows(year2)

years %>%
  group_by(group, year) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(group) %>%
  # change relative to the previous year within the same group
  mutate(pct_change = n / lag(n) - 1)

I assumed your second data frame represents the following year, then bound both into a single data frame with a year column that identifies the year each row belongs to.
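The long format built above extends to any number of years. A sketch, assuming the yearly snapshots are collected in a named list whose names are the years (the list name yearly is hypothetical). bind_rows() takes the .id labels from the list names; the labels are converted to integers so that years sort numerically, and the rows are sorted explicitly because lag() simply looks at the previous row:

```r
library(dplyr)
library(tibble)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"),
                group  = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"),
                group  = c(1, 1, 1, 2, 2, 2, 2))

# Hypothetical container: add further years as extra named elements
yearly <- list(`1` = year1, `2` = year2)

sizes <- bind_rows(yearly, .id = "year") |>
  mutate(year = as.integer(year)) |>   # "1", "2", ... -> 1, 2, ...
  count(group, year) |>                # members per group and year
  arrange(group, year) |>              # lag() depends on row order
  group_by(group) |>
  mutate(pct_change = n / lag(n) - 1) |>
  ungroup()

sizes
```

The first year of every group has no predecessor, so its pct_change is NA, as in the output below.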

Output:

  group  year     n pct_change
  <dbl> <dbl> <int>      <dbl>
1     1     1     3         NA
2     1     2     3          0
3     2     1     2         NA
4     2     2     4          1

Answer 3:

library(tidyverse)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
year1 %>%
  group_by(group) %>%
  summarise(n = n()) %>%
  full_join(year2 %>%
              group_by(group) %>%
              summarise(n = n()), by = "group") %>%
  mutate(change = n.y - n.x, percent_change = change / n.x) %>%
  ungroup() %>%
  select(group, n.x, n.y, change, percent_change) %>% print()

Output (n.x = year 1, n.y = year 2):

# A tibble: 2 x 5
  group   n.x   n.y change percent_change
  <dbl> <int> <int>  <int>          <dbl>
1     1     3     3      0              0
2     2     2     4      2              1