I want to see the attrition and growth in group membership, broken down by group, in R.
My data:
year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
- Group 1 lost Max but gained Jane, who moved from group 2.
- Group 2 lost Jane but gained Mike, Jen, and Mohamad.
Is there a way to see how many people joined/left a group in each year and the percentage change from year to year?
CodePudding user response:
Maybe there are easier options, but you could do:
library(tidyverse)
year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
map(.x = unique(year1$group),
    .f = ~ year1 |>
      filter(group == .x) |>
      mutate(year = 1) |>
      bind_rows(year2 |>
                  filter(group == .x) |>
                  mutate(year = 2)) |>
      summarize(group = unique(group),
                # people present in the group in one year but not the other
                joined = length(setdiff(people[year == 2], people[year == 1])),
                left = length(setdiff(people[year == 1], people[year == 2])),
                n_year1 = sum(year == 1),
                n_year2 = sum(year == 2),
                # relative change in headcount from year 1 to year 2
                pct_change = (n_year2 - n_year1) / n_year1)) |>
  bind_rows()
# A tibble: 2 × 6
  group joined  left n_year1 n_year2 pct_change
  <dbl>  <int> <int>   <int>   <int>      <dbl>
1     1      1     1       3       3          0
2     2      3     1       2       4          1
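To see which people account for the joined and left counts, a person-level view can help. The following is only a sketch, assuming each person appears at most once per year; the `transitions` name and the `status` labels ("new", "gone", "stayed", "moved") are made up for illustration and are not part of the answer above.

```r
library(dplyr)
library(tibble)

year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"),
                group = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"),
                group = c(1, 1, 1, 2, 2, 2, 2))

# One row per person, with their group in each year (NA if absent that year)
transitions <- full_join(year1, year2, by = "people",
                         suffix = c("_year1", "_year2")) |>
  mutate(status = case_when(
    is.na(group_year1) ~ "new",                # only present in year 2
    is.na(group_year2) ~ "gone",               # only present in year 1
    group_year1 == group_year2 ~ "stayed",     # same group in both years
    TRUE ~ "moved"                             # present in both years, different group
  ))

transitions
count(transitions, status)
```

Filtering `transitions` on `status == "moved"` should show Jane K going from group 2 to group 1, which is why she counts as "left" for one group and "joined" for the other.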
CodePudding user response:
I changed your code a bit, based on some assumptions:
library(tidyverse)
year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2), year = 1)
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2), year = 2)
years <- year1 %>% bind_rows(year2)
years %>%
  group_by(group, year) %>%
  summarise(n = n()) %>%
  group_by(group) %>%
  mutate(pct_change = n / lag(n) - 1)
I assumed your second data frame represented another year, then bound both into a single data frame with a year
column that identifies the year each row represents.
Output:
  group  year     n pct_change
  <dbl> <dbl> <int>      <dbl>
1     1     1     3         NA
2     1     2     3          0
3     2     1     2         NA
4     2     2     4          1
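The stacked layout gives headcounts only. A possible way to add joined and left counts to it, and to cover more than two years, is sketched below. It assumes years are consecutive integers and that each person appears at most once per year; the names `shifted`, `joiners`, `leavers`, and `turnover` are illustrative only.

```r
library(dplyr)
library(tibble)

years <- bind_rows(
  tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"),
         group = c(1, 1, 1, 2, 2), year = 1),
  tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"),
         group = c(1, 1, 1, 2, 2, 2, 2), year = 2)
)

# Shift every membership forward one year so it lines up with the following year
shifted <- mutate(years, year = year + 1)
keys <- c("people", "group", "year")

# In a group this year but not in the same group the year before
joiners <- anti_join(years, shifted, by = keys) |>
  filter(year > min(years$year)) |>       # the first year has nothing to compare against
  count(group, year, name = "joined")

# In a group the year before but not in the same group this year
leavers <- anti_join(shifted, years, by = keys) |>
  filter(year <= max(years$year)) |>      # drop the year after the last observed one
  count(group, year, name = "left")

turnover <- count(years, group, year) |>
  left_join(joiners, by = c("group", "year")) |>
  left_join(leavers, by = c("group", "year")) |>
  group_by(group) |>
  mutate(pct_change = n / lag(n) - 1) |>
  ungroup()

turnover
```

The first year of each group has `NA` for `joined`, `left`, and `pct_change` because there is no earlier year to compare against. In later years an `NA` in `joined` or `left` would mean no one joined or left, so you may want to replace those with 0.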
CodePudding user response:
library(tidyverse)
year1 <- tibble(people = c("Joe A", "Max X", "Sam M", "Jane K", "Doug K"), group = c(1, 1, 1, 2, 2))
year2 <- tibble(people = c("Joe A", "Sam M", "Jane K", "Doug K", "Mike K", "Jen G", "Mohamad T"), group = c(1, 1, 1, 2, 2, 2, 2))
year1 %>%
  group_by(group) %>%
  summarise(n = n()) %>%
  full_join(year2 %>%
              group_by(group) %>%
              summarise(n = n()), by = "group") %>%
  mutate(change = n.y - n.x, percent_change = change / n.x) %>%
  ungroup() %>%
  select(group, n.x, n.y, change, percent_change) %>%
  print()
Output (n.x = year 1, n.y = year 2):
# A tibble: 2 x 5
  group   n.x   n.y change percent_change
  <dbl> <int> <int>  <int>          <dbl>
1     1     3     3      0              0
2     2     2     4      2              1
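One thing to watch with `full_join()` is that a group present in only one year gets `NA` for the other year's count. Here is a rough sketch of one way to handle that, which also uses the `suffix` argument to avoid the `n.x`/`n.y` names. Group 3 is invented purely to illustrate the case and is not in the question's data; treating the percentage change from zero members as `NA` is an assumption, not the only reasonable choice.

```r
library(dplyr)
library(tibble)

# Per-group headcounts; group 3 is hypothetical and only exists in year 2
n1 <- tibble(group = c(1, 2), n = c(3L, 2L))
n2 <- tibble(group = c(1, 2, 3), n = c(3L, 4L, 2L))

changes <- full_join(n1, n2, by = "group", suffix = c("_year1", "_year2")) |>
  mutate(
    # A group missing from one year had zero members that year
    n_year1 = coalesce(n_year1, 0L),
    n_year2 = coalesce(n_year2, 0L),
    change = n_year2 - n_year1,
    # A percentage change from zero members is undefined, so report NA
    percent_change = if_else(n_year1 == 0L, NA_real_, change / n_year1)
  )

changes
```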