New variable from grouped calculation in R

Time:04-15

I have a dataset:

library(dplyr)
my_df <- data.frame(day = c(1,1,1,2,2,2,3,3,3), age = c(18, 18, 18, 25, 18, 35, 76, 76, 15))
my_df
#   day age
# 1   1  18
# 2   1  18
# 3   1  18
# 4   2  25
# 5   2  18
# 6   2  35
# 7   3  76
# 8   3  76
# 9   3  15

For each row, I want to know the frequency and percentage of that row's age within its day. I can calculate this per group with a dplyr chain:

my_df %>%
  group_by(day, age) %>%
  summarize(n=n()) %>%
  group_by(day) %>%
  mutate(pct = n/sum(n))
#     day   age    n   pct
# 1     1    18    3   1    
# 2     2    18    1   0.333
# 3     2    25    1   0.333
# 4     2    35    1   0.333
# 5     3    15    1   0.333
# 6     3    76    2   0.667

How can I add the values of n back onto my original df? Desired output:

#   day age  n
# 1   1  18  3
# 2   1  18  3
# 3   1  18  3
# 4   2  25  1
# 5   2  18  1
# 6   2  35  1
# 7   3  76  2
# 8   3  76  2
# 9   3  15  1

CodePudding user response:

For your desired output, we could use add_count(), which adds the group count as a new column without collapsing the rows:

library(dplyr)
my_df %>% 
  add_count(day, age)
#   day age n
# 1   1  18 3
# 2   1  18 3
# 3   1  18 3
# 4   2  25 1
# 5   2  18 1
# 6   2  35 1
# 7   3  76 2
# 8   3  76 2
# 9   3  15 1
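
The question also asks for the percentage within each day. One way this might be done, as a minimal sketch, is a second add_count() call on day alone. The column name day_total and the result name with_pct are made up for this illustration and are not part of the original answer:

```r
library(dplyr)

my_df <- data.frame(
  day = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  age = c(18, 18, 18, 25, 18, 35, 76, 76, 15)
)

with_pct <- my_df %>%
  add_count(day, age) %>%                 # n: rows sharing this day and age
  add_count(day, name = "day_total") %>%  # day_total (hypothetical name): rows in this day
  mutate(pct = n / day_total)             # share of the day's rows with this age

with_pct
```

Both counts are attached row by row, so the original nine rows and their order are kept.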

CodePudding user response:

I would store this as a variable, as such:

my_helper_df <- my_df %>%
  group_by(day, age) %>%
  summarize(n=n()) %>%
  group_by(day) %>%
  mutate(pct = n/sum(n))

Then left_join it to the original df, like so:

final_df <- dplyr::left_join(my_df, my_helper_df, by = c("day", "age"))
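
For reference, the whole join approach can be sketched end to end as follows. This version swaps in count(day, age) as a shorthand for the group_by() plus summarize(n = n()) step and adds ungroup() so the helper table carries no leftover grouping. Both are choices made for this sketch, not part of the answer above:

```r
library(dplyr)

my_df <- data.frame(
  day = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  age = c(18, 18, 18, 25, 18, 35, 76, 76, 15)
)

# One row per (day, age) pair, with its count and its share of the day
my_helper_df <- my_df %>%
  count(day, age) %>%
  group_by(day) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# Attach n and pct to every original row; left_join keeps the row order of my_df
final_df <- left_join(my_df, my_helper_df, by = c("day", "age"))

final_df
```

Because the helper table has exactly one row per day and age combination, the join does not duplicate any rows of my_df.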