Get mean values for subdivisions of grouped data using R

I want to get the mean value (amount) of the first 20%, second 20%, third 20%, etc. of the rows in each group within a data frame.

I have started a reprex below but don't know how to finish it. I think I will need to calculate the start and end of each 20% using a row number, which I've added because the id is not sequential or predictable.

The actual data will have more than 2 groups, and there will be more than one column to average.

library(tidyverse)
df <- data.frame(id = c(0, 12, 23, 34, 45, 56, 67, 78, 89, 90, 9, 98, 87, 76, 65, 54, 43, 32, 21, 99),
                 group = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2),
                 amount = c(102, 312, 567, 432, 345, 987, 94, 34, 54, 32, 92, 67, 32, 49, 36, 31, 19, 18, 17, 308))

groupSplitMeans <- df %>%
  mutate(rowNumber = row_number()) %>%
  group_by(group)

CodePudding user response:

You could use ntile() from dplyr to break each group into n buckets of roughly equal size.

df %>%
  group_by(group) %>%
  mutate(nth = ntile(n = 5)) %>%
  group_by(group, nth) %>%
  summarise(amount = mean(amount)) %>%
  ungroup()

# # A tibble: 10 × 3
#    group   nth amount
#    <dbl> <int>  <dbl>
#  1     1     1  207  
#  2     1     2  540. 
#  3     1     3   32  
#  4     1     4   34  
#  5     1     5   17.5
#  6     2     1  500. 
#  7     2     2  190. 
#  8     2     3   73  
#  9     2     4   51.5
# 10     2     5  170.
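
One detail worth checking on the real data: group sizes will rarely be exact multiples of 5. As far as I know, ntile() then produces buckets whose sizes differ by at most one row, so some "20%" slices hold one more row than others. A minimal sketch, using a made-up 7-row data frame (toy, value and bucket_sizes are hypothetical names, not part of the question):

```r
library(dplyr)

# Hypothetical data: a single group with 7 rows, which does not divide evenly by 5
toy <- data.frame(group = rep(1, 7),
                  value = c(10, 20, 30, 40, 50, 60, 70))

bucket_sizes <- toy %>%
  group_by(group) %>%
  mutate(nth = ntile(row_number(), n = 5)) %>%  # bucket by row position within the group
  ungroup() %>%
  count(group, nth, name = "rows")               # how many rows landed in each bucket

bucket_sizes
```

If the uneven buckets matter for your analysis, it may be worth inspecting a count like this per group before averaging.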

Calling ntile(n = 5) without a vector splits each group by row position, regardless of the values in amount. If the rows should instead be ranked by amount before splitting, pass amount as the first argument to ntile():

df %>%
  ...
  mutate(nth = ntile(amount, n = 5)) %>%
  ...

# # A tibble: 10 × 3
#    group   nth amount
#    <dbl> <int>  <dbl>
#  1     1     1   17.5
#  2     1     2   25.5
#  3     1     3   40.5
#  4     1     4   98  
#  5     1     5  650. 
#  6     2     1   32.5
#  7     2     2   45  
#  8     2     3   79.5
#  9     2     4  326. 
# 10     2     5  500.

The means of amount now increase from the first bucket to the fifth within each group, because the rows were ranked by amount before being split.
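
The question also mentions more than one column to average. One possible extension, assuming dplyr 1.0.0 or later and a second numeric column (here a made-up amount2, which is not part of the original data), is to pass the columns to across() inside summarise(). This sketch keeps the row-position split that the question asks for:

```r
library(dplyr)

df <- data.frame(id = c(0, 12, 23, 34, 45, 56, 67, 78, 89, 90, 9, 98, 87, 76, 65, 54, 43, 32, 21, 99),
                 group = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2),
                 amount = c(102, 312, 567, 432, 345, 987, 94, 34, 54, 32, 92, 67, 32, 49, 36, 31, 19, 18, 17, 308))

# amount2 is a hypothetical second column, added only for illustration
df$amount2 <- df$amount * 2

groupSplitMeans <- df %>%
  group_by(group) %>%
  mutate(nth = ntile(row_number(), n = 5)) %>%   # fifths by row position within each group
  group_by(group, nth) %>%
  summarise(across(c(amount, amount2), mean),    # mean of every listed column
            .groups = "drop")                     # return an ungrouped tibble

groupSplitMeans
```

With many columns, a tidyselect helper such as where(is.numeric) could replace the explicit c(amount, amount2), though that would also average id unless it is excluded.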