Home > Net >  Divide data frame by deciles and compute mean and median for each decile in each year R
Divide data frame by deciles and compute mean and median for each decile in each year R

Time:01-20

Given the following structure of data, I would like to divide the "Category"s in deciles for each year given by "value" and calculate for every year and every decile the mean and median for the variable "value".

df <- data.frame(Category = rep(LETTERS[1:15], each = 6),
                 Year = rep(2010:2015, 15),
                 value = runif(90, 50, 100))

I have tried this but then I have problem calculating the mean and mediam in each year for each decile (not sure this is 100% correct though).

a <- df %>%
  mutate(quantile = ntile(value, 10))

I would like to get a dataframe by year and deciles with the mean and mediam of each decile in each year

For dividing the df in deciles I have tried this which apparently gives me a result (not sure if 100% correct)

a <- df %>%
  mutate(quantile = ntile(value, 10))

For the calculation of the mean I have tried two methods and both fail:

using data.frame

w <- setDT(a)[, Mean := mean(value), by = .(Year, quantile)][]

and using dplry

z <- a %>% group_by(Year,quantile) %>% summarise (x = mean(value))

What I am looking for as a result is a data frame with one row per year and decile and their mean and mediam values.

That's how I need the output (with corresponding values in columns Mean and Median and deciles all the way to 10): enter image description here

CodePudding user response:

How about this:

library(dplyr)
df <- data.frame(Category = rep(LETTERS[1:15], each = 6),
                 Year = rep(2010:2015, 15),
                 value = runif(90, 50, 100))

df %>%
  group_by(Year) %>% 
  mutate(decile = ntile(value, 10)) %>% 
  group_by(Year, decile) %>% 
  summarise(mean = mean(value), 
            median = median(value)) %>% 
  arrange(decile, Year)
#> `summarise()` has grouped output by 'Year'. You can override using the
#> `.groups` argument.
#> # A tibble: 60 × 4
#> # Groups:   Year [6]
#>     Year decile  mean median
#>    <int>  <int> <dbl>  <dbl>
#>  1  2010      1  55.2   55.2
#>  2  2011      1  52.7   52.7
#>  3  2012      1  56.3   56.3
#>  4  2013      1  53.7   53.7
#>  5  2014      1  53.6   53.6
#>  6  2015      1  55.1   55.1
#>  7  2010      2  63.2   63.2
#>  8  2011      2  67.8   67.8
#>  9  2012      2  67.8   67.8
#> 10  2013      2  56.5   56.5
#> # … with 50 more rows

Created on 2023-01-19 by the reprex package (v2.0.1)

  •  Tags:  
  • r
  • Related