Home > Mobile >  Calculate the quantile of multiple Groups in same Dataframe in R
Calculate the quantile of multiple Groups in same Dataframe in R

Time:10-22

I have a dataframe in R with the following format :

ClientID     Group   CountC   
  X1           A       3
  R3           B       2
  D4           A       1
  T5           A       7
  H0           B       5 

I want to calculate the quartiles of CountC and store the values in another dataframe, I did the following code:

calculate_quantile <- function(data, proba) { 
  z <- quantile(data, proba)
  df <- data.frame(Proba = proba, Value = z)
  return(df)
}

proba = c(0.25, 0.50, 0.75)

quartileDF <- calculate_quantile(df$CountC, proba)

Result in quartileDF is in the following format:

        Proba    Value
25%      0.25      1
50%      0.50      2
75%      0.75      7 

What I need to do now is calculate the same thing but by group.

I tried the following function:

q = c(.25, .5, .75)

quartileDFbyGroup <- 
  df%>%
    group_by(Group) %>%
      summarize(quant25 = quantile(df$CountC, probs = q[1]), 
              quant50 = quantile(df$CountC, probs = q[2]),
              quant75 = quantile(df$CountC, probs = q[3]))

But it gives me the same values for both groups:

Group     quant25  quant50   quant75
  A         1         2        7
  B         1         2        7

I tried to do it manually and in the tedious way, to be sure:

df_A = df[df['Group']=='A',]
df_B = df[df['Group']=='B',]
proba = c(0.25, 0.50, 0.75)
    
quartileDF_A <- calculate_quantile(df_A$CountC, proba)
quartileDF_B <- calculate_quantile(df_B$CountC, proba)

The values I get in each Group dataframe are not the same.

Not sure what I am doing wrong.

Can you help me calculate the quantile dynamically of 2 groups in the same dataframe?

CodePudding user response:

You can mention percentiles in the functions itself. Will this work:

df %>% group_by(Group) %>% summarise(Quantile25 = quantile(CountC, probs = .25),
                                     Quantile50 = quantile(CountC, probs = .5),
                                     Quantile75 = quantile(CountC, probs = .75))
# A tibble: 2 x 4
  Group Quantile25 Quantile50 Quantile75
  <chr>      <dbl>      <dbl>      <dbl>
1 A           2           3         5   
2 B           2.75        3.5       4.25

CodePudding user response:

We may do this with unnest

library(dplyr)
library(tidyr)
df %>%
    group_by(Group) %>% 
    summarise(out = list(as_tibble(as.list(quantile(CountC, probs = c(0.25, 
         0.5, 0.75)))))) %>%
    unnest(out)

-output

# A tibble: 2 × 4
  Group `25%` `50%` `75%`
  <chr> <dbl> <dbl> <dbl>
1 A      2      3    5   
2 B      2.75   3.5  4.25

data

df <- structure(list(ClientID = c("X1", "R3", "D4", "T5", "H0"), Group = c("A", 
"B", "A", "A", "B"), CountC = c(3L, 2L, 1L, 7L, 5L)), 
class = "data.frame", row.names = c(NA, 
-5L))
  •  Tags:  
  • r
  • Related