I have a dataframe in R with the following format :
ClientID Group CountC
X1 A 3
R3 B 2
D4 A 1
T5 A 7
H0 B 5
I want to calculate the quartiles of CountC
and store the values in another dataframe, I did the following code:
calculate_quantile <- function(data, proba) {
z <- quantile(data, proba)
df <- data.frame(Proba = proba, Value = z)
return(df)
}
proba = c(0.25, 0.50, 0.75)
quartileDF <- calculate_quantile(df$CountC, proba)
Result in quartileDF
is in the following format:
Proba Value
25% 0.25 1
50% 0.50 2
75% 0.75 7
What I need to do now is calculate the same thing but by group.
I tried the following function:
q = c(.25, .5, .75)
quartileDFbyGroup <-
df%>%
group_by(Group) %>%
summarize(quant25 = quantile(df$CountC, probs = q[1]),
quant50 = quantile(df$CountC, probs = q[2]),
quant75 = quantile(df$CountC, probs = q[3]))
But it gives me the same values for both groups:
Group quant25 quant50 quant75
A 1 2 7
B 1 2 7
I tried to do it manually and in the tedious way, to be sure:
df_A = df[df['Group']=='A',]
df_B = df[df['Group']=='B',]
proba = c(0.25, 0.50, 0.75)
quartileDF_A <- calculate_quantile(df_A$CountC, proba)
quartileDF_B <- calculate_quantile(df_B$CountC, proba)
The values I get in each Group dataframe are not the same.
Not sure what I am doing wrong.
Can you help me calculate the quantile dynamically of 2 groups in the same dataframe?
CodePudding user response:
You can mention percentiles in the functions itself. Will this work:
df %>% group_by(Group) %>% summarise(Quantile25 = quantile(CountC, probs = .25),
Quantile50 = quantile(CountC, probs = .5),
Quantile75 = quantile(CountC, probs = .75))
# A tibble: 2 x 4
Group Quantile25 Quantile50 Quantile75
<chr> <dbl> <dbl> <dbl>
1 A 2 3 5
2 B 2.75 3.5 4.25
CodePudding user response:
We may do this with unnest
library(dplyr)
library(tidyr)
df %>%
group_by(Group) %>%
summarise(out = list(as_tibble(as.list(quantile(CountC, probs = c(0.25,
0.5, 0.75)))))) %>%
unnest(out)
-output
# A tibble: 2 × 4
Group `25%` `50%` `75%`
<chr> <dbl> <dbl> <dbl>
1 A 2 3 5
2 B 2.75 3.5 4.25
data
df <- structure(list(ClientID = c("X1", "R3", "D4", "T5", "H0"), Group = c("A",
"B", "A", "A", "B"), CountC = c(3L, 2L, 1L, 7L, 5L)),
class = "data.frame", row.names = c(NA,
-5L))