I'm a total newbie with R. For one of our final assignments, we want to compute quantiles per country for each column of the data below.
We have tried the apply function and a loop, but we have not been able to crack it yet:
   Ano                                           Paises Males.total
1 2011                                          Belgium        19.5
2 2011                                         Bulgaria        46.4
3 2011                                          Czechia        11.9
4 2011                                          Denmark        17.5
5 2011 Germany (until 1990 former territory of the FRG)        18.5
6 2011                                          Estonia        22.9
  Females.total Malessinterminar Females.sin.terminar malespostsecundaria
1            21               33                 34.3                16.7
2          49.7             72.1                 75.1                42.1
3          16.4             32.3                 28.6                11.2
4          17.9             24.6                 24.4                16.8
5          21.3             38.5                 34.7                21.5
6          22.5               34                 35.4                24.3
  Femalespostsecundaria Males.universidad Femalesuniversidad
1                    19              10.6               10.1
2                  45.4              17.1               24.9
3                  15.7               4.1                5.4
4                  17.8              11.9               12.1
5                  21.5              10.3               13.4
6                    27              10.5               10.7
We tried the loop below, which we would like to run for each data column by country. The problem is that each subsetting operation returns more than one value, so the result cannot be assigned to a single cell of the matrix and the loop fails:
estadosunicos <- unique(paises)
resultados <- matrix(0, length(estadosunicos), ncol = 3)
for (i in 1:length(estadosunicos)) {
  selec <- estadosunicos[i]
  resultados[i, 1] <- males.sin.terminar[paises == estadosunicos][females.sin.terminar < quantile(females.sin.terminar, 0.25)]
  resultados[i, 2] <- males.sin.terminar[paises == estadosunicos][males.sin.terminar > quantile(males.sin.terminar, 0.25) & males.sin.terminar < quantile(males.sin.terminar, 0.75)]
  resultados[i, 3] <- males.sin.terminar[paises == estadosunicos][males.sin.terminar > quantile(males.sin.terminar, 0.75)]
}
rownames(resultados) <- estadosunicos
So we don't know how to do this. We would like to get the 25%, 50% and 75% quantiles of these data by country, but we have more than 300 rows, so each country is repeated several times across the different years. How can we do it? Thank you so much for your help!
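To illustrate the mismatch described above, here is a minimal base R sketch with made-up numbers. The vectors g and x and the matrix res are hypothetical stand-ins for the country column, one data column, and the results matrix. Subsetting by a quantile threshold returns several values, which do not fit into one matrix cell, whereas quantile() called with three probabilities returns exactly three values, which fit into one row per country.

```r
# Hypothetical toy data: g plays the role of the country column,
# x the role of one numeric column (values are made up)
g <- c("A", "A", "A", "A", "B", "B", "B", "B")
x <- c(1, 2, 3, 4, 10, 20, 30, 40)

# One row per group, one column per requested quantile
res <- matrix(0, nrow = 2, ncol = 3,
              dimnames = list(c("A", "B"), c("25%", "50%", "75%")))

# Values of group "A" above that group's 25% quantile: this is a vector
# with several elements, so it cannot be stored in a single cell
xa <- x[g == "A"]
above_q1 <- xa[xa > quantile(xa, 0.25)]
length(above_q1)

# quantile() with three probabilities returns three numbers per group,
# which fill one complete row of the results matrix
for (k in rownames(res)) {
  res[k, ] <- quantile(x[g == k], probs = c(0.25, 0.5, 0.75))
}
res
```

The same loop could be repeated for each numeric column, although a grouped summary usually avoids the explicit loop.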
CodePudding user response:
We can group the data by country and then compute the quantile of each numeric column by looping across the columns. Each result is returned as a list object, which can then be converted to separate columns with unnest_wider etc.
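library(dplyr)

# Assumes the data frame is called df1 and the country column is Paises,
# as shown in the printed data
df1 %>%
  select(-Ano) %>%      # drop the year so it is not summarised
  group_by(Paises) %>%  # one group per country
  summarise(across(where(is.numeric),
    ~ list(as.list(quantile(.x, probs = c(0.25, 0.5, 0.75))))))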
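As a rough sketch of the unnest_wider step mentioned above, the example below runs the pipeline on a small made-up data frame and spreads each list column into one column per quantile. The values in df1 and the result name cuantiles are hypothetical; only the column names Ano, Paises, Males.total and Females.total come from the question. It also assumes a tidyr version in which unnest_wider() accepts several columns at once.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (made-up values), reusing column names from the question
df1 <- data.frame(
  Ano = rep(2011:2014, times = 2),
  Paises = rep(c("Belgium", "Bulgaria"), each = 4),
  Males.total = c(1, 2, 3, 4, 10, 20, 30, 40),
  Females.total = c(2, 4, 6, 8, 20, 40, 60, 80)
)

cuantiles <- df1 %>%
  select(-Ano) %>%
  group_by(Paises) %>%
  # each cell of the summarised columns holds a named list of three quantiles
  summarise(across(where(is.numeric),
    ~ list(as.list(quantile(.x, probs = c(0.25, 0.5, 0.75)))))) %>%
  # spread each list column into columns such as Males.total_25%
  unnest_wider(c(Males.total, Females.total), names_sep = "_")

cuantiles
```

The result has one row per country and three columns per original numeric column. Setting names_sep is needed here because every list column carries the same inner names (25%, 50%, 75%).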