I am trying to sort people in my dataset into three body-size categories (Small, Medium, Large). Consider the toy example below:
library(ggplot2)
library(dplyr)
# Toy dataset
Data<-data.frame(Age=c(40,40,40,41,41,41,42,42,42),
Height=c(180,179,178,177,176,175,174,173,172),
Weight=c(84,83,82,81,80,79,78,77,76))
# Classify people as Small, Medium, or Large
Data$Size<-Data$Height*Data$Weight
Data$Sizerank<-rank(Data$Size)
Data$Sizegroup<-as.numeric(cut_number(Data$Sizerank,3))
Data$CohortL<-ifelse(Data$Sizegroup==3,"Large",NA)
Data$CohortM<-ifelse(Data$Sizegroup==2,"Medium",NA)
Data$CohortS<-ifelse(Data$Sizegroup==1,"Small",NA)
temp1<-as.vector(Data$CohortL)
temp2<-as.vector(Data$CohortM)
temp3<-as.vector(Data$CohortS)
temp4<-data.frame(temp1,temp2,temp3)
temp5<-temp4%>%mutate(Cohort=coalesce(temp1,temp2,temp3))
Data$Cohort<-temp5$Cohort
Data<-data.frame(Data$Age,
Data$Height,
Data$Weight,
Data$Cohort)
colnames(Data)<-c("Age","Height","Weight","Cohort")
# Remove temporary objects from the workspace
rm(temp1,
temp2,
temp3,
temp4,
temp5)
# Print Data
Data
This code classifies people as "Small" (bottom third), "Medium" (middle third), or "Large" (top third) relative to the whole dataset.
I would like to expand this code to perform the size ranking and grouping separately for each age group. For example, all 40-year-olds would be ranked as Small, Medium, or Large compared to other 40-year-olds, not to the population at large. Ranking within each age group changes the Cohort membership; in this example, it goes from Large/Large/Large/Medium/Medium/Medium/Small/Small/Small to Large/Medium/Small/Large/Medium/Small/Large/Medium/Small.
If I only had three age groups, I would just run this analysis manually, but in practice my age range is much wider, so I think I need some sort of loop, perhaps a for loop or one of the apply() functions.
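A minimal base-R sketch of the for-loop idea, assuming Size is Height * Weight as in the code above. Instead of cut_number(), it splits each group's ranks into thirds with ceiling(3 * rank / n), which gives the same thirds as the original approach for this toy data but may place boundary rows differently for other group sizes. The label vector name `size_labels` is illustrative.

```r
# Toy dataset from the question; Size assumed to be Height * Weight
Data <- data.frame(Age    = c(40, 40, 40, 41, 41, 41, 42, 42, 42),
                   Height = c(180, 179, 178, 177, 176, 175, 174, 173, 172),
                   Weight = c(84, 83, 82, 81, 80, 79, 78, 77, 76))
Data$Size <- Data$Height * Data$Weight

size_labels <- c("Small", "Medium", "Large")  # illustrative label vector
Data$Cohort <- NA_character_

# Rank each person only against people of the same age
for (a in unique(Data$Age)) {
  idx <- which(Data$Age == a)
  r <- rank(Data$Size[idx], ties.method = "first")
  # Map ranks 1..n onto thirds: 1 = bottom third, 3 = top third
  group <- ceiling(3 * r / length(idx))
  Data$Cohort[idx] <- size_labels[group]
}
Data
```

Age groups with fewer than three people will not receive all three labels, so it may be worth checking group sizes first, for example with table(Data$Age).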
Any help or insights would be greatly appreciated. Thank you very much.
P.S. I am also aware that my method of constructing the "Cohort" column is cumbersome, so if anybody knows of a more elegant approach, I would be very happy to learn about it.
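One possible simplification of the whole-dataset version, sketched in base R and again assuming Size is Height * Weight: the quantile breaks mirror what ggplot2's cut_number() does, and a single lookup into a label vector replaces the three ifelse() columns, the temp data frames, and coalesce().

```r
Data <- data.frame(Age    = c(40, 40, 40, 41, 41, 41, 42, 42, 42),
                   Height = c(180, 179, 178, 177, 176, 175, 174, 173, 172),
                   Weight = c(84, 83, 82, 81, 80, 79, 78, 77, 76))
Data$Size <- Data$Height * Data$Weight  # assumed size measure

# Quantile breaks on the ranks, split into three groups (1 = Small ... 3 = Large)
r <- rank(Data$Size)
breaks <- quantile(r, probs = c(0, 1/3, 2/3, 1))
Data$Sizegroup <- as.integer(cut(r, breaks = breaks, include.lowest = TRUE))

# One lookup replaces the three ifelse() columns plus coalesce()
Data$Cohort <- c("Small", "Medium", "Large")[Data$Sizegroup]

# Keep only the columns of interest
Data <- Data[, c("Age", "Height", "Weight", "Cohort")]
Data
```

If you prefer to keep ggplot2, cut_number() passes extra arguments on to cut(), so labels = c("Small", "Medium", "Large") can also be supplied to it directly.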
Answer:
How about this:
library(dplyr)
Data %>%
group_by(Age) %>%
mutate(size = gtools::quantcut(I(Height*Weight),
q=3,
labels=c("Small", "Medium", "Large")))
#> # A tibble: 9 × 5
#> # Groups: Age [3]
#> Age Height Weight Cohort size
#> <dbl> <dbl> <dbl> <chr> <fct>
#> 1 40 180 84 Large Large
#> 2 40 179 83 Large Medium
#> 3 40 178 82 Large Small
#> 4 41 177 81 Medium Large
#> 5 41 176 80 Medium Medium
#> 6 41 175 79 Medium Small
#> 7 42 174 78 Small Large
#> 8 42 173 77 Small Medium
#> 9 42 172 76 Small Small
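For readers without gtools, a rough base-R analogue of the grouped quantcut() call is sketched below. It assumes Size is Height * Weight and uses ave() (from the apply family the question mentions) to run a hypothetical helper, tercile(), within each Age group. Unlike quantcut(), this sketch stops with an error if ties make the quantile breaks non-unique.

```r
Data <- data.frame(Age    = c(40, 40, 40, 41, 41, 41, 42, 42, 42),
                   Height = c(180, 179, 178, 177, 176, 175, 174, 173, 172),
                   Weight = c(84, 83, 82, 81, 80, 79, 78, 77, 76))
Data$Size <- Data$Height * Data$Weight  # assumed size measure

# Hypothetical helper: index 1, 2, or 3 from the group's own quantile breaks
tercile <- function(x) {
  breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE))
}

# ave() applies tercile() per Age group and keeps the original row order
Data$Group <- ave(Data$Size, Data$Age, FUN = tercile)
Data$size <- factor(c("Small", "Medium", "Large")[Data$Group],
                    levels = c("Small", "Medium", "Large"))
Data[, c("Age", "Height", "Weight", "size")]
```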
Answer:
I don't know if I understand exactly, but try this.
Note: you need to install the Hmisc package.
library(dplyr)
Data2 <- Data %>%
mutate(Size = Age*Height*Weight) %>%
group_by(Age) %>%
mutate(Cohort_groups = as.numeric(Hmisc::cut2(Size, g=3))) %>%
mutate(Cohort = case_when(
Cohort_groups == 3 ~ "Large",
Cohort_groups == 2 ~ "Medium",
Cohort_groups == 1 ~ "Small")) %>%
select(-Cohort_groups)
Data2
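This answer folds Age into Size. Because Age is the same positive number for everyone inside a group, it cannot change the within-group ordering; the small base-R check below compares within-age ranks with and without the Age factor on the toy data.

```r
Data <- data.frame(Age    = c(40, 40, 40, 41, 41, 41, 42, 42, 42),
                   Height = c(180, 179, 178, 177, 176, 175, 174, 173, 172),
                   Weight = c(84, 83, 82, 81, 80, 79, 78, 77, 76))

with_age    <- Data$Age * Data$Height * Data$Weight
without_age <- Data$Height * Data$Weight

# Rank within each Age group; multiplying by a per-group constant preserves order
r_with    <- ave(with_age, Data$Age, FUN = rank)
r_without <- ave(without_age, Data$Age, FUN = rank)
identical(r_with, r_without)
```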