This is my first post here and I'm new to R, so I apologize if this post is worded oddly.
I am working on an analysis of a large dataset for a single year. I want to categorize continuous BMI data into categories ranging from "underweight" to "obese". To categorize across multiple years of this dataset, I want to write a function that can be reused for every year, even though the datasets are named slightly differently.
Is there a way I can write this function so I can apply it to different years of the dataset without rewriting my code?
bmi_categories <- function(df) {
  # Nested ifelse(): anyone aged 6 or younger matches no condition and gets ""
  as.factor(ifelse(df$BMI2 < 18.5 & df$AGE2 > 6, "Underweight",
            ifelse(18.5 <= df$BMI2 & df$BMI2 < 25 & df$AGE2 > 6, "Normal Weight",
            ifelse(25 <= df$BMI2 & df$BMI2 < 30 & df$AGE2 > 6, "Overweight",
            ifelse(30 <= df$BMI2 & df$AGE2 > 6, "Obese", "")))))
}
The first 6 observations of the dataframe look like this:
  AGE2     BMI2
1   15 22.50087
2   17 24.88647
3   14 22.70773
4    9 23.49076
5    7 22.14871
6   16 23.10811
Thanks in advance to anyone who responds!
Answer:
This should do what you want, with far fewer nested function calls.
library(dplyr)

# case_when() tests the conditions in order and uses the first one that is TRUE
df %>% mutate(Classification = case_when(AGE2 <= 6 ~ "",
                                         BMI2 < 18.5 ~ "Underweight",
                                         BMI2 < 25 ~ "Normal weight",
                                         BMI2 < 30 ~ "Overweight",
                                         BMI2 >= 30 ~ "Obese"))
This creates an additional column, Classification, holding the weight category:
# A tibble: 6 x 3
   AGE2     BMI2 Classification
  <dbl>    <dbl> <chr>
     15 22.50087 Normal weight
     17 24.88647 Normal weight
     14 22.70773 Normal weight
      9 23.49076 Normal weight
      5 22.14871
     16 23.10811 Normal weight
This is also easy to wrap in a function if you need to reuse it.
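A minimal sketch of such a wrapper, assuming a dplyr version recent enough to support the `{{ }}` "embrace" operator (it needs rlang 0.4.0 or later). `bmi_classify()` is a hypothetical name. The BMI and age columns are passed unquoted, so the same function works whichever names a given year's data frame uses.

```r
library(dplyr)

# Hypothetical helper: the case_when() logic above, with the BMI and age
# columns supplied as unquoted arguments instead of being hard-coded
bmi_classify <- function(df, bmi, age) {
  df %>%
    mutate(Classification = case_when(
      {{ age }} <= 6 ~ "",            # children aged 6 or younger are left blank
      {{ bmi }} < 18.5 ~ "Underweight",
      {{ bmi }} < 25 ~ "Normal weight",
      {{ bmi }} < 30 ~ "Overweight",
      {{ bmi }} >= 30 ~ "Obese"
    ))
}
```

Usage would then be `bmi_classify(df, BMI2, AGE2)` for one year and, for example, `bmi_classify(other_df, BMI1, AGE1)` for a year whose columns are named differently (`other_df` is a placeholder name).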
Answer:
Since the column names differ from year to year, I would pass the function the specific columns rather than the entire data frame.
Example data:
df1 <- data.frame(AGE1 = c(6, 12, 24, 56, 32), BMI1 = c(20, 18, 27, 31, 29))
> df1
  AGE1 BMI1
1    6   20
2   12   18
3   24   27
4   56   31
5   32   29
Function:
bmi_categories <- function(bmi, age) {
  # NA is the default value. You could use "" as the default instead,
  # but then "" must also be added to the vector of levels.
  category <- factor(rep(NA, length(bmi)),
                     levels = c("Underweight", "Normal Weight", "Overweight", "Obese"))
  category[bmi < 18.5 & age > 6] <- "Underweight"
  category[18.5 <= bmi & bmi < 25 & age > 6] <- "Normal Weight"
  category[25 <= bmi & bmi < 30 & age > 6] <- "Overweight"
  category[30 <= bmi & age > 6] <- "Obese"
  return(category)
}
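Because the cut-offs form contiguous intervals, base R's `cut()` can also do the binning in a single call. A sketch, with `bmi_categories_cut()` as a hypothetical name: `right = FALSE` makes each interval closed on the left, matching the `<=`/`<` comparisons above, and ages of 6 or younger (or missing) are then set back to `NA` so the result agrees with the function above.

```r
# Hypothetical alternative to bmi_categories() using base R's cut()
bmi_categories_cut <- function(bmi, age) {
  # Intervals [-Inf, 18.5), [18.5, 25), [25, 30), [30, Inf)
  category <- cut(bmi,
                  breaks = c(-Inf, 18.5, 25, 30, Inf),
                  labels = c("Underweight", "Normal Weight", "Overweight", "Obese"),
                  right = FALSE)
  # Leave children aged 6 or younger, and missing ages, uncategorized
  category[is.na(age) | age <= 6] <- NA
  category
}
```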
(You could also take JaredS's code and turn it into a function. I personally avoid external packages where possible, so the code is easier to run on another computer.)
Call the function and assign the return value to a new column:
df1$class <- bmi_categories(df1$BMI1, df1$AGE1)
> df1
  AGE1 BMI1       class
1    6   20        <NA>
2   12   18 Underweight
3   24   27  Overweight
4   56   31       Obese
5   32   29  Overweight
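To cover several years at once, one option is to keep each year's data frame in a named list alongside the names of its BMI and age columns, then use `Map()` to call `bmi_categories()` on each pair. This is a sketch under those assumptions: the list names (`y1`, `y2`), the example values and the `cols` lookup are hypothetical, and the function is repeated so the block runs on its own.

```r
# bmi_categories() from this answer, repeated so the sketch is self-contained
bmi_categories <- function(bmi, age) {
  category <- factor(rep(NA, length(bmi)),
                     levels = c("Underweight", "Normal Weight", "Overweight", "Obese"))
  category[bmi < 18.5 & age > 6] <- "Underweight"
  category[18.5 <= bmi & bmi < 25 & age > 6] <- "Normal Weight"
  category[25 <= bmi & bmi < 30 & age > 6] <- "Overweight"
  category[30 <= bmi & age > 6] <- "Obese"
  category
}

# Hypothetical yearly data frames whose column names differ
years <- list(
  y1 = data.frame(AGE1 = c(6, 12, 24), BMI1 = c(20, 18, 27)),
  y2 = data.frame(AGE2 = c(15, 5, 40), BMI2 = c(22.5, 22.1, 31))
)

# Hypothetical lookup: which BMI and age columns each year uses
cols <- list(
  y1 = c(bmi = "BMI1", age = "AGE1"),
  y2 = c(bmi = "BMI2", age = "AGE2")
)

# Add a 'class' column to every year, selecting the columns by name with [[ ]]
years <- Map(function(df, cn) {
  df$class <- bmi_categories(df[[cn[["bmi"]]]], df[[cn[["age"]]]])
  df
}, years, cols[names(years)])
```

Only the `cols` lookup needs updating when a new year is added; the classification logic itself stays unchanged.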