Create function to categorize BMI in multiple dataframes in R

This is my first post here and I'm new to R, so I apologize if this post is worded oddly.

I am working on an analysis of a large dataset for a single year. I want to categorize continuous BMI data into categories ranging from "underweight" to "obese". To do the same across multiple years of this dataset, I want to write a function that can be reused for each year, where the datasets are named slightly differently.

Is there a way I can write this function so I can apply it to different years of the dataset without rewriting my code?

bmi_categories <- function(df) {
  # The argument name must match the name used in the body
  # (the original declared df_bmi_cat but referred to df).
  as.factor(ifelse(df$BMI2 < 18.5 & df$AGE2 > 6, "Underweight",
            ifelse(df$BMI2 >= 18.5 & df$BMI2 < 25 & df$AGE2 > 6, "Normal Weight",
            ifelse(df$BMI2 >= 25 & df$BMI2 < 30 & df$AGE2 > 6, "Overweight",
            ifelse(df$BMI2 >= 30 & df$AGE2 > 6, "Obese", "")))))
}

The first 6 observations of the dataframe look like this:

    AGE2  BMI2
1   15    22.50087
2   17    24.88647
3   14    22.70773
4   9     23.49076
5   7     22.14871
6   16    23.10811

Thanks in advance to anyone who responds!

CodePudding user response：

This should do what you want, with far fewer nested function calls.

library(dplyr)
# case_when() checks the conditions in order and uses the first one that is TRUE
df %>% mutate(Classification = case_when(AGE2 <= 6 ~ "",
                                         BMI2 < 18.5 ~ "Underweight",
                                         BMI2 < 25 ~ "Normal weight",
                                         BMI2 < 30 ~ "Overweight",
                                         BMI2 >= 30 ~ "Obese"
                                         ))

This will create an additional column for the weight classification.

A tibble: 6 x 3
AGE2   BMI2       Classification
<dbl>  <dbl>      <chr>
15     22.50087   Normal weight
17     24.88647   Normal weight
14     22.70773   Normal weight
9      23.49076   Normal weight
5      22.14871
16     23.10811   Normal weight
6 rows

This is also very easy to apply as a function if required.
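
As a rough sketch of what that could look like: the wrapper below takes the data frame plus the BMI and age columns as unquoted names, so it can be reused when the columns are called something else in another year. The function name `classify_bmi` and its argument names are made up for illustration, and it assumes a dplyr version that supports the `{{ }}` embracing syntax.

```r
library(dplyr)

# Hypothetical wrapper: `data` is a data frame, `bmi` and `age` are
# unquoted column names, passed through to mutate() with {{ }}.
classify_bmi <- function(data, bmi, age) {
  data %>%
    mutate(Classification = case_when(
      {{ age }} <= 6   ~ "",               # ages 6 and under are left blank
      {{ bmi }} < 18.5 ~ "Underweight",
      {{ bmi }} < 25   ~ "Normal weight",
      {{ bmi }} < 30   ~ "Overweight",
      {{ bmi }} >= 30  ~ "Obese"
    ))
}

# Example using the six rows posted in the question
df <- data.frame(
  AGE2 = c(15, 17, 14, 9, 7, 16),
  BMI2 = c(22.50087, 24.88647, 22.70773, 23.49076, 22.14871, 23.10811)
)
classified <- classify_bmi(df, BMI2, AGE2)
```

For another year you would only change the call, e.g. `classify_bmi(df_other_year, BMI1, AGE1)`, where `df_other_year` stands in for whatever that year's data frame is called.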

CodePudding user response：

Since the names of the columns are different each time, I would provide the function not with the entire dataframe, but with the specific data columns.

example data

df1 <- data.frame(AGE1 = c(6, 12, 24, 56, 32), BMI1 = c(20, 18, 27, 31, 29))

> df1
  AGE1 BMI1
1    6   20
2   12   18
3   24   27
4   56   31
5   32   29

function

bmi_categories <- function(bmi, age) {
  # NA is the default value. You could use "" as the default instead,
  # but then "" must also be added to the vector of levels.
  category <- factor(rep(NA, length(bmi)), levels = c("Underweight", "Normal Weight", "Overweight", "Obese"))
  
  category[bmi<18.5 & age>6] <- "Underweight"
  category[18.5<=bmi & bmi<25 & age>6] <- "Normal Weight"
  category[25<=bmi & bmi<30 & age>6] <- "Overweight"
  category[30<=bmi & age>6] <- "Obese"

  return(category)
}

(You could also use the code by JaredS and turn that into a function. I personally try to avoid using external libraries where possible, so the code is easier to run on another computer.)

call the function and assign return value to new column

df1$class <- bmi_categories(df1$BMI1, df1$AGE1)

> df1
  AGE1 BMI1       class
1    6   20        <NA>
2   12   18 Underweight
3   24   27  Overweight
4   56   31       Obese
5   32   29  Overweight
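
To reuse this across several years, one option is a small helper that takes the column names as strings and looks them up with `[[ ]]`. This is only a sketch: `add_bmi_class` is a made-up name, and `df2` is assumed to be a second year's data frame built from the six rows in the question.

```r
# bmi_categories() from this answer, repeated so the sketch runs on its own
bmi_categories <- function(bmi, age) {
  category <- factor(rep(NA, length(bmi)),
                     levels = c("Underweight", "Normal Weight", "Overweight", "Obese"))
  category[bmi < 18.5 & age > 6] <- "Underweight"
  category[18.5 <= bmi & bmi < 25 & age > 6] <- "Normal Weight"
  category[25 <= bmi & bmi < 30 & age > 6] <- "Overweight"
  category[30 <= bmi & age > 6] <- "Obese"
  category
}

# Hypothetical helper: bmi_col and age_col are column names given as strings
add_bmi_class <- function(df, bmi_col, age_col) {
  df$class <- bmi_categories(df[[bmi_col]], df[[age_col]])
  df
}

df1 <- data.frame(AGE1 = c(6, 12, 24, 56, 32), BMI1 = c(20, 18, 27, 31, 29))
df2 <- data.frame(
  AGE2 = c(15, 17, 14, 9, 7, 16),
  BMI2 = c(22.50087, 24.88647, 22.70773, 23.49076, 22.14871, 23.10811)
)

# Same function, different column names per year
df1 <- add_bmi_class(df1, "BMI1", "AGE1")
df2 <- add_bmi_class(df2, "BMI2", "AGE2")
```

Passing the names as strings keeps this in base R, so no extra packages are needed on the machine that runs it.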