Function or loop to repeat task (2)-CodePudding

I'm trying to create a loop or a function for the dataset below.

For overall_health, outlook_life, and health_mental- I am trying to calculate the proportion by sex, age_group, income and education. In addition, I would like to create two separate outputs by the geography (area1 and area2).

Please note that I do not want to use the pivot longer function.

Thanks in advance for the help!

Here's the a sample data frame (survey data):

library(tidyverse)
library(dplyr)

df <- data.frame (overall_health = c("poor", "good", "excellent", "poor", "good", "poor", "poor", "excellent"),
                  outlook_life = c("good", "excellent", "excellent", "poor", "excellent", "poor", "excellent", "poor"),
                  health_mental = c("poor", "poor", "excellent", "poor", "poor", "poor", "excellent", "good"),
                  sex = c("F", "M", "M", "F", "F", "M", "F", "M"),
                  age_group = c("50-54", "60-64", "80 ", "70-74", "40-44", "45-49", "60-64", "65-69"),
                  income = c("$<40,000", "$50,000-79,000", "$80,000-110,000", "$111,000 ", "$<40,000", "$<40,000", "$50,000-79,000", "$80,000-110,000"),
                  education = c("HS", "College", "Bachelors", "Masters", "HS", "College", "Bachelors", "Masters"),
                  geography= c("area1", "area2", "area1", "area2", "area2", "area1", "area2", "area1"))

Typically, one would calculate the proportion by age group as such:

df_new <- df %>%
  group_by(overall_health, age_group, geography) %>%
  summarise(count = n(),
            total = 8,
            proportion = count/total *100)

However, I would like to create a loop/function to calculate these portions for overall_health, outlook_life, and health_mental by: age_group, sex, income, education (in geography= "area1") I would like to do the same or geography= "area 2".

So for overall_health, outlook_life, and health_mental I would have multiple outputs by age_group, sex, income, education that look like this:

overall_health x age_group

outlook_life x age_group

health_mental x age_group

overall_health x sex

CodePudding user response：


library(tidyverse)


## unique geographies:
geos <- unique(df$geography)

## You can input combinations to a list to iterate over and label accordingly.
vars <- c("health_mental", "overall_health", "outlook_life")

## expand.grid will create every possible combination between the two vectors you supply it:
combinations <- expand.grid(c("age_group", "sex"), vars, stringsAsFactors = F)

combinations$label <- paste(combinations$Var1, combinations$Var2, sep = "_")

## list to output dataframe results to:
output <- list()

## Outer for loop will iterate through each of your geographies.
for (geo in geos){
  ## Inner loop will iterate through each of your combinations
  for (combo in 1:nrow(combinations)){
    ## outputs are labelled based on combination   geo and stored in output list.
    output_label <- paste(combinations[[combo,"label"]],geo,sep="_")
    temp <- df %>%
                filter(geography == geo) %>%
                group_by_at(combinations[combo,1:2] %>% unlist()%>%unname()) %>%
                summarise(count = n(),
                          total = nrow(.),
                          proportion = count/nrow(.) *100)
    output[[output_label]] <- temp
    
    # You could also export each csv iteratively, with proper labelling automatically as well:
    #write.csv(temp, paste0("insert_file_path", output_label, ".csv"))
  }
}

Dataframes can be accessed through the lists by their list label:

## all dataframes:
output

## Overall Health x Age Group for Area 1:
output$overall_health_x_age_group_area1
## or
output[["overall_health_x_age_group_area1"]]