Home > Software engineering >  How to index multiple objects / variables within a For loop
How to index multiple objects / variables within a For loop

Time:05-25

I have data that is made up of three columns:

Household ID, Product ID (H14aq2), Value.

I have approximately 7000 rows (Household IDs) which can be grouped into 12 districts and 160 products. HH Ids can appear multiple times since they consume multiple products. My goal is to sum the values across the households for each product such that I get a district wide total of product value. I know how to achieve this manually, but I want to use a loop since I will be doing this for multiple datasets.

Here is my current code. This actually runs without error, showing 156 iterations, however when I look at the total_values_05 object, there is only one extra vector attached, val_i.

for(i in 105:161){
  
  total_val_i <- cons_05 %>% 
    filter(H14aq2 == i) %>% 
    group_by(Districtn05) %>% 
    summarise(val_i = sum(total_val_yr)) %>% 
    ungroup()
  
  total_values_05 <- total_values_05 %>% 
    left_join(total_val_i)
  rm(total_val_i)
  
}

There are 161 products (indexed with the variable H14aq2 from 101 to 161). Before this loop, I create the object total_values_05 where I take care of the products 101 to 104 for other reasons.

In each iteration, I want to filter for the individual product, sum across the total_val_yr variable which contains the values, and then attach the new vector val_i to the existing object total_values_05. Ultimately I want an object structured as the following:

District val_101 val_102 val_103
First row row row
Second row row row

(up to val_161 and district 12)

It seems to me that I'm missing a small thing to actually make this work, since the code runs and actually already attaches a variable called val_i - I presume there is an issue with indexing multiple things with i.

This is my first attempt at a loop! Any help much appreciated :)

Here is example data (containing only the 4 variables required for my question)

structure(list(Hhid = structure(c("1033000301", "1033000301", 
"1033000301", "1033000301", "1033000301", "1033000301"), label = "Unique hh identifier across panel waves", format.stata = "%-10s"), 
    Districtn05 = structure(c("Kiboga", "Kiboga", "Kiboga", "Kiboga", 
    "Kiboga", "Kiboga"), label = "District name as in 2005/06", format.stata = "%-13s"), 
    H14aq2 = structure(c(150, 135, 140, 136, 112, 103), label = "Consumption item code", format.stata = ".0g", labels = c(Matooke = 101, 
    Matooke = 102, Matooke = 103, Matooke = 104, `Sweet potatoes fresh` = 105, 
    `Sweet potatoes dry` = 106, `Cassava fresh` = 107, `Cassava dry/flour` = 108, 
    `Irish potatoes` = 109, Rice = 110, `Maize grains` = 111, 
    `Maize cobs` = 112, `Maize flour` = 113, Bread = 114, Millet = 115, 
    Sorghum = 116, Beef = 117, Pork = 118, `Goat meat` = 119, 
    `Other meat` = 120, Chicken = 121, `Fresh fish` = 122, `Dry/smoked fish` = 123, 
    Eggs = 124, `Fresh milk` = 125, `Infant formula foods` = 126, 
    `Cooking oil` = 127, Ghee = 128, `Margarine,butter` = 129, 
    `Passion fruits` = 130, `Sweet bananas` = 131, Mangoes = 132, 
    Oranges = 133, `Other fruits` = 134, Onions = 135, Tomatoes = 136, 
    Cabbages = 137, Dodo = 138, `Other vegetables` = 139, `Beans fresh` = 140, 
    `Beans dry` = 141, `Ground nuts in shell` = 142, `Ground nuts shelled` = 143, 
    `Ground nuts pounded` = 144, Peas = 145, Simsim = 146, Sugar = 147, 
    Coffee = 148, Tea = 149, Salt = 150, Soda = 151, Beer = 152, 
    `Other alcoholic drinks` = 153, `Other drinks` = 154, Cigarettes = 155, 
    `Other tobbaco` = 156, `Expenditure in restaurants on food` = 157, 
    `Expenditure in restaurants on soda` = 158, `Expenditure in restaurants on beer` = 159, 
    `Other juice` = 160, `Other foods` = 161), class = c("haven_labelled", 
    "vctrs_vctr", "double")), total_val_yr = c(3250, 10400, 156000, 
    10400, 260000, 312000)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame")) ```



CodePudding user response:

You can group by multiple columns and then pivot the summarized results into a wide format like this:

library(tidyverse)

data <- structure(list(
  Hhid = structure(c(
    "1033000301", "1033000301",
    "1033000301", "1033000301", "1033000301", "1033000301"
  ), label = "Unique hh identifier across panel waves", format.stata = "%-10s"),
  Districtn05 = structure(c(
    "Kiboga", "Kiboga", "Kiboga", "Kiboga",
    "Kiboga", "Kiboga"
  ), label = "District name as in 2005/06", format.stata = "%-13s"),
  H14aq2 = structure(c(150, 135, 140, 136, 112, 103), label = "Consumption item code", format.stata = ".0g", labels = c(
    Matooke = 101,
    Matooke = 102, Matooke = 103, Matooke = 104, `Sweet potatoes fresh` = 105,
    `Sweet potatoes dry` = 106, `Cassava fresh` = 107, `Cassava dry/flour` = 108,
    `Irish potatoes` = 109, Rice = 110, `Maize grains` = 111,
    `Maize cobs` = 112, `Maize flour` = 113, Bread = 114, Millet = 115,
    Sorghum = 116, Beef = 117, Pork = 118, `Goat meat` = 119,
    `Other meat` = 120, Chicken = 121, `Fresh fish` = 122, `Dry/smoked fish` = 123,
    Eggs = 124, `Fresh milk` = 125, `Infant formula foods` = 126,
    `Cooking oil` = 127, Ghee = 128, `Margarine,butter` = 129,
    `Passion fruits` = 130, `Sweet bananas` = 131, Mangoes = 132,
    Oranges = 133, `Other fruits` = 134, Onions = 135, Tomatoes = 136,
    Cabbages = 137, Dodo = 138, `Other vegetables` = 139, `Beans fresh` = 140,
    `Beans dry` = 141, `Ground nuts in shell` = 142, `Ground nuts shelled` = 143,
    `Ground nuts pounded` = 144, Peas = 145, Simsim = 146, Sugar = 147,
    Coffee = 148, Tea = 149, Salt = 150, Soda = 151, Beer = 152,
    `Other alcoholic drinks` = 153, `Other drinks` = 154, Cigarettes = 155,
    `Other tobbaco` = 156, `Expenditure in restaurants on food` = 157,
    `Expenditure in restaurants on soda` = 158, `Expenditure in restaurants on beer` = 159,
    `Other juice` = 160, `Other foods` = 161
  ), class = c(
    "haven_labelled",
    "vctrs_vctr", "double"
  )), total_val_yr = c(
    3250, 10400, 156000,
    10400, 260000, 312000
  )
), row.names = c(NA, -6L), class = c(
  "tbl_df",
  "tbl", "data.frame"
))


data %>%
  group_by(Districtn05, H14aq2) %>%
  summarise(total_val_yr = sum(total_val_yr)) %>%
  select(total_val_yr, H14aq2) %>%
  pivot_wider(names_from = H14aq2, values_from = total_val_yr, names_prefix = "val_")
#> `summarise()` has grouped output by 'Districtn05'. You can override using the
#> `.groups` argument.
#> Adding missing grouping variables: `Districtn05`
#> # A tibble: 1 × 7
#> # Groups:   Districtn05 [1]
#>   Districtn05 val_103 val_112 val_135 val_136 val_140 val_150
#>   <chr>         <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 Kiboga       312000  260000   10400   10400  156000    3250

Created on 2022-05-25 by the reprex package (v2.0.0)

  • Related