Home > database >  Looping over multiple values and storing results in a data frame
Looping over multiple values and storing results in a data frame

Time:08-29

I am trying to calculate some macronutrients values obtained from a reference table. These are my inputs:

DF1. The values indicate the amount of portions consumed per day, on a monthly basis.

id  Cow Milk    Soy Milk    Yoghurt (...)
001 0.07        0           0        ...
002 0           0.4         0        ...
003 0.07        0.07        0.13     ...
004 2.5         0           0        ...
... ...         ...         ...      ...

My reference table looks like this:

DF2. Reference values

Food      Kcal   Proteins Trans Fat Satured Fat (5 more columns)
Cow Milk  91.50  4.95     4.95      3.12        ...
Soy Milk  49.50  4.20     2.85      1.80        ...
Yoghurt   122.00 7.00     6.60      0.18        ...
......    ...    ...      ...       ...         ...

What I need to do is:

  1. Multiply portions value of the food times the corresponding value of that food in the reference table for each variable (i.e., kcal, protein, fat...).
  2. Sum all the values obtained for each food in the same variable (sum all kcal, sum all the protein...) for that id.
  3. Consolidate in one data.frame.

So, for example, the kcal and protein values only for id 001 so far should be:

id001
  kcal
   (0.07*91.5)   (0*49.5)   (0*122) = 6.405
  protein
   (0.07*4.95)   (0*4.2)    (0*7)   = 0.198
  ...

And I need to calculate that for all the foods, all the other variables of reference table for that same id and for dozens of other ids.

My final table should look like this:

id Total Kcal Total Proteins ...
001 6.405 0.198 ...
... ... ... ...

I was thinking of implementing a loop:

results <- data.frame()

for (i in 1:ncol(df1)) {
  kcal <- df1[,i] * df2[i,]
  results$kcal <- rbind(results$kcal, kcal)
}

But I don't even know how to make it iterate through each variable while maintaining df1[,i] position, nor make it sum the values once has finalized... never have done such a complex thing before. Any help is appreciated.

CodePudding user response:

Here is a tidyverse option

library(tidyverse)
DF1 %>% 
    pivot_longer(-id, names_to = "Food", values_to = "portion") %>%
    left_join(DF2 %>% pivot_longer(-Food), by = "Food") %>%
    group_by(id, name) %>%
    summarise(value = sum(value * portion), .groups = "drop") %>%
    pivot_wider(names_prefix = "Total ")
## A tibble: 4 × 5
#     id `Total Kcal` `Total Proteins` `Total Satured Fat` `Total Trans Fat`
#  <int>        <dbl>            <dbl>               <dbl>             <dbl>
#1     1         6.40            0.347               0.218             0.347
#2     2        19.8             1.68                0.72              1.14 
#3     3        25.7             1.55                0.368             1.40 
#4     4       229.             12.4                 7.8              12.4  

Please note that there is an error in your example calculation for Total Proteins for id001:

(0.07 * 4.95) (0 * 4.2) (0 * 7) = 0.198 0.3465

Explanation: We reshape both DF1 and DF2 from wide to long, then do a left-join of long DF1 with long DF2 by "Food". We can then group_by(id, name) (where name gives the name of the quantity from DF2: Kcal, Proteins, Trans Fat, etc.) and calculate the desired quantities as the sum(value * portion), respectively. Finally, we reshape again from long to wide, and add the prefix "Total " to the new wide columns.


Sample data

DF1 <- read.table(text = "id  'Cow Milk'    'Soy Milk'    Yoghurt
001 0.07        0           0
002 0           0.4         0
003 0.07        0.07        0.13
004 2.5         0           0", header = T, check.names = F)

DF2 <- read.table(text = "Food      Kcal   Proteins 'Trans Fat' 'Satured Fat'
'Cow Milk'  91.50  4.95     4.95      3.12
'Soy Milk'  49.50  4.20     2.85      1.80
Yoghurt   122.00 7.00     6.60      0.18", header = T, check.names = F)

CodePudding user response:

Here is a way to achieve this using for loop:

results = data.frame() 
for (i in 1:nrow(DF1)) {
  df_composition_for_id_i = DF2 %>% filter(Food %in% names(DF1[i,])[DF1[i,]>0])
  quantity_food = t(DF1[i,-1])[t(DF1[i,-1])>0]
  df_transform = sweep(df_composition_for_id_i[,-1], 1, quantity_food, `*`)
  Total = c(i, colSums(df_transform))
  names(Total)[1]= "id"
  results = rbind(results, Total)
}
names(results) = names(DF2)
names(results)[1] = "id"

> results
  id    Kcal Proteins Trans Fat Satured Fat
1  1   6.405   0.3465    0.3465      0.2184
2  2  19.800   1.6800    1.1400      0.7200
3  3  25.730   1.5505    1.4040      0.3678
4  4 228.750  12.3750   12.3750      7.8000

Using this for loop you can feed a dataframe with more columns in your DF2 (eg. carbohydrates, vitamins, ...) which will be computed in the loop without more intervention.

Explanation:

In the for loop the first df_composition_for_id_i is a dataframe with only the nutrient present in the current iteration for example when i=3:

i=3
df_composition_for_id_i = DF2 %>% filter(Food %in% names(DF1[i,])[DF1[i,]>0])
df_composition_for_id_i
      Food  Kcal Proteins Trans Fat Satured Fat
1 Cow Milk  91.5     4.95      4.95        3.12
2 Soy Milk  49.5     4.20      2.85        1.80
3  Yoghurt 122.0     7.00      6.60        0.18

quantity_food is the quantity of each nutrient that will be pass to multiply by row

quantity_food
[1] 0.07 0.07 0.13

df_transform take the first element created in this loop (df_composition_for_id_i) and then multiply by row with the second (quantity_food) (excluding the Food name) using the sweep function:

df_transform = sweep(df_composition_for_id_i[,-1], 1, quantity_food, `*`)
df_transform
    Kcal Proteins Trans Fat Satured Fat
1  6.405   0.3465    0.3465      0.2184
2  3.465   0.2940    0.1995      0.1260
3 15.860   0.9100    0.8580      0.0234

Lastly, the sum of this is calculated and id added with some tidy up for the naming and binded by row on the new dataframe:

Total = c(i, colSums(df_transform))
names(Total)[1]= "id"
results = rbind(results, Total)

  id  Kcal Proteins Trans Fat Satured Fat
1  3 25.73   1.5505     1.404      0.3678
  • Related