I am trying to calculate some macronutrients values obtained from a reference table. These are my inputs:
DF1. The values indicate the amount of portions consumed per day, on a monthly basis.
id Cow Milk Soy Milk Yoghurt (...)
001 0.07 0 0 ...
002 0 0.4 0 ...
003 0.07 0.07 0.13 ...
004 2.5 0 0 ...
... ... ... ... ...
My reference table looks like this:
DF2. Reference values
Food Kcal Proteins Trans Fat Satured Fat (5 more columns)
Cow Milk 91.50 4.95 4.95 3.12 ...
Soy Milk 49.50 4.20 2.85 1.80 ...
Yoghurt 122.00 7.00 6.60 0.18 ...
...... ... ... ... ... ...
What I need to do is:
- Multiply portions value of the food times the corresponding value of that food in the reference table for each variable (i.e., kcal, protein, fat...).
- Sum all the values obtained for each food in the same variable (sum all kcal, sum all the protein...) for that id.
- Consolidate in one data.frame.
So, for example, the kcal and protein values only for id 001 so far should be:
id001
kcal
(0.07*91.5) (0*49.5) (0*122) = 6.405
protein
(0.07*4.95) (0*4.2) (0*7) = 0.198
...
And I need to calculate that for all the foods, all the other variables of reference table for that same id and for dozens of other ids.
My final table should look like this:
id | Total Kcal | Total Proteins | ... |
---|---|---|---|
001 | 6.405 | 0.198 | ... |
... | ... | ... | ... |
I was thinking of implementing a loop:
results <- data.frame()
for (i in 1:ncol(df1)) {
kcal <- df1[,i] * df2[i,]
results$kcal <- rbind(results$kcal, kcal)
}
But I don't even know how to make it iterate through each variable while maintaining df1[,i]
position, nor make it sum the values once has finalized... never have done such a complex thing before. Any help is appreciated.
CodePudding user response:
Here is a tidyverse
option
library(tidyverse)
DF1 %>%
pivot_longer(-id, names_to = "Food", values_to = "portion") %>%
left_join(DF2 %>% pivot_longer(-Food), by = "Food") %>%
group_by(id, name) %>%
summarise(value = sum(value * portion), .groups = "drop") %>%
pivot_wider(names_prefix = "Total ")
## A tibble: 4 × 5
# id `Total Kcal` `Total Proteins` `Total Satured Fat` `Total Trans Fat`
# <int> <dbl> <dbl> <dbl> <dbl>
#1 1 6.40 0.347 0.218 0.347
#2 2 19.8 1.68 0.72 1.14
#3 3 25.7 1.55 0.368 1.40
#4 4 229. 12.4 7.8 12.4
Please note that there is an error in your example calculation for Total Proteins
for id001
:
(0.07 * 4.95) (0 * 4.2) (0 * 7) =
0.1980.3465
Explanation: We reshape both DF1
and DF2
from wide to long, then do a left-join of long DF1
with long DF2
by "Food"
. We can then group_by(id, name)
(where name
gives the name of the quantity from DF2
: Kcal, Proteins, Trans Fat, etc.) and calculate the desired quantities as the sum(value * portion)
, respectively. Finally, we reshape again from long to wide, and add the prefix "Total "
to the new wide columns.
Sample data
DF1 <- read.table(text = "id 'Cow Milk' 'Soy Milk' Yoghurt
001 0.07 0 0
002 0 0.4 0
003 0.07 0.07 0.13
004 2.5 0 0", header = T, check.names = F)
DF2 <- read.table(text = "Food Kcal Proteins 'Trans Fat' 'Satured Fat'
'Cow Milk' 91.50 4.95 4.95 3.12
'Soy Milk' 49.50 4.20 2.85 1.80
Yoghurt 122.00 7.00 6.60 0.18", header = T, check.names = F)
CodePudding user response:
Here is a way to achieve this using for
loop:
results = data.frame()
for (i in 1:nrow(DF1)) {
df_composition_for_id_i = DF2 %>% filter(Food %in% names(DF1[i,])[DF1[i,]>0])
quantity_food = t(DF1[i,-1])[t(DF1[i,-1])>0]
df_transform = sweep(df_composition_for_id_i[,-1], 1, quantity_food, `*`)
Total = c(i, colSums(df_transform))
names(Total)[1]= "id"
results = rbind(results, Total)
}
names(results) = names(DF2)
names(results)[1] = "id"
> results
id Kcal Proteins Trans Fat Satured Fat
1 1 6.405 0.3465 0.3465 0.2184
2 2 19.800 1.6800 1.1400 0.7200
3 3 25.730 1.5505 1.4040 0.3678
4 4 228.750 12.3750 12.3750 7.8000
Using this for loop you can feed a dataframe with more columns in your DF2 (eg. carbohydrates
, vitamins
, ...) which will be computed in the loop without more intervention.
Explanation:
In the for
loop the first df_composition_for_id_i
is a dataframe with only the nutrient present in the current iteration for example when i=3
:
i=3
df_composition_for_id_i = DF2 %>% filter(Food %in% names(DF1[i,])[DF1[i,]>0])
df_composition_for_id_i
Food Kcal Proteins Trans Fat Satured Fat
1 Cow Milk 91.5 4.95 4.95 3.12
2 Soy Milk 49.5 4.20 2.85 1.80
3 Yoghurt 122.0 7.00 6.60 0.18
quantity_food
is the quantity of each nutrient that will be pass to multiply by row
quantity_food
[1] 0.07 0.07 0.13
df_transform
take the first element created in this loop (df_composition_for_id_i
) and then multiply by row with the second (quantity_food
) (excluding the Food
name) using the sweep
function:
df_transform = sweep(df_composition_for_id_i[,-1], 1, quantity_food, `*`)
df_transform
Kcal Proteins Trans Fat Satured Fat
1 6.405 0.3465 0.3465 0.2184
2 3.465 0.2940 0.1995 0.1260
3 15.860 0.9100 0.8580 0.0234
Lastly, the sum of this is calculated and id added with some tidy up for the naming and binded by row on the new dataframe:
Total = c(i, colSums(df_transform))
names(Total)[1]= "id"
results = rbind(results, Total)
id Kcal Proteins Trans Fat Satured Fat
1 3 25.73 1.5505 1.404 0.3678