Thank you in advance for any assistance.
Aim: I have a 5-day food intake survey dataset that I am trying to analyse in R. I am interested in calculating the mean, SE, min and max intake for the weight of a specific food consumed per day. I could do this more easily in Excel, but the dataset is too large, so I need to do it in R.
Example question: What is a person's daily intake (g) of lettuce? [mean, standard deviation, standard error, min, and max]
Example extraction dataset: please note that the actual dataset includes a number of foods and a large number of participants.
participant | day | code | foodname | weight |
---|---|---|---|---|
132 | 1 | 62 | lettuce | 53 |
84 | 3 | 62 | lettuce | 23 |
132 | 3 | 62 | lettuce | 32 |
153 | 4 | 62 | lettuce | 26 |
142 | 2 | 62 | lettuce | 23 |
123 | 3 | 62 | lettuce | 23 |
131 | 3 | 62 | lettuce | 30 |
153 | 5 | 62 | lettuce | 16 |
At present:
# import dataset
foodsurvey<-read.spss("foodsurvey.sav",to.data.frame=T,use.value.labels=T)
summary(foodsurvey)
# keep my relevant columns
myvariables = subset(food survey, select = c(1,2,3,4,5) )
# rename columns
colnames(myvariables)<-c('participant','day','code','foodname','foodweight')
# create values
day<-myvariables$day
participant<-myvariables$participant
foodcode<-myvariables$foodcode
foodname<-myvariables$foodname
foodweight<-myvariables$foodweight
# extract lettuce by ID code to be analysed
lettuce<- filter(myvariables, foodcode == "62")
dim(lettuce)
str(lettuce)
# errors arise attempting to analyse consumption (weight) of lettuce per day using ops.factor function
# to analyse the outputs
summary(lettuce/days)
quantile(lettuce/foodweight)
max(lettuce)
min(lettuce)
median(lettuce)
mean(lettuce)
CodePudding user response:
This should give you the mean, standard deviation, standard error, min, and max food weight of lettuce for each participant across the surveyed days:
library(dplyr)
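myvariables %>%
  # keep only the lettuce rows, then compute one summary row per participant
  filter(foodname == "lettuce") %>%
  group_by(participant) %>%
  summarise(mean = mean(foodweight, na.rm = TRUE),
            max_val = max(foodweight, na.rm = TRUE),
            min_val = min(foodweight, na.rm = TRUE),
            sd = sd(foodweight, na.rm = TRUE),
            # standard error: sqrt(variance / number of non-missing values)
            se = sqrt(var(foodweight, na.rm = TRUE) / sum(!is.na(foodweight))))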
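If a participant can have more than one lettuce entry on the same day, it may make sense to sum to a daily total first and only then summarise. The following is a minimal sketch under that assumption, rebuilt from the example rows in the question; the names `lettuce_rows`, `daily`, `overall` and the `*_g` columns are illustrative and not part of the original data.

```r
library(dplyr)

# Example rows copied from the question (column names as renamed there)
lettuce_rows <- data.frame(
  participant = c(132, 84, 132, 153, 142, 123, 131, 153),
  day         = c(1, 3, 3, 4, 2, 3, 3, 5),
  code        = 62,
  foodname    = "lettuce",
  foodweight  = c(53, 23, 32, 26, 23, 23, 30, 16)
)

# Step 1: one row per participant per day (adds up repeated entries on a day)
daily <- lettuce_rows %>%
  group_by(participant, day) %>%
  summarise(daily_weight = sum(foodweight, na.rm = TRUE), .groups = "drop")

# Step 2: summarise the daily totals across all participant-days
overall <- daily %>%
  summarise(
    n_days = n(),
    mean_g = mean(daily_weight),
    sd_g   = sd(daily_weight),
    se_g   = sd_g / sqrt(n_days),  # standard error of the mean
    min_g  = min(daily_weight),
    max_g  = max(daily_weight)
  )

print(overall)
```

Note that this only averages over days on which lettuce was recorded. Whether days with no lettuce should count as zero intake depends on the survey design and is not handled here.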
CodePudding user response:
Here's a method that groups by participant and the food itself to give summaries across everything.
dplyr
library(dplyr)
dat %>%
  group_by(participant, foodname) %>%
  summarize(
    # standard error is sd / sqrt(n), not sd / n
    across(weight, list(min = min, mean = mean, max = max,
                        sigma = sd, se = ~ sd(.) / sqrt(n())))
  ) %>%
  ungroup()
# # A tibble: 6 x 7
# participant foodname weight_min weight_mean weight_max weight_sigma weight_se
# <int> <chr> <int> <dbl> <int> <dbl> <dbl>
# 1 84 lettuce 23 23 23 NA NA
# 2 123 lettuce 23 23 23 NA NA
# 3 131 lettuce 30 30 30 NA NA
# 4 132 lettuce 32 42.5 53 14.8 10.5
# 5 142 lettuce 23 23 23 NA NA
# 6 153 lettuce 16 21 26 7.07 5
Once you have those summaries, you can easily filter for one participant, a specific food, etc. If you also need to group by code, just add it to the group_by. Participants with a single lettuce entry show NA for sigma and se because a standard deviation needs at least two values.
The premise of using summarise(across(...)) is that the first argument includes whichever variables you want to summarize (just weight here, but you can add others if it makes sense), and the second argument is a list of functions in various forms. It accepts just a function symbol (e.g., mean), a tilde-function facilitated by rlang (e.g., ~ sd(.) / sqrt(n()), where n() is a dplyr-special function), or regular anonymous functions (e.g., function(z) sd(z) / sqrt(length(z)), not shown here). The "name" on the LHS of each listed function is used in the resulting column name.
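The regular-function form mentioned above could look like the following sketch. It rebuilds `dat` from the example rows in the question so that it runs on its own; `se_fun` and `res` are hypothetical names introduced here, and dropping missing values inside the helper is an assumption about how NAs should be treated.

```r
library(dplyr)

# Example rows copied from the question
dat <- data.frame(
  participant = c(132L, 84L, 132L, 153L, 142L, 123L, 131L, 153L),
  day         = c(1L, 3L, 3L, 4L, 2L, 3L, 3L, 5L),
  code        = 62L,
  foodname    = "lettuce",
  weight      = c(53L, 23L, 32L, 26L, 23L, 23L, 30L, 16L)
)

# Hypothetical helper: standard error of the mean, ignoring missing values
se_fun <- function(z) {
  z <- z[!is.na(z)]
  sd(z) / sqrt(length(z))
}

# A plain function can be passed in the list just like a function symbol
res <- dat %>%
  group_by(participant, foodname) %>%
  summarise(across(weight, list(mean = mean, se = se_fun)), .groups = "drop")

print(res)
```

Using a named helper keeps the list inside across() short and makes the NA handling explicit in one place.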