Looking for advice on how to analyse this particular objective and data in R

Thank you in advance for any assistance.

Aim: I have a 5-day food intake survey dataset that I am trying to analyse in R. I am interested in calculating the mean, sd, se, min and max daily intake (by weight) of a specific food. I would find this easier to do in Excel, but the dataset is too large for that, so I need to use R.

Example question: What is a person's daily intake (g) of lettuce? [mean, standard deviation, standard error, min, and max]

Example extract of the dataset (please note the actual dataset includes many foods and a large number of participants):

participant	day	code	foodname	weight
132	1	62	lettuce	53
84	3	62	lettuce	23
132	3	62	lettuce	32
153	4	62	lettuce	26
142	2	62	lettuce	23
123	3	62	lettuce	23
131	3	62	lettuce	30
153	5	62	lettuce	16

At present:

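# import dataset (read.spss() comes from the foreign package; filter() below comes from dplyr)
library(foreign)
library(dplyr)
foodsurvey <- read.spss("foodsurvey.sav", to.data.frame = TRUE, use.value.labels = TRUE)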
summary(foodsurvey)

# keep my relevant columns
myvariables <- subset(foodsurvey, select = c(1, 2, 3, 4, 5))

# rename columns
colnames(myvariables) <- c('participant', 'day', 'foodcode', 'foodname', 'foodweight')

# create values
day<-myvariables$day
participant<-myvariables$participant
foodcode<-myvariables$foodcode
foodname<-myvariables$foodname
foodweight<-myvariables$foodweight

# extract lettuce by ID code to be analysed
lettuce<- filter(myvariables, foodcode == "62")
dim(lettuce)
str(lettuce)

# errors (reported from Ops.factor) arise when attempting to analyse consumption (weight) of lettuce per day

# to analyse the outputs
summary(lettuce/days)
quantile(lettuce/foodweight)
max(lettuce)
min(lettuce)
median(lettuce)
mean(lettuce)

CodePudding user response：

This should give you the mean, standard deviation, standard error, min, and max food weight for each participant and food type combination (here, lettuce) across these days:

library(dplyr)

myvariables %>%
        filter(foodname == "lettuce") %>%
        group_by(participant) %>%
        summarise(mean = mean(foodweight, na.rm = TRUE),
                  max_val = max(foodweight, na.rm = TRUE),
                  min_val = min(foodweight, na.rm = TRUE),
                  sd = sd(foodweight, na.rm = TRUE),
                  # standard error: sqrt(variance / number of non-missing values)
                  se = sqrt(var(foodweight, na.rm = TRUE) / sum(!is.na(foodweight))))
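
If a participant can record the same food more than once on a given day, the rows above are per-entry weights rather than daily intakes. A minimal sketch, assuming that is the case and that the columns are named as in the question's example extract (participant, day, code, foodname, weight): total the weight per participant and day first, then summarise those daily totals. The names example_data and daily_summary are placeholders, and the data frame is just the question's extract typed in by hand.

```r
library(dplyr)

# The example extract from the question, typed in by hand (placeholder name)
example_data <- data.frame(
  participant = c(132, 84, 132, 153, 142, 123, 131, 153),
  day         = c(1, 3, 3, 4, 2, 3, 3, 5),
  code        = 62,
  foodname    = "lettuce",
  weight      = c(53, 23, 32, 26, 23, 23, 30, 16)
)

daily_summary <- example_data %>%
  filter(foodname == "lettuce") %>%
  # step 1: one row per participant and day, holding that day's total weight
  group_by(participant, day) %>%
  summarise(daily_weight = sum(weight, na.rm = TRUE), .groups = "drop") %>%
  # step 2: summarise the daily totals for each participant
  group_by(participant) %>%
  summarise(
    n_days = n(),
    mean_g = mean(daily_weight),
    sd_g   = sd(daily_weight),
    se_g   = sd(daily_weight) / sqrt(n()),
    min_g  = min(daily_weight),
    max_g  = max(daily_weight),
    .groups = "drop"
  )

daily_summary
```

Note that this only counts days on which the food was recorded. If days with no lettuce should count as 0 g, those rows would need to be added before step 2.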

CodePudding user response：

Here's a method that groups by participant and food itself to give summaries across everything.

dplyr

library(dplyr)
# dat holds the example data from the question
dat %>%
  group_by(participant, foodname) %>%
  summarize(
    # standard error is sd / sqrt(n), not sd / n
    across(weight, list(min = min, mean = mean, max = max, 
                        sigma = sd, se = ~ sd(.)/sqrt(n())))
  ) %>%
  ungroup()
# # A tibble: 6 x 7
#   participant foodname weight_min weight_mean weight_max weight_sigma weight_se
#         <int> <chr>         <int>       <dbl>      <int>        <dbl>     <dbl>
# 1          84 lettuce          23        23           23        NA         NA  
# 2         123 lettuce          23        23           23        NA         NA  
# 3         131 lettuce          30        30           30        NA         NA  
# 4         132 lettuce          32        42.5         53        14.8       10.5
# 5         142 lettuce          23        23           23        NA         NA  
# 6         153 lettuce          16        21           26         7.07       5  

Once you have those summaries, you can easily filter for one participant, a specific food, etc. If you need to also group by code, just add it to the group_by.
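
As a rough sketch of that filtering step, the summary can be rebuilt on the question's example extract so it runs on its own. The names dat and food_summary are placeholders, and code has been added to the group_by as suggested.

```r
library(dplyr)

# The example extract from the question, typed in by hand
dat <- data.frame(
  participant = c(132, 84, 132, 153, 142, 123, 131, 153),
  day         = c(1, 3, 3, 4, 2, 3, 3, 5),
  code        = 62,
  foodname    = "lettuce",
  weight      = c(53, 23, 32, 26, 23, 23, 30, 16)
)

food_summary <- dat %>%
  group_by(participant, code, foodname) %>%   # code added to the grouping
  summarize(
    across(weight, list(min = min, mean = mean, max = max, sigma = sd)),
    .groups = "drop"
  )

# one participant
food_summary %>% filter(participant == 132)

# one food, selected by name or by code
food_summary %>% filter(foodname == "lettuce")
food_summary %>% filter(code == 62)
```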

The premise of using summarise(across(...)) is that the first argument names whichever variables you want to summarize (just weight here, but you can add others if it makes sense), and the second argument is a list of functions in various forms. It accepts a bare function symbol (e.g., mean), a tilde-function facilitated by rlang (e.g., ~ sd(.) / sqrt(n()), where n() is a dplyr-special function), or a regular anonymous function (e.g., function(z) sd(z) / sqrt(length(z)), not shown here). The name on the LHS of each listed function is used in the resulting column name.
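
To make the three forms concrete, here is a small sketch that writes the standard error both as a tilde-function and as a regular anonymous function, alongside a bare function symbol. The data are just participants 132 and 153 from the question's extract, and the names se_forms, se_til and se_fun are made up for illustration.

```r
library(dplyr)

# Two participants from the question's example extract
dat <- data.frame(
  participant = c(132, 132, 153, 153),
  weight      = c(53, 32, 26, 16)
)

se_forms <- dat %>%
  group_by(participant) %>%
  summarize(
    across(weight, list(
      mean   = mean,                                # bare function symbol
      se_til = ~ sd(.) / sqrt(n()),                 # tilde-function
      se_fun = function(z) sd(z) / sqrt(length(z))  # regular anonymous function
    )),
    .groups = "drop"
  )

se_forms
# resulting columns: participant, weight_mean, weight_se_til, weight_se_fun
```

The two se columns should agree here; they would differ only if weight contained missing values that one form dropped and the other did not.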