I want to perform the following tasks using the fastfood dataset from
the openintro package in R.
a) Create a correlation matrix for the relations between calories, total_fat, sugar, and calcium for all items at Sonic, Subway, and Taco Bell, omitting missing values with na.omit().
b) Create a regression predicting whether or not a restaurant is McDonalds or Subway based on calories, sodium, and protein.
c) Run a regression predicting calories from saturated fat, fiber, and sugar. Based on standardized regression coefficients, identify the strongest predictor.
Here is my code:
library(tidyverse)
library(openintro)
library(lm.beta)
fastfood <- openintro::fastfood
head(fastfood)
fastfood.corr <- cor(fastfood$calories, fastfood$total_fat, fastfood$sugar, fastfood$calcium,use="pairwise.complete.obs" ,method = "pearson")
But I'm getting this error: Error in match.arg(alternative) : 'arg' must be NULL or a character vector
CodePudding user response:
You should probably apply cor to a subset of your data frame's columns, like so:
cor(fastfood[c('calories', 'total_fat', 'sugar', 'calcium')],
    use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.9004937 0.4377113 0.3512067
# total_fat 0.9004937 1.0000000 0.2593702 0.1688170
# sugar     0.4377113 0.2593702 1.0000000 0.3105594
# calcium   0.3512067 0.1688170 0.3105594 1.0000000
You may also subset for specific rows (e.g. restaurants). (The difference from the above is that when we subset without a comma, data[j], columns are selected, and when we use a comma, data[i, j], i are rows and j are columns. See ?Extract.)
cor(fastfood[fastfood$restaurant %in% c("Sonic", "Subway", "Taco Bell"),
             c('calories', 'total_fat', 'sugar', 'calcium')],
    use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.8402781 0.5150627 0.6127083
# total_fat 0.8402781 1.0000000 0.2234985 0.2415309
# sugar     0.5150627 0.2234985 1.0000000 0.6690489
# calcium   0.6127083 0.2415309 0.6690489 1.0000000
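To make the data[j] versus data[i, j] distinction concrete, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical and only serve the illustration; they are not taken from fastfood.

```r
# Toy data frame (hypothetical values, not from the fastfood dataset)
toy <- data.frame(
  restaurant = c("Sonic", "Subway", "Arbys"),
  calories   = c(500, 300, 450),
  sugar      = c(10, 6, 8)
)

# One index, no comma: selects columns; all rows are kept
cols_only <- toy[c("calories", "sugar")]

# Two indices separated by a comma: rows first, then columns
keep <- toy$restaurant %in% c("Sonic", "Subway")
rows_and_cols <- toy[keep, c("calories", "sugar")]

dim(cols_only)      # 3 rows, 2 columns
dim(rows_and_cols)  # 2 rows, 2 columns
```

The logical vector keep plays the same role as the fastfood$restaurant %in% ... expression in the call above.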
You could also use dplyr syntax, but it's more code.
library(dplyr)
fastfood %>%
  filter(restaurant %in% c("Sonic", "Subway", "Taco Bell")) %>%
  select(calories, total_fat, sugar, calcium) %>%
  cor(use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.8402781 0.5150627 0.6127083
# total_fat 0.8402781 1.0000000 0.2234985 0.2415309
# sugar     0.5150627 0.2234985 1.0000000 0.6690489
# calcium   0.6127083 0.2415309 0.6690489 1.0000000
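Note that the task asks for na.omit(), which is not quite the same as use="pairwise.complete.obs": na.omit() drops every row that has an NA in any selected column (listwise deletion), while the pairwise option drops rows separately for each pair of columns. A minimal sketch of the difference, again on a made-up data frame (toy and its values are hypothetical):

```r
# Toy data with missing values (hypothetical, not from fastfood)
toy <- data.frame(
  calories = c(500, 300, 450, 620, 380),
  sugar    = c(10, 6, NA, 12, 7),
  calcium  = c(20, NA, 15, 30, 10)
)

# Listwise deletion: drop incomplete rows first, then correlate
listwise <- cor(na.omit(toy), method = "pearson")

# use = "complete.obs" is the built-in way to get the same listwise result
same_as <- cor(toy, use = "complete.obs", method = "pearson")

# Pairwise deletion: each pair of columns uses all rows complete for that pair
pairwise <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

all.equal(listwise, same_as)  # the two listwise variants agree

# Applied to the question, the listwise version would presumably look like:
# cor(na.omit(fastfood[fastfood$restaurant %in% c("Sonic", "Subway", "Taco Bell"),
#                      c('calories', 'total_fat', 'sugar', 'calcium')]))
```

Whether the two approaches give noticeably different numbers on fastfood depends on how the missing values are distributed across those four columns.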
Data:
fastfood <- openintro::fastfood