Finding Correlation and Regression in R

Time:02-25

I want to perform the following tasks using the fastfood dataset from the openintro package in R.

a) Create a correlation matrix for the relations between calories, total_fat, sugar, and calcium for all items at Sonic, Subway, and Taco Bell, omitting missing values with na.omit().

b) Create a regression predicting whether or not a restaurant is McDonalds or Subway based on calories, sodium, and protein.

c) Run a regression predicting calories from saturated fat, fiber, and sugar. Based on standardized regression coefficients, identify the strongest predictor.

Here is my code:

library(tidyverse)

library(openintro)

library(lm.beta)

fastfood <- openintro::fastfood

head(fastfood)

fastfood.corr <- cor(fastfood$calories, fastfood$total_fat, fastfood$sugar, fastfood$calcium,use="pairwise.complete.obs" ,method = "pearson")

But I'm getting an error: Error in match.arg(alternative) : 'arg' must be NULL or a character vector

CodePudding user response:

You should probably apply cor() to a subset of your data frame's columns, because cor() takes at most two data arguments (x and y), not one argument per variable:

cor(fastfood[c('calories', 'total_fat', 'sugar', 'calcium')], 
    use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.9004937 0.4377113 0.3512067
# total_fat 0.9004937 1.0000000 0.2593702 0.1688170
# sugar     0.4377113 0.2593702 1.0000000 0.3105594
# calcium   0.3512067 0.1688170 0.3105594 1.0000000

You can also subset specific rows (e.g. restaurants). The difference from the call above is that indexing without a comma, data[j], selects columns, whereas indexing with a comma, data[i, j], selects rows with i and columns with j. See ?Extract.

cor(fastfood[fastfood$restaurant %in% c("Sonic", "Subway", "Taco Bell"),
             c('calories', 'total_fat', 'sugar', 'calcium')], 
    use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.8402781 0.5150627 0.6127083
# total_fat 0.8402781 1.0000000 0.2234985 0.2415309
# sugar     0.5150627 0.2234985 1.0000000 0.6690489
# calcium   0.6127083 0.2415309 0.6690489 1.0000000
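
To see the two indexing forms side by side, here is a minimal sketch on a small toy data frame. The data frame `df` and its values are made up for illustration and are not taken from fastfood.

```r
# Toy data frame (hypothetical values, not from openintro::fastfood)
df <- data.frame(
  restaurant = c("Sonic", "Subway", "Arbys"),
  calories   = c(500, 300, 450),
  sugar      = c(10, 5, 8)
)

# No comma: the index selects columns; all rows are kept
cols_only <- df[c("calories", "sugar")]

# With a comma: the first index selects rows, the second selects columns
rows_and_cols <- df[df$restaurant %in% c("Sonic", "Subway"),
                    c("calories", "sugar")]

dim(cols_only)      # 3 rows, 2 columns
dim(rows_and_cols)  # 2 rows, 2 columns
```

Both results are still data frames, which is why they can be passed straight to cor().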

You could also use dplyr syntax, but it's more code.

library(dplyr)
fastfood %>%
  filter(restaurant %in% c("Sonic", "Subway", "Taco Bell")) %>%
  select(calories, total_fat, sugar, calcium) %>%
  cor(use="pairwise.complete.obs", method="pearson")
#            calories total_fat     sugar   calcium
# calories  1.0000000 0.8402781 0.5150627 0.6127083
# total_fat 0.8402781 1.0000000 0.2234985 0.2415309
# sugar     0.5150627 0.2234985 1.0000000 0.6690489
# calcium   0.6127083 0.2415309 0.6690489 1.0000000
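
Note that task a) asks for na.omit(), which is not quite the same as use="pairwise.complete.obs": na.omit() drops every row that has a missing value in any selected column, while the pairwise option drops rows separately for each pair of variables. A minimal sketch of the na.omit() route, using a made-up toy data frame (`toy` and its values are assumptions for illustration, not fastfood data):

```r
# Toy data with a few missing values (hypothetical, not from fastfood)
toy <- data.frame(
  calories  = c(500, 300, 450, 620, 280, 710, 390, 540),
  total_fat = c(25, 10, 22, 35, 8, NA, 15, 28),
  sugar     = c(10, 5, 8, 12, 4, 15, NA, 9),
  calcium   = c(20, 15, 10, 30, 12, 25, 18, 22)
)

# Listwise deletion: drop any row containing an NA, then correlate
complete_rows <- na.omit(toy)
listwise <- cor(complete_rows, method = "pearson")

# Pairwise deletion: each pair of columns uses all rows available to it
pairwise <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

nrow(complete_rows)  # rows left after na.omit()
listwise
pairwise
```

On the real data, the same idea would presumably be to select the rows and columns first and then call cor(na.omit(...)) on the result, which should match use="complete.obs".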

Data:

fastfood <- openintro::fastfood