I am trying to find the correlation between two columns (Sunshine_in_hours and AgeGroup_30_to_34) of a combined dataset in R. However, every time I run the cor() function, I get this error:
Error in pmatch(use, c("all.obs", "complete.obs", "pairwise.complete.obs", :
object 'AgeGroup_30_to_34' not found
Here's the dput(head) snippet:
structure(list(Date = structure(c(18659, 18660, 18661, 18663,
18665, 18666, 18667, 18668, 18669, 18670, 18671, 18673, 18674,
18675, 18676, 18677, 18678, 18679, 18680, 18681, 18682, 18683,
18684, 18685, 18686, 18687, 18688, 18689, 18690, 18691), class = "Date"),
Year = c(2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
2021, 2021), Month = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3), AgeGroup_30_to_34 = c(0,
0, 0, 2, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0,
2, 0, 0, 1, 2, 0, 3, 0, 0, 0), Sunshine_in_hours = c(1.6,
3.4, 13.1, 8.9, 2, 1.7, 12.7, 11.6, 5.5, 5.6, 4.9, 9.2, 8.3,
11.9, 12.4, 12.4, 5.9, 0, 6.3, 8.5, 9.9, 8.7, 6.3, 1, 9.2,
6.3, 1.4, 2.1, 2.6, 3.6), City = c("Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne", "Melbourne", "Melbourne",
"Melbourne", "Melbourne", "Melbourne")), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
I tried to run the code:
Combined <- inner_join(covidS, weatherS, by = 'Date')%>%
mutate(Date = mdy(Date),
Year = year(Date),
Month = month(Date),
Day = day(Date))%>%
select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
filter(City == 'Melbourne')%>%
cor(Sunshine_in_hours, AgeGroup_30_to_34 )
I've tried looking up tutorials on how to do this, but I keep running into a wall. Any help would be appreciated.
CodePudding user response:
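cor() expects two vectors, x and y. Because the pipe passes the filtered data frame as the first argument, your call ends up with three arguments: the data frame as x, Sunshine_in_hours as y, and AgeGroup_30_to_34 as use. Neither of those names exists as a standalone object, hence the error. Assign the result of the pipeline first, then call cor() on its columns: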
Combined <- inner_join(covidS, weatherS, by = 'Date')%>%
mutate(Date = mdy(Date),
Year = year(Date),
Month = month(Date),
Day = day(Date))%>%
select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
filter(City == 'Melbourne')
corr = cor(Combined$Sunshine_in_hours, Combined$AgeGroup_30_to_34 )
Remember that when you use a pipe, the object on the left-hand side becomes the first argument of the function on the right. Your code was therefore equivalent to:
cor(inner_join(covidS, weatherS, by = 'Date')%>%
mutate(Date = mdy(Date),
Year = year(Date),
Month = month(Date),
Day = day(Date))%>%
select(Date, Year, Month, AgeGroup_30_to_34, Sunshine_in_hours, City)%>%
filter(City == 'Melbourne'),
Sunshine_in_hours, AgeGroup_30_to_34 )
So Sunshine_in_hours and AgeGroup_30_to_34 mean nothing to cor(), because it has no way of knowing they are columns of a data frame. cor() is a base R (stats) function that evaluates its arguments as ordinary objects, whereas dplyr verbs look column names up inside the data frame; the two follow different conventions. When in doubt, check the docs (?cor).
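The argument mismatch can be reproduced on a small scale. The sketch below is only illustrative: `dat` is a hypothetical data frame built from the first six rows of the question's dput() output, not the full Combined table.

```r
# Hypothetical stand-in for Combined: first six rows of the question's data.
dat <- data.frame(
  AgeGroup_30_to_34 = c(0, 0, 0, 2, 1, 0),
  Sunshine_in_hours = c(1.6, 3.4, 13.1, 8.9, 2, 1.7)
)

# What the piped call boils down to: dat becomes x, and the bare column
# names are looked up as ordinary objects, which do not exist.
res <- tryCatch(
  cor(dat, Sunshine_in_hours, AgeGroup_30_to_34),
  error = function(e) e
)
inherits(res, "error")  # the call fails, so an error condition is returned

# Passing the columns explicitly gives cor() the two vectors it expects.
r <- cor(dat$Sunshine_in_hours, dat$AgeGroup_30_to_34)
r
```

With only six rows the resulting coefficient is not meaningful in itself; the point is just that the second call runs while the first one fails.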
CodePudding user response:
Using the magrittr exposition pipe %$% instead of %>%, you could do:
library(magrittr)
dat %$%
cor(Sunshine_in_hours, AgeGroup_30_to_34)
#> [1] -0.0006941058
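If you would rather not load magrittr for this, base R's with() does a similar job: it evaluates an expression with the data frame's columns in scope. A minimal sketch, assuming a hypothetical `dat` made from the first six rows of the question's data:

```r
# Hypothetical small data frame: first six rows of the question's data.
dat <- data.frame(
  AgeGroup_30_to_34 = c(0, 0, 0, 2, 1, 0),
  Sunshine_in_hours = c(1.6, 3.4, 13.1, 8.9, 2, 1.7)
)

# with() makes the columns visible by name inside the expression.
r_with <- with(dat, cor(Sunshine_in_hours, AgeGroup_30_to_34))

# Same computation with explicit column extraction, for comparison.
r_dollar <- cor(dat$Sunshine_in_hours, dat$AgeGroup_30_to_34)

# Both forms should agree.
all.equal(r_with, r_dollar)
```

with() can also sit at the end of a %>% chain, since the piped data frame becomes its first argument.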