This is the correlation matrix I get when I run this code in R:
library(dplyr)      # provides %>% and select()
library(stargazer)  # writes the matrix as an HTML table
Correl2 <- Data %>% select(Price, Price2, Price.m2, Listingprice2, Listingdays, Type, Age, BRA, Balcony, Bedrooms, Lotsize)
cor(Correl2) %>% stargazer(type = "html", title = "Correlation Matrix2", out = "Correlation Matrix without dummies.html")
Does anyone know why the correlation values are missing for the first three variables? I have run the exact same code before and got the values.
structure(list(Price = c(6300000, 1.2e+07, 10700000, 11450000,
10200000, 9500000), Price2 = c(6300000, 1.2e+07, 10700000, 11450000,
10200000, 9500000), Price.m2 = c(35000, 43636.36364, 65644.17178,
68975.90361, 52849.74093, 44811.32075), Listingprice2 = c(6500000,
1.3e+07, 10600000, 12200000, 10500000, 9800000), Listingdays = c(12,
0, 9, 134, 109, 234), Type = c(0, 0, 0, 0, 0, 0), Age = c(100,
42, 102, 8, 33, 37), BRA = c(180, 275, 163, 166, 193, 212), Balcony = c(1,
1, 1, 1, 1, 1), Bedrooms = c(4, 5, 4, 4, 5, 5), Lotsize = c(1109,
859.7, 688.6, 1469, 700.2, 1691)), row.names = c(NA, 6L), class = "data.frame")
CodePudding user response:
I think there are two things going on here. But someone more expert might have a better answer.
First, I don't think two of these variables should be in the correlation matrix at all, because they are derived from other variables. Price2 is the same as Price. Also, Price.m2 seems to be Price / BRA.
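To check that reading of the data, here is a minimal sketch that uses only the six sample rows from the question. The tolerance passed to all.equal() is my own choice, to allow for the rounding in the posted Price.m2 values.

```r
# Columns copied from the question's dput() output (six sample rows only)
Price    <- c(6300000, 1.2e+07, 10700000, 11450000, 10200000, 9500000)
Price2   <- c(6300000, 1.2e+07, 10700000, 11450000, 10200000, 9500000)
Price.m2 <- c(35000, 43636.36364, 65644.17178, 68975.90361, 52849.74093, 44811.32075)
BRA      <- c(180, 275, 163, 166, 193, 212)

# Is Price2 an exact copy of Price?
same_price <- identical(Price, Price2)

# Does Price.m2 match Price / BRA up to rounding? (tolerance is an assumption)
is_ratio <- isTRUE(all.equal(Price.m2, Price / BRA, tolerance = 1e-6))

same_price
is_ratio
```

If both checks hold on the full data as well, Price2 adds nothing to the matrix that Price does not already provide.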
Second, Type and Balcony look like dummy variables encoded as 0 or 1, and Bedrooms looks like it should be a factor / categorical variable. Perhaps try leaving some of these variables out of the matrix and using this instead. I would not anticipate a problem here.
Correl2 <- Data %>%
  dplyr::select(Price, Listingprice2, Listingdays, Age, BRA, Lotsize)
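One more thing that may matter for the dummy columns: in the six sample rows, Type is always 0 and Balcony is always 1. A column with zero variance has no defined Pearson correlation, so cor() returns NA for it. The sketch below only uses the posted sample rows, so the full data set may well behave differently. The names sample_rows and is_constant are just for illustration.

```r
# Columns copied from the six sample rows in the question's dput() output
sample_rows <- data.frame(
  Type    = c(0, 0, 0, 0, 0, 0),
  Balcony = c(1, 1, 1, 1, 1, 1),
  Age     = c(100, 42, 102, 8, 33, 37),
  BRA     = c(180, 275, 163, 166, 193, 212)
)

# Flag columns whose standard deviation is zero (every value identical)
is_constant <- vapply(sample_rows, function(x) sd(x, na.rm = TRUE) == 0, logical(1))
is_constant

# cor() warns that the standard deviation is zero and gives NA for those columns
cm <- suppressWarnings(cor(sample_rows))
cm["Type", "Age"]

# Dropping the constant columns leaves a matrix without NA entries
cm_clean <- cor(sample_rows[, !is_constant])
cm_clean
```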
I would love to hear a more technical answer about why the code here is failing, or else a more statistically rigorous answer about the nuance of using the Pearson correlation against a variable that is either a dummy variable or a variable with very few levels - I just have to leave that part open to the next answer.
CodePudding user response:
Thanks for all the answers! I tried omitting the NA values, and now there is no problem with the correlations. However, it is kind of weird, since when I look for missing values in my Excel file I can't find any. I guess that was just the problem.
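For anyone who lands here with the same symptom, this is a rough sketch of how the NA values could be located and handled. It uses a made-up toy data frame, not the asker's data. With the default settings, a single NA in a column makes every off-diagonal correlation involving that column NA. It seems plausible (though I can't confirm it from the post) that one missing Price value would propagate to Price2 and Price.m2 too, since both are derived from Price, which would match the "first three" being empty.

```r
# Hypothetical toy data with one missing Price value (not the asker's data)
toy <- data.frame(
  Price = c(6300000, NA, 10700000, 11450000),
  Age   = c(100, 42, 102, 8),
  BRA   = c(180, 275, 163, 166)
)

# Count missing values per column to see where the NAs are
na_counts <- colSums(is.na(toy))
na_counts

# Default cor(): correlations between Price and the other columns come out NA
cm_default <- cor(toy)
cm_default["Price", "Age"]

# Option 1: let cor() drop every row that contains an NA
cm_complete <- cor(toy, use = "complete.obs")

# Option 2: drop those rows yourself first, which gives the same matrix
cm_omit <- cor(na.omit(toy))
cm_complete
```

Running colSums(is.na(Data)) on the real data should show which columns hold the missing values, which may be easier than hunting for blank cells in Excel.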