Struggling with correlation matrix, missing values

Time:03-31

[Image: correlation matrix output]

This is the correlation matrix I get when I run this code in R:

Correl2 <- Data %>% select(Price, Price2, Price.m2, Listingprice2, Listingdays, Type, Age, BRA, Balcony, Bedrooms, Lotsize)
cor(Correl2) %>% stargazer(type = "html", title = "Correlation Matrix2", out = "Correlation Matrix without dummies.html")

Does anyone know why the correlation values are missing for the first three variables? I have run the exact same code before and received the values.

structure(list(Price = c(6300000, 1.2e+07, 10700000, 11450000, 
10200000, 9500000), Price2 = c(6300000, 1.2e+07, 10700000, 11450000, 
10200000, 9500000), Price.m2 = c(35000, 43636.36364, 65644.17178, 
68975.90361, 52849.74093, 44811.32075), Listingprice2 = c(6500000, 
1.3e+07, 10600000, 12200000, 10500000, 9800000), Listingdays = c(12, 
0, 9, 134, 109, 234), Type = c(0, 0, 0, 0, 0, 0), Age = c(100, 
42, 102, 8, 33, 37), BRA = c(180, 275, 163, 166, 193, 212), Balcony = c(1, 
1, 1, 1, 1, 1), Bedrooms = c(4, 5, 4, 4, 5, 5), Lotsize = c(1109, 
859.7, 688.6, 1469, 700.2, 1691)), row.names = c(NA, 6L), class = "data.frame")

CodePudding user response:

I think there are two things going on here. But someone more expert might have a better answer.

I don't think two of these variables should be included in the correlation matrix at all, because they are derived from other variables. Price2 is identical to Price, and Price.m2 appears to be Price / BRA.
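One way to confirm the redundancy is to test it directly on the sample rows from the question. This is only a sketch: the data frame name `d` is made up for illustration, and it holds just the six rows and four columns needed for the check.

```r
# Six sample rows copied from the question; `d` is a hypothetical name.
d <- data.frame(
  Price    = c(6300000, 12000000, 10700000, 11450000, 10200000, 9500000),
  Price2   = c(6300000, 12000000, 10700000, 11450000, 10200000, 9500000),
  Price.m2 = c(35000, 43636.36364, 65644.17178,
               68975.90361, 52849.74093, 44811.32075),
  BRA      = c(180, 275, 163, 166, 193, 212)
)

# Is Price2 just a copy of Price?
same_price <- isTRUE(all.equal(d$Price, d$Price2))

# Is Price.m2 equal to Price / BRA, up to the rounding in the printed values?
per_m2_matches <- isTRUE(all.equal(d$Price / d$BRA, d$Price.m2, tolerance = 1e-6))

# A copied column correlates perfectly with its source.
r_copy <- cor(d$Price, d$Price2)
```

If both checks come back TRUE on the full data as well, Price2 and Price.m2 add no independent information to the matrix.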

Another thing to consider: Type and Balcony look like dummy variables encoded as 0 or 1, and Bedrooms looks like it should perhaps be a factor (categorical) variable. Try leaving these variables out of the matrix and using this instead; I would not anticipate a problem with it.

Correl2 <- Data %>% 
  dplyr::select(Price, Listingprice2, Listingdays, Age, BRA, Lotsize)
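A related point that is visible in the six sample rows: Type is always 0 and Balcony is always 1 there. A column with zero standard deviation has no defined Pearson correlation, so `cor()` returns NA for it (with a warning). This may not hold for the full data set, and it is a separate issue from the missing values in the first three columns. The sketch below, using the hypothetical name `d` and base R only, finds constant columns and leaves them out.

```r
# Four columns from the question's sample rows; `d` is a hypothetical name.
d <- data.frame(
  Price   = c(6300000, 12000000, 10700000, 11450000, 10200000, 9500000),
  Type    = c(0, 0, 0, 0, 0, 0),
  Age     = c(100, 42, 102, 8, 33, 37),
  Balcony = c(1, 1, 1, 1, 1, 1)
)

# Standard deviation of every column.
col_sd <- sapply(d, sd)

# Columns that never vary cannot be correlated with anything.
constant_cols <- names(col_sd)[col_sd == 0]

# Correlation matrix over the remaining columns only.
cor_varying <- cor(d[, col_sd > 0, drop = FALSE])
```

Here `constant_cols` lists the columns to inspect before deciding whether they belong in the matrix at all.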

I would love to hear a more technical answer about why the code here is failing, or else a more statistically rigorous answer about the nuance of using the Pearson correlation against a variable that is either a dummy variable or a variable with very few levels - I just have to leave that part open to the next answer.
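On the dummy-variable question, one fact that may help: the Pearson correlation between a numeric variable and a 0/1 dummy is the same number as the point-biserial correlation, so it is well defined as long as the dummy actually varies. The sketch below checks that identity on the sample rows. The `five_bed` dummy is invented purely for illustration, and the Spearman line is only one possible choice for a variable with few ordered levels, not a recommendation from the thread.

```r
# Sample values from the question.
price    <- c(6300000, 12000000, 10700000, 11450000, 10200000, 9500000)
bedrooms <- c(4, 5, 4, 4, 5, 5)

# Hypothetical 0/1 dummy built for illustration: 1 if the home has five bedrooms.
five_bed <- as.numeric(bedrooms == 5)

# Ordinary Pearson correlation against the dummy.
r_pearson <- cor(price, five_bed)

# Point-biserial formula: difference in group means, scaled by the
# population SD of price and the group proportions.
n   <- length(price)
n1  <- sum(five_bed == 1)
n0  <- n - n1
m1  <- mean(price[five_bed == 1])
m0  <- mean(price[five_bed == 0])
s_n <- sqrt(mean((price - mean(price))^2))  # population SD (divides by n)
r_pb <- (m1 - m0) / s_n * sqrt(n1 * n0 / n^2)

# Rank-based alternative for a variable with only a few ordered levels.
r_spearman <- cor(price, bedrooms, method = "spearman")
```

The two values `r_pearson` and `r_pb` agree up to floating-point error, which is why a dummy in a Pearson matrix is interpretable as a scaled difference in group means.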

CodePudding user response:

Thanks for all the answers! I tried omitting the NA values, and now the correlations compute without a problem. It is kind of weird, though, since when I look for missing values in my Excel file I can't seem to find any. I guess that was the problem all along.
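For anyone hitting the same thing, here is a minimal sketch of what omitting the NA values changes. The `toy` data frame is not the real file: it reuses a few sample values and one Price entry is set to NA on purpose. Empty cells in a spreadsheet typically import as NA, so counting them in R can be easier than scanning the file by eye.

```r
# Toy data, not the asker's file: one missing Price value is introduced on purpose.
toy <- data.frame(
  Price       = c(6300000, 12000000, NA, 11450000, 10200000, 9500000),
  Listingdays = c(12, 0, 9, 134, 109, 234),
  Age         = c(100, 42, 102, 8, 33, 37)
)

# Count missing values per column to see which variables are affected.
na_counts <- colSums(is.na(toy))

# Default use = "everything": pairs involving a column with an NA come back as NA.
cor_default <- cor(toy)

# Casewise deletion: only rows with no missing values are used.
cor_complete <- cor(toy, use = "complete.obs")

# Same result as dropping the incomplete rows first.
cor_omitted <- cor(na.omit(toy))
```

With `use = "pairwise.complete.obs"` each pair would instead use every row where both variables are present, which keeps more data but means different cells can be based on different rows.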
