I am using R to do some logistic regression on the Boston dataset from the MASS package, with the crime rate (crim) as the response. This code works just fine:
#################################
library(MASS)
head(Boston)
?Boston
plot(Boston$zn, Boston$crim) #gives scatter plot
lm(formula=crim~zn, data=Boston) #gives slope and intercept of best fit line
lm.Boston <- lm(formula=crim~zn, data=Boston) #saves information as lm.Boston
abline(lm.Boston) #plots best fit line, adds on to existing plot
abline(v=mean(Boston$zn),col='red') #plots mean for zn
abline(h=mean(Boston$crim),col='red') #plots mean for crim
summary(Boston$zn)
###############################
But I have to do this for all 13 predictor columns, not just $zn, and I am trying to do it in a loop to save having to repeat the code block 13 times!
I am trying this, but I get an error:
for (i in 2:ncol(Boston)){
clname <- colnames(Boston)[i]
predictor <- paste('Boston$',clname,sep="")
print(predictor)
plot(eval(predictor), Boston$crim) #gives scatter plot
# lm(formula=Boston$crim~predictor, data=Boston) #gives slope and intercept of best fit line
# lm.Boston <-lm(formula=Boston$crim~predictor, data=Boston) #saves information as lm.Boston
# abline(lm.Boston) #plots best fit line Adds on to existing plot
# abline(v=mean(predictor),col='red') #plots mean for crim
# abline(h=mean(Boston$crim),col='red') #plots mean for clname
}
The predictor variable seems to be correct when I print it out, but the first plot statement gives an error (I commented out the rest of the code to try to fix this error).
Here is the error I get:
[1] "Boston$zn"
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ
CodePudding user response:
You can store the column names in a separate character vector and then iterate over it, or you can use them directly in the for loop. I have stored them in a separate vector here.
After that, you need to add the proper axis labels (here with paste0):
library(MASS)
columns_boston <- colnames(Boston)[2:ncol(Boston)] #every column except crim
for (i in columns_boston){
  predictor <- Boston[,i] #values of the current column
  print(i) #name of the current column
  plot(predictor, Boston$crim, xlab=paste0(i), ylab=paste0('crim')) #gives scatter plot
  lm.Boston <- lm(formula=Boston$crim~predictor) #saves information as lm.Boston
  print(lm.Boston) #gives slope and intercept of best fit line (print is needed inside a loop)
  abline(lm.Boston) #plots best fit line, adds on to existing plot
  abline(v=mean(predictor),col='red') #plots mean for the predictor
  abline(h=mean(Boston$crim),col='red') #plots mean for crim
}
Sample output for the last column: [plot image]
You can remove the ylab if you want.
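As for why the loop in the question failed: paste only builds the character string "Boston$zn", and eval of a character string returns that same string, so plot receives an x of length 1 against a y with 506 elements. Below is a minimal sketch of the difference, followed by a variant of the loop that builds the formulas from the column names with reformulate (from the stats package), so that data=Boston is actually used. The names fits and slopes are only illustrative.

```r
library(MASS)

clname <- "zn"
predictor <- paste0("Boston$", clname)

# eval() on a character string returns the string unchanged,
# which is why plot() complained that 'x' and 'y' lengths differ
length(eval(predictor))   # 1
length(Boston$crim)       # 506

# indexing by name returns the actual column
x <- Boston[[clname]]
length(x)                 # 506

# build crim ~ <column> formulas from the names and fit one model per column
predictors <- setdiff(colnames(Boston), "crim")
fits <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "crim"), data = Boston)
})
names(fits) <- predictors

# slope of each single-predictor fit
slopes <- sapply(fits, function(f) coef(f)[[2]])
round(slopes, 3)
```

Keeping the fitted models in a list means you can call summary(fits$zn) or abline(fits$zn) afterwards without refitting.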
CodePudding user response:
What you want to do (13 different charts) you can do like this:
library(tidyverse)
library(MASS)
#scatter plot of one variable against crim with a fitted glm line
plotVar = function(data, name) data %>% ggplot(aes(crim, val)) +
  geom_point() +
  stat_smooth(formula = y~x, method="glm") +
  ylab(name)
#reshape to long format, then draw one plot per variable
Boston %>% pivot_longer(
  -crim, names_to = "var", values_to = "val"
) %>% group_by(var) %>%
  nest() %>%
  group_map(~plotVar(.x$data[[1]], .y$var))
[plot images: first plot, last plot]
This, however, is not a logistic regression! You have to specify exactly what you want to achieve.
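If a logistic regression really is the goal, the response has to be binary. Here is a minimal sketch, assuming purely for illustration that the outcome is whether crim lies above its median. The high_crim column, the boston2 copy, and the choice of zn as the predictor are my assumptions, not something stated in the question.

```r
library(MASS)

# hypothetical binary outcome: 1 if the crime rate is above the median, else 0
boston2 <- Boston
boston2$high_crim <- as.integer(boston2$crim > median(boston2$crim))

# logistic regression needs family = binomial;
# glm() defaults to gaussian, which is an ordinary linear fit
fit <- glm(high_crim ~ zn, data = boston2, family = binomial)
coef(fit)

# fitted values are now probabilities rather than crime rates
p <- predict(fit, type = "response")
range(p)
```

The same family = binomial setting can be passed to stat_smooth through method.args = list(family = binomial) if you want the fitted curve in the ggplot version, provided the y aesthetic is the 0/1 outcome.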