Logistic Regression in R using a loop to save code

Time:09-27

I'm using R to do some logistic regression on the Boston dataset from the MASS package. This code works just fine:

#################################

library(MASS)   # provides the Boston data set
head(Boston)
?Boston

plot(Boston$zn, Boston$crim)                 # scatter plot of crim against zn
lm.Boston <- lm(crim ~ zn, data = Boston)    # fit the best-fit line and save it as lm.Boston
lm.Boston                                    # print the intercept and slope
abline(lm.Boston)                            # add the best-fit line to the existing plot
abline(v = mean(Boston$zn), col = 'red')     # vertical line at the mean of zn
abline(h = mean(Boston$crim), col = 'red')   # horizontal line at the mean of crim
summary(Boston$zn)

###############################

But I need to repeat this for all 13 predictor columns (zn and the 12 others), and I am trying to use a loop so I don't have to copy the code block 13 times.

I'm trying this, but I get an error:

for (i in 2:ncol(Boston)){
   clname <- colnames(Boston)[i]
   predictor <- paste('Boston$',clname,sep="")
   print(predictor)
   plot(eval(predictor), Boston$crim) #gives scatter plot
# lm(formula=Boston$crim~predictor, data=Boston) #gives slope and intercept of best fit line
# lm.Boston <-lm(formula=Boston$crim~predictor, data=Boston) #saves information as lm.Boston
# abline(lm.Boston) #plots best fit line Adds on to existing plot
# abline(v=mean(predictor),col='red') #plots mean for crim
# abline(h=mean(Boston$crim),col='red') #plots mean for clname

}

The predictor variable looks correct when I print it, but the first plot() call gives an error (I commented out the rest of the code while trying to fix it).

Here is the error I get:

[1] "Boston$zn"
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ
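The error most likely comes from eval(predictor): predictor is a character string, and evaluating a string constant simply returns the string. plot() therefore receives an x of length 1 and a y (Boston$crim) of length 506. A minimal sketch of the difference, assuming only the MASS package is installed:

```r
library(MASS)

clname <- "zn"
predictor <- paste('Boston$', clname, sep = "")

# eval() on a character string returns the string itself, not the column
x_wrong <- eval(predictor)
print(length(x_wrong))   # 1

# Indexing the data frame by name returns the numeric column instead
x_right <- Boston[[clname]]
print(length(x_right))   # 506, the same length as Boston$crim

# eval(parse(text = predictor)) would also work, but [[ ]] avoids building code from strings
plot(x_right, Boston$crim, xlab = clname, ylab = "crim")
```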

Answer:

You can store the predictor column names in a separate vector and iterate over it, or use colnames() directly in the for loop. Here I store them in a separate vector.

Inside the loop, take the column's values with Boston[, i] instead of building the string "Boston$zn", and set the axis labels explicitly with xlab and ylab:

library(MASS)
columns_boston <- colnames(Boston)[2:ncol(Boston)]   # every column except crim

for (i in columns_boston){
  predictor <- Boston[, i]                                 # the column's values, not its name
  print(i)
  plot(predictor, Boston$crim, xlab = i, ylab = 'crim')    # scatter plot
  lm.Boston <- lm(reformulate(i, response = 'crim'), data = Boston)  # fits crim ~ <i>
  print(coef(lm.Boston))                                   # intercept and slope
  abline(lm.Boston)                                        # add the best-fit line to the plot
  abline(v = mean(predictor), col = 'red')                 # vertical line at the mean of the predictor
  abline(h = mean(Boston$crim), col = 'red')               # horizontal line at the mean of crim
}

Sample output for the last column (medv): [plot image not preserved]

You can remove the ylab if you want.
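If you also want the numbers and not just the plots, here is a sketch that gathers each single-predictor fit's intercept and slope into one table. The name coef_table and the column labels intercept and slope are illustrative choices, not part of the answer above.

```r
library(MASS)

# All predictor columns (everything except the response, crim)
columns_boston <- setdiff(colnames(Boston), "crim")

# Fit crim ~ <predictor> for each column and keep the intercept and slope.
# sapply() returns a 2 x 13 matrix named by predictor; t() makes one row per predictor.
coef_table <- t(sapply(columns_boston, function(i) {
  fit <- lm(reformulate(i, response = "crim"), data = Boston)
  c(intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]))
}))

print(round(coef_table, 4))
```

Calling par(mfrow = c(4, 4)) before the plotting loop is another base-R option if you want all 13 scatter plots on one page.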

Answer:

You can produce all 13 charts with tidyverse functions like this:

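library(tidyverse)
library(MASS)

# One scatter plot per predictor with a fitted line (the default gaussian glm is a linear fit)
plotVar = function(data, name) data %>% ggplot(aes(crim, val)) +
  geom_point() +
  stat_smooth(formula = y ~ x, method = "glm") +
  ylab(name)

# Long format: one row per (crim, predictor name, predictor value)
Boston %>% pivot_longer(
  -crim, names_to = "var", values_to = "val"
) %>% group_by(var) %>%
  group_map(~plotVar(.x, .y$var))   # .x holds crim/val for one var, .y$var is its name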

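First plot: [image not preserved]
Last plot: [image not preserved]

Note, however, that this is not a logistic regression! method = "glm" without a family argument uses the default gaussian family, which fits an ordinary linear model, and crim is a continuous rate rather than a 0/1 outcome. You have to specify exactly what you want to achieve.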
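If a logistic regression really is the goal, the response has to be binary. The sketch below assumes one illustrative way to get there: a hypothetical crim01 variable that is 1 when crim is above its median and 0 otherwise. The threshold, and the names boston_bin, logit_fits and logit_slopes, are assumptions for illustration only. It then fits one binomial glm per predictor.

```r
library(MASS)

# Assumption: recode crim as a 0/1 outcome (1 = above the median crime rate).
# The median threshold is illustrative; logistic regression needs a binary response.
boston_bin <- Boston
boston_bin$crim01 <- as.integer(boston_bin$crim > median(boston_bin$crim))

predictors <- setdiff(colnames(Boston), "crim")

# One logistic regression per predictor: crim01 ~ <predictor>, binomial family
logit_fits <- lapply(predictors, function(i) {
  glm(reformulate(i, response = "crim01"), family = binomial, data = boston_bin)
})
names(logit_fits) <- predictors

# Slope on the log-odds scale for each single-predictor model
logit_slopes <- sapply(logit_fits, function(fit) unname(coef(fit)[2]))
print(round(logit_slopes, 4))
```

Some predictors may trigger a "fitted probabilities numerically 0 or 1" warning if they separate the two classes almost perfectly. For the ggplot version, a 0/1 response can be smoothed with stat_smooth(method = "glm", method.args = list(family = binomial)).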
