Regression in R for data frame with only positive values

Time:12-02

I need to run a regression on a data frame in which one variable (like_count) only has non-negative values, zeros included. The following data frame is a simplified version of my data and contains its minimum and maximum values:

like_count <- c(631827, 0, 0, 4012)
  
news_media <- c("ABC", "ABC", "NZZ", "CNN")

data <- data.frame(news_media, like_count)

How do I correctly fit a regression for this data frame? I want to predict like_count from news_media.

So far, I tried the following:

model <- lm(log(like_count) ~ news_media, data = data)

summary(model)

This leads to an error, because log(like_count) produces -Inf values.

Does anybody have an idea what I can do to run a correct regression?

CodePudding user response:

You get this error because log(0) is -Inf, so applying log to like_count turns every 0 into -Inf, and lm cannot fit a model with infinite values in the response.
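To see where the problem comes from, here is a small check using the like_count vector from the question. It only shows which entries become -Inf; nothing else is assumed.

```r
like_count <- c(631827, 0, 0, 4012)

# log(0) is -Inf, so the two zero counts become -Inf
log_like_count <- log(like_count)
print(log_like_count)

# TRUE marks the entries that lm() cannot use as a response
print(is.infinite(log_like_count))
```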

First, create a new variable that holds the logarithm of like_count.

log_like_count = log(like_count)

Then replace the -Inf values with 0 using the ifelse function from base R. Note that this treats a like_count of 0 as if it were 1, since log(1) = 0.

log_like_count = ifelse(is.infinite(log_like_count), 0, log_like_count)

After that, add the new variable to your data frame and run the model again, this time without calling log in the formula.

data$log_like_count = log_like_count
model = lm(log_like_count ~ news_media, data = data)
summary(model)
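As a possible alternative to patching the -Inf values by hand, you could use log1p, which computes log(1 + x) and therefore maps 0 to 0 without any replacement step. This is only a sketch on the toy data from the question, not necessarily the best model for your real data; the column name log1p_like_count and the prediction for "CNN" are just illustrations.

```r
like_count <- c(631827, 0, 0, 4012)
news_media <- c("ABC", "ABC", "NZZ", "CNN")
data <- data.frame(news_media, like_count)

# log1p(x) = log(1 + x), so zero counts stay finite (log1p(0) is 0)
data$log1p_like_count <- log1p(data$like_count)

model <- lm(log1p_like_count ~ news_media, data = data)
print(coef(model))

# Predictions are on the log1p scale; expm1 is the inverse of log1p
pred_log <- predict(model, newdata = data.frame(news_media = "CNN"))
pred_count <- expm1(pred_log)
print(pred_count)
```

Keep in mind that the coefficients then describe log(1 + like_count), so any prediction has to be transformed back with expm1 before you read it as a like count.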