I need to run a regression for a data frame in which one variable (like_count) only has non-negative values, including zeros. The following data frame is a simplified version of my data, using the minimum and maximum values from my data:
like_count <- c(631827, 0, 0, 4012)
news_media <- c("ABC", "ABC", "NZZ", "CNN")
data <- data.frame(news_media, like_count)
How do I correctly calculate a regression for this data frame? I want to predict like_count as a function of news_media.
So far, I tried the following:
model <- lm(log(like_count) ~ news_media, data = data)
summary(model)
This leads to an error, because log(like_count) returns -Inf for the zero counts.
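A minimal self-contained reproduction of the failure, using the toy data from the question: it shows that log() turns the two zero counts into -Inf and captures the error that lm() raises. The tryCatch() wrapper is only there so the script keeps running and prints the error message instead of stopping.

```r
# Toy data from the question
like_count <- c(631827, 0, 0, 4012)
news_media <- c("ABC", "ABC", "NZZ", "CNN")
data <- data.frame(news_media, like_count)

# log(0) is -Inf, so the two zero counts become -Inf
log_counts <- log(data$like_count)
print(log_counts)

# lm() refuses a response containing non-finite values;
# catch the error and keep its message as a string
fit_error <- tryCatch(
  lm(log(like_count) ~ news_media, data = data),
  error = function(e) conditionMessage(e)
)
print(fit_error)
```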
Does anybody have an idea what I can do to run a correct regression?
Answer:
You get this error because log(0) is -Inf: when you apply log() to like_count, the zero values become -Inf, and lm() cannot fit a model whose response contains infinite values.
First, create a new variable containing the logarithm of like_count:
log_like_count = log(like_count)
Then replace the -Inf values with 0 using the base R ifelse() function:
log_like_count = ifelse(is.infinite(log_like_count), 0, log_like_count)
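A small sketch of what this replacement does, on a hypothetical vector `counts` that contains a 0 and a 1: is.infinite() flags the -Inf entries, and after the replacement a count of 0 and a count of 1 end up with the same value, because log(1) is also 0. For comparison only, the sketch also shows base R's log1p(), which computes log(1 + x); it is finite at zero and keeps 0 and 1 distinct, but it is a different transformation from the one used in this answer.

```r
# Hypothetical example vector: a zero, a one, and a larger count
counts <- c(0, 1, 4012)

# Take logs; the zero becomes -Inf, the one becomes 0
log_counts <- log(counts)

# Replace -Inf with 0, as in the answer
replaced <- ifelse(is.infinite(log_counts), 0, log_counts)
print(replaced)

# After replacement, 0 likes and 1 like share the same value
print(replaced[1] == replaced[2])  # TRUE

# Alternative transformation for comparison: log(1 + x)
shifted <- log1p(counts)
print(shifted)
```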
After that, add the new variable to your data frame and run the model again, this time without calling log() inside the formula:
data$log_like_count = log_like_count
model = lm(log_like_count ~ news_media, data = data)
summary(model)
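Putting the steps together, a self-contained sketch of this approach on the question's toy data might look like the following (it uses is.infinite() for the -Inf check). With only four rows and three news_media levels, the model has a single residual degree of freedom, so the standard errors in summary(model) say very little here; meaningful inference needs the full data set.

```r
# Toy data from the question
like_count <- c(631827, 0, 0, 4012)
news_media <- c("ABC", "ABC", "NZZ", "CNN")
data <- data.frame(news_media, like_count)

# Step 1: take logs (zeros become -Inf)
log_like_count <- log(data$like_count)

# Step 2: replace -Inf with 0
log_like_count <- ifelse(is.infinite(log_like_count), 0, log_like_count)

# Step 3: store the transformed response and fit the model on it;
# lm() converts the character column news_media to a factor
data$log_like_count <- log_like_count
model <- lm(log_like_count ~ news_media, data = data)
summary(model)
```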