Why are my variables linearly dependent? Regression, Diff-n-diff, interaction term and dummies

Time:10-12

I've created a small dataframe for testing differences-in-differences, in order to gain some intuition about the method and theory. I guess I have two questions.

  1. Why is the correlation between free_cookies and free_cookies*teenager = 1?
  2. Is there a way to fix the data so that the regression lm(cookies_eaten ~ teenager + free_cookies + teenager*free_cookies, data) does not drop the interaction term (free_cookies*teenager)?

It should be possible to run a regression with the format

outcome ~ dummy1 + dummy2 + dummy1*dummy2

and get coefficient estimates for all independent variables, which I've seen work elsewhere. To be clear: teenager and free_cookies are dummy variables. I'm guessing I've just done something silly when I constructed my sample data.

# cookie eating data
data <- read.table(text = "

year    cookies_eaten   teenager    free_cookies
2000    110 1   0
2001    110 1   0
2002    120 1   0
2003    120 1   0
2004    125 1   0
2005    125 1   0
2006    125 1   0
2007    145 1   1
2008    155 1   1
2009    160 1   1
2010    160 1   1
2000    100 0   0
2001    100 0   0
2002    110 0   0
2003    110 0   0
2004    115 0   0
2005    115 0   0
2006    115 0   0
2007    115 0   0
2008    115 0   0
2009    120 0   0
2010    120 0   0", header=TRUE)


# Regressions
one <- lm(cookies_eaten ~ teenager, data)
summary(one)

two <- lm(cookies_eaten ~ teenager + free_cookies, data)
summary(two)

three <- lm(cookies_eaten ~ teenager + free_cookies + teenager*free_cookies, data)
summary(three) # Coefficients: (1 not defined because of singularities)

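# four without free_cookies written explicitly
# (note: in an R formula, teenager*free_cookies still expands to both main effects plus the interaction)
four <- lm(cookies_eaten ~ teenager + teenager*free_cookies, data)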
summary(four) # Coefficients: (1 not defined because of singularities)

# Correlation testing
# cor() defaults to Pearson; with() avoids attach()/detach()
with(data, cor(free_cookies, free_cookies * teenager))
# = 1
with(data, cor(cookies_eaten, free_cookies * teenager))
# = 0.9090648

CodePudding user response:

Looking at the data, you can see that whenever teenager == 0, free_cookies == 0 as well, so in those rows both free_cookies and teenager*free_cookies are 0. When teenager == 1, free_cookies is multiplied by 1, which leaves it unchanged. As a result, free_cookies and teenager*free_cookies have the same value in every row. That is why their correlation is 1, and why lm() drops one of them: the two columns are perfectly collinear. With these data you cannot investigate interactions. You need some observations where teenager == 0 and free_cookies == 1.
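One quick way to see the missing combination is to cross-tabulate the two dummies. The sketch below rebuilds only the two dummy columns from the question in compact form (the name `cookies` is just an illustrative choice, not from the original post) and counts the rows in each cell.

```r
# Compact reconstruction of the two dummy columns from the question:
# 11 teenager rows (7 without free cookies, 4 with), then 11 non-teenager rows.
cookies <- data.frame(
  teenager     = rep(c(1, 0), each = 11),
  free_cookies = c(rep(0, 7), rep(1, 4), rep(0, 11))
)

# Count observations in each teenager x free_cookies cell
cells <- table(teenager = cookies$teenager,
               free_cookies = cookies$free_cookies)
print(cells)

# The cell teenager = 0, free_cookies = 1 contains no rows
cells["0", "1"]

# Consequently the interaction column is a copy of free_cookies
same_columns <- all(cookies$free_cookies ==
                    cookies$teenager * cookies$free_cookies)
same_columns
```

If any cell of this table is empty, a model with both main effects and the interaction cannot estimate all of its coefficients.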

data <- read.table(text = "
year    cookies_eaten   teenager    free_cookies
2000    110 1   0
2001    110 1   0
2002    120 1   0
2003    120 1   0
2004    125 1   0
2005    125 1   0
2006    125 1   0
2007    145 1   1
2008    155 1   1
2009    160 1   1
2010    160 1   1
2000    100 0   0
2001    100 0   0
2002    110 0   0
2003    110 0   0
2004    115 0   0
2005    115 0   0
2006    115 0   0
2007    115 0   0
2008    115 0   0
2009    120 0   0
2010    120 0   0", header=TRUE)

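# Build the interaction column explicitly
data$interaction <- data$teenager * data$free_cookies

# Show both columns side by side
print(data[, c("free_cookies", "interaction")])

# FALSE: the two columns agree in every row
any(data$free_cookies != data$interaction)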
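As a possible fix, here is a minimal sketch of the usual difference-in-differences layout, in which the second dummy marks the period rather than who actually received the treatment. It rests on an assumption that is not in the original post: the hypothetical variable `post` is 1 for every row from 2007 onward (the first year with free_cookies == 1), for both groups. Under that assumption all four group-by-period cells are populated, so lm() can estimate every coefficient.

```r
# Same 22 observations as in the question, rebuilt compactly.
cookies <- data.frame(
  year          = rep(2000:2010, times = 2),
  cookies_eaten = c(110, 110, 120, 120, 125, 125, 125, 145, 155, 160, 160,
                    100, 100, 110, 110, 115, 115, 115, 115, 115, 120, 120),
  teenager      = rep(c(1, 0), each = 11)
)

# ASSUMPTION (not from the original post): the treatment period starts in 2007
# and the period dummy applies to both groups.
cookies$post <- as.integer(cookies$year >= 2007)

# All four teenager x post cells now contain data
table(teenager = cookies$teenager, post = cookies$post)

# Main effects plus interaction; teenager:post is the diff-in-diff term
did <- lm(cookies_eaten ~ teenager + post + teenager:post, data = cookies)
coef(did)
```

With this layout the teenager:post coefficient plays the role that free_cookies was meant to play: it is the change for teenagers after 2007 minus the change for non-teenagers over the same years.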