I've created a small dataframe for testing difference-in-differences, in order to gain some intuition about the method and theory. I have two questions.
- Why is the correlation between free_cookies and free_cookies*teenager equal to 1?
- Is there a way to fix the data so that the regression lm(cookies_eaten ~ teenager + free_cookies + teenager*free_cookies, data) does not drop the interaction term (free_cookies*teenager)?
It should be possible to run a regression of the form
outcome ~ dummy1 + dummy2 + dummy1*dummy2
and get coefficient estimates for all independent variables. I've seen this work elsewhere. To be clear, teenager and free_cookies are dummy variables. I'm guessing I've just done something silly when constructing my sample data.
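One way to see what R does with a formula like this is to inspect the design matrix. The sketch below uses a hypothetical four-row data frame, `toy`, with exactly one observation in each dummy combination. In R's formula syntax `dummy1*dummy2` already expands to `dummy1 + dummy2 + dummy1:dummy2`, so listing the main effects again does not change the model.

```r
# Hypothetical toy data: one row for each of the four dummy combinations
toy <- data.frame(
  dummy1 = c(0, 0, 1, 1),
  dummy2 = c(0, 1, 0, 1)
)

# Both formulas give the same design matrix, because duplicated terms are dropped
X_short <- model.matrix(~ dummy1 * dummy2, data = toy)
X_long  <- model.matrix(~ dummy1 + dummy2 + dummy1 * dummy2, data = toy)

print(colnames(X_short))
print(all(X_short == X_long))

# With all four cells populated, the four columns are linearly independent
print(qr(X_short)$rank)
```

When every combination of the two dummies occurs, the rank equals the number of columns, so lm() can estimate all four coefficients.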
# cookie eating data
data <- read.table(text = "
year cookies_eaten teenager free_cookies
2000 110 1 0
2001 110 1 0
2002 120 1 0
2003 120 1 0
2004 125 1 0
2005 125 1 0
2006 125 1 0
2007 145 1 1
2008 155 1 1
2009 160 1 1
2010 160 1 1
2000 100 0 0
2001 100 0 0
2002 110 0 0
2003 110 0 0
2004 115 0 0
2005 115 0 0
2006 115 0 0
2007 115 0 0
2008 115 0 0
2009 120 0 0
2010 120 0 0", header=TRUE)
# Regressions
one <- lm(cookies_eaten ~ teenager, data)
summary(one)
two <- lm(cookies_eaten ~ teenager + free_cookies, data)
summary(two)
three <- lm(cookies_eaten ~ teenager + free_cookies + teenager*free_cookies, data)
summary(three) # Coefficients: (1 not defined because of singularities)
# four without free_cookies (note: teenager*free_cookies still expands to include free_cookies)
four <- lm(cookies_eaten ~ teenager + teenager*free_cookies, data)
summary(four) # Coefficients: (1 not defined because of singularities)
# Correlation testing (with() avoids attach()/detach())
with(data, cor(free_cookies, free_cookies * teenager, method = "pearson"))
# = 1
with(data, cor(cookies_eaten, free_cookies * teenager, method = "pearson"))
# = 0.9090648
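To see exactly which term lm() drops and why, the fitted model and its design matrix can be inspected directly. This is a sketch that rebuilds the question's data with `rep()` instead of `read.table()` (same 22 rows and values) so it runs on its own; `teenager * free_cookies` is used as shorthand for the question's model `three`.

```r
# The question's data, rebuilt compactly (same values as the read.table() version)
data <- data.frame(
  year          = rep(2000:2010, times = 2),
  cookies_eaten = c(110, 110, 120, 120, 125, 125, 125, 145, 155, 160, 160,
                    100, 100, 110, 110, 115, 115, 115, 115, 115, 120, 120),
  teenager      = rep(c(1, 0), each = 11),
  free_cookies  = c(rep(0, 7), rep(1, 4), rep(0, 11))
)

# Same model as `three`: `*` already includes both main effects
three <- lm(cookies_eaten ~ teenager * free_cookies, data = data)

# The interaction coefficient is reported as NA (not estimable)
print(coef(three))

# The interaction column is an exact copy of the free_cookies column...
X <- model.matrix(three)
print(all(X[, "teenager:free_cookies"] == X[, "free_cookies"]))

# ...so the 4-column design matrix only has rank 3
print(c(columns = ncol(X), rank = qr(X)$rank))
```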
Answer:
Looking at the data, you can see that whenever teenager == 0, free_cookies == 0 as well. When teenager == 1, every value of free_cookies is multiplied by 1, which leaves it unchanged. So free_cookies and teenager*free_cookies always have the same value, which is why their correlation is 1 and why lm() cannot estimate a separate coefficient for the interaction. With these data you cannot investigate interactions: you need data that include rows where teenager == 0 and free_cookies == 1.
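The missing combination can be confirmed by cross-tabulating the two dummies. A zero count in the teenager == 0, free_cookies == 1 cell is exactly what makes the interaction column a copy of free_cookies. A short sketch, rebuilding only the two dummy columns of the question's data with `rep()`:

```r
# Dummy columns from the question's data (11 teenager years, then 11 non-teenager years)
data <- data.frame(
  teenager     = rep(c(1, 0), each = 11),
  free_cookies = c(rep(0, 7), rep(1, 4), rep(0, 11))
)

# Count the rows in each of the four dummy combinations;
# the teenager = 0, free_cookies = 1 cell has a count of 0
tab <- table(teenager = data$teenager, free_cookies = data$free_cookies)
print(tab)
```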
data <- read.table(text = "
year cookies_eaten teenager free_cookies
2000 110 1 0
2001 110 1 0
2002 120 1 0
2003 120 1 0
2004 125 1 0
2005 125 1 0
2006 125 1 0
2007 145 1 1
2008 155 1 1
2009 160 1 1
2010 160 1 1
2000 100 0 0
2001 100 0 0
2002 110 0 0
2003 110 0 0
2004 115 0 0
2005 115 0 0
2006 115 0 0
2007 115 0 0
2008 115 0 0
2009 120 0 0
2010 120 0 0", header=TRUE)
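# Build the interaction column explicitly and compare it with free_cookies
data$interaction <- data$teenager * data$free_cookies
print(data[, c("free_cookies", "interaction")])
# FALSE: the two columns never differ, so the interaction duplicates free_cookies
any(data$free_cookies != data$interaction)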
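In a typical difference-in-differences setup, the second dummy marks the post-treatment period for both groups, which automatically supplies rows where teenager == 0 and that dummy equals 1. The sketch below rests on an assumption that is not part of the original question: it adds a hypothetical `post` indicator equal to 1 from 2007 onward (the first year free cookies appear in the data) and refits the model. Because two binary regressors plus their interaction form a saturated model, the interaction coefficient should equal the difference-in-differences of the four cell means.

```r
# The question's data, rebuilt compactly (same values as the read.table() version)
data <- data.frame(
  year          = rep(2000:2010, times = 2),
  cookies_eaten = c(110, 110, 120, 120, 125, 125, 125, 145, 155, 160, 160,
                    100, 100, 110, 110, 115, 115, 115, 115, 115, 120, 120),
  teenager      = rep(c(1, 0), each = 11)
)

# Hypothetical post-period dummy: 1 from 2007 on, for teenagers and non-teenagers alike
data$post <- as.integer(data$year >= 2007)

# All four coefficients are now estimable (no NA)
did <- lm(cookies_eaten ~ teenager * post, data = data)
print(coef(did))

# Difference-in-differences computed by hand from the four group means
cell_means <- with(data, tapply(cookies_eaten, list(teenager, post), mean))
did_by_hand <- (cell_means["1", "1"] - cell_means["1", "0"]) -
               (cell_means["0", "1"] - cell_means["0", "0"])
print(did_by_hand)  # 27.5, matching the teenager:post coefficient
```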