I'm studying the returns to college admission for marginal student and i'm trying to make a ggplot2 of the following data which is, average salaries of students who finished or didn't finish their masters in medicin and the average 'GPA' (foreign equivalent) distance to the 'acceptance score':
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
I have to do a Regression Discontinuity Design (RDD), so to do the regression - as far as i understand it - i have to rewrite the DistanceGrades to numeric so i just created a variable z
z <- -5:4
where 0 is the cutoff (ie. 0 is equal to "0.0" in DistanceGrades). I then make a dataframe
df <- data.frame(z,SalaryAfter)
Now my attempt to create the plot gets a bit messy (i use the package 'fpp3', but i suppose that it is just the ggplot2 and maybe dyplr packages)
df %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0))) %>%
ggplot(aes(x = z, y = SalaryAfter, color = D))
geom_point(stat = "identity")
geom_smooth(method = "lm")
geom_vline(xintercept = 0)
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1))
xlim(-6,5)
xlab("Distance to acceptance score")
labs(title = "Figur 1.1", subtitle = "Salary for every distance to the acceptance score")
Which plots:
What i'm trying to do is firstly, split the data with a dummy variable D=1 if z>0 and D=0 if z<0. Then i plot it with a linear regression and a vertical line at z=0. Lastly i write the title and subtilte. Now i have two problems:
- The x axis is displaying -5, -2.5, ... but i would like for it to show all the integers, the rational numbers have no relation to the z variable which is discrete. I have tried to fix this with several different methods, but none of them have worked, i can't remember all the ways i have tried (theme(panel.grid...),scale_x_discrete and many more), but the outcome has all been pretty similar. They all cause the x-axis to be completely removed such that there is no numbers and sometimes it even removes the axis title.
- i would like for the regression channel for the first part of the data to extend to z=0
When i try to solve both of these problems i again get similar results, most of the things i try is not producing an error message when i run the code, but they either do nothing to my plot or they remove some of the existing elements which leaves me made of questions. I suppose that the error is caused by some of the elements not working together but i have no idea.
CodePudding user response:
Try this:
library(tidyverse)
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
z <- -5:4
df <- data.frame(z,SalaryAfter) %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0)))
# Fit a lm model for the left part of the panel
fit_data <- lm(SalaryAfter~z, data = filter(df, z <= -0.1)) %>%
predict(., newdata = data.frame(z = seq(-5, 0, 0.1)), interval = "confidence") %>%
as.data.frame() %>%
mutate(z = seq(-5, 0, 0.1), D = factor(0, levels = c(0, 1)))
# Plot
ggplot(mapping = aes(color = D))
geom_ribbon(data = filter(fit_data, z <= 0 & -1 <= z),
aes(x = z, ymin = lwr, ymax = upr),
fill = "grey70", color = "transparent", alpha = 0.5)
geom_line(data = fit_data, aes(x = z, y = fit), size = 1)
geom_point(data = df, aes(x = z, y = SalaryAfter), stat = "identity")
geom_smooth(data = df, aes(x = z, y = SalaryAfter), method = "lm")
geom_vline(xintercept = 0)
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1))
scale_x_continuous(limits = c(-6, 5), breaks = -6:5)
xlab("Distance to acceptance score")
labs(title = "Figure 1.1", subtitle = "Salary for every distance to the acceptance score")