I am trying to create a scatter plot of earnings vs. education for my statistical models class, but I get the error "invalid character in identifier". When I check the .txt data file, the columns "earnings" and "education" are both present. Could you help me, please?
mod = smf.ols(formula=’education~earnings, data=mydata)
res = mod.fit()
res.summary()
beta=res.params
matplotlib.pyplot.scatter(mydata["education"],mydata["earnings"],color="black")
matplotlib.pyplot.plot(mydata["education"], res.fittedvalues, "r")
matplotlib.pyplot.ylabel("earnings")
matplotlib.pyplot.xlabel("education")
matplotlib.pyplot.title("Scatterplot earnings versus education")
matplotlib.pyplot.show()
Answer:
I think the issue is not your data but the curly quotation mark (’) right after formula= on this line:
mod = smf.ols(formula=’education~earnings, data=mydata)
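Python only accepts straight quotes (' or ") as string delimiters. The ’ character is a typographic "right single quotation mark", so Python does not see the start of a string; it tries to read ’education as a name, and ’ is not allowed in a name, hence "invalid character in identifier". The string is also never closed. The formula has to be passed as a string with matching straight opening and closing quotes (single or double):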
mod = smf.ols(formula='education~earnings', data=mydata)
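You can confirm that the quote alone causes the error with Python's built-in compile(), which only parses the code, so smf and mydata do not need to exist. A minimal sketch; the exact wording of the error message depends on your Python version.

```python
# The original line (curly opening quote, no closing quote) and the corrected line.
bad = "mod = smf.ols(formula=\u2019education~earnings, data=mydata)"
good = "mod = smf.ols(formula='education~earnings', data=mydata)"


def parses(source):
    """Return True if source is syntactically valid Python, else report the error."""
    try:
        compile(source, "<snippet>", "exec")  # parse only, nothing is executed
        return True
    except SyntaxError as err:
        print("SyntaxError:", err.msg)
        return False


print(parses(bad))   # prints the SyntaxError message, then False
print(parses(good))  # True
```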
Perhaps the quotes got changed when copy-pasting? Word processors, PDFs and some web pages automatically turn straight quotes into curly ones.
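If the code was pasted from such a source, other curly quotes may be hiding in it too. Below is a small sketch (the helper names find_smart_quotes and straighten_quotes are hypothetical, not from any library) that lists every curly quote with its line and column and replaces them with straight quotes. Note that it cannot add a missing closing quote, as in the original line, so still check each reported position.

```python
# Map typographic ("smart") quotes to the straight quotes Python expects.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def find_smart_quotes(source):
    """Return (line_number, column, character) for every smart quote in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((lineno, col, ch))
    return hits


def straighten_quotes(source):
    """Replace every smart quote in source with its straight equivalent."""
    return source.translate(str.maketrans(SMART_QUOTES))


# Hypothetical pasted line with curly quotes on both sides of the formula.
pasted = "mod = smf.ols(formula=\u2019education~earnings\u2019, data=mydata)"
print(find_smart_quotes(pasted))
print(straighten_quotes(pasted))
```

To check a whole script, read it with open(path, encoding="utf-8").read() and pass the text to find_smart_quotes.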