Regression fitting incorrectly Matplotlib

Time:08-18

I'm trying to add a regression line to this dataset using matplotlib.

Country             GDP per capita  Life satisfaction
Russia              9054.914        6
Turkey              9437.372        5.6
Hungary             12239.894       4.9
Poland              12495.334       5.8
Slovak Republic     15991.736       6.1
Estonia             17288.083       5.6
Greece              18064.288       4.8
Portugal            19121.592       5.1
Slovenia            20732.482       5.7
Spain               25864.721       6.5
Korea               27195.197       5.8
Italy               29866.581       6
Japan               32485.545       5.9
Israel              35343.336       7.4
New Zealand         37044.891       7.3
France              37675.006       6.5
Belgium             40106.632       6.9
Germany             40996.511       7
Finland             41973.988       7.4
Canada              43331.961       7.3
Netherlands         43603.115       7.3
Austria             43724.031       6.9
United Kingdom      43770.688       6.8
Sweden              49866.266       7.2
Iceland             50854.583       7.5
Australia           50961.865       7.3
Ireland             51350.744       7
Denmark             52114.165       7.5
United States       55805.204       7.2

but when I plot the line from the slope and intercept, per this example, it fits the data poorly. (The image of the plot was not preserved.)

Is there something I am doing wrong that is causing the poor fit?

I know I could address this in a few lines by using seaborn instead:

import seaborn as sns

sns.regplot(x=X, y=y, ci=None)

but I'd like to understand the underlying reason for the poor fit.

CodePudding user response:

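The problem here is that you fit your line to only one point, namely the first one: X[0], y[0]. Indexing a two-dimensional array with [0] selects the first row, not the first column. To fit to all the points, select the whole column instead: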

m, b = np.polyfit(X[:, 0], y[:, 0], 1)

or, more cleanly, drop the extra dimension you added at the start and write

X = country_stats["GDP per capita"]
y = country_stats["Life satisfaction"]
...
m, b = np.polyfit(X, y, 1)
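
To see the difference concretely, here is a minimal sketch using the values from the table in the question. The question's original code is not shown, so this assumes X and y were built as column arrays of shape (29, 1), which is what the X[:, 0] indexing above implies. The names gdp, life_sat and plot_fit are made up for this illustration.

```python
import numpy as np

# Values copied from the table in the question.
gdp = np.array([
    9054.914, 9437.372, 12239.894, 12495.334, 15991.736, 17288.083,
    18064.288, 19121.592, 20732.482, 25864.721, 27195.197, 29866.581,
    32485.545, 35343.336, 37044.891, 37675.006, 40106.632, 40996.511,
    41973.988, 43331.961, 43603.115, 43724.031, 43770.688, 49866.266,
    50854.583, 50961.865, 51350.744, 52114.165, 55805.204,
])
life_sat = np.array([
    6.0, 5.6, 4.9, 5.8, 6.1, 5.6, 4.8, 5.1, 5.7, 6.5, 5.8, 6.0, 5.9, 7.4,
    7.3, 6.5, 6.9, 7.0, 7.4, 7.3, 7.3, 6.9, 6.8, 7.2, 7.5, 7.3, 7.0, 7.5,
    7.2,
])

# Assumption: the question used column vectors of shape (n, 1).
X = gdp.reshape(-1, 1)
y = life_sat.reshape(-1, 1)

# X[0] is the first row (a single value); X[:, 0] is the whole column.
print(X[0].shape, X[:, 0].shape)  # (1,) (29,)

# Fit a degree-1 polynomial to every point by selecting the full column.
m, b = np.polyfit(X[:, 0], y[:, 0], 1)

# Equivalent fit on the 1-D arrays, with no extra dimension to undo.
m_flat, b_flat = np.polyfit(gdp, life_sat, 1)


def plot_fit(x, y_vals, slope, intercept):
    """Scatter the data and draw the fitted line (hypothetical helper)."""
    import matplotlib.pyplot as plt

    plt.scatter(x, y_vals)
    xs = np.array([x.min(), x.max()])
    plt.plot(xs, slope * xs + intercept, color="red")
    plt.xlabel("GDP per capita")
    plt.ylabel("Life satisfaction")
    plt.show()
```

Calling plot_fit(gdp, life_sat, m, b) should then draw a line that runs through the cloud of points rather than away from it.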