I am currently creating a sensor that measures the salinity of water in ppm. I've gathered my data and I'm trying to fit a curve that will accurately represent the data to make predictions. The only problem is, no matter how I change the code, the curve is still unrepresentative of the data. Is there any way to fix this?
Here's what I've got so far.
import numpy
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
deg = 4
x = [26500, 18550, 12600, 8000, 4760, 3800, 3140, 2810, 2580, 2550, 2540]
y = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
mymodel = numpy.poly1d(numpy.polyfit(x, y, deg))
myline = numpy.linspace(26500, 2540, 40)
plt.title("R Squared = " + str(r2_score(y, mymodel(x))))
plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()
And here's the data:
x = [26500, 18550, 12600, 8000, 4760, 3800, 3140, 2810, 2580, 2550, 2540]
y = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
x is the sensor readout.
y is the salinity of the solution in ppm that caused the readout.
Thanks for the help!
CodePudding user response:
Your R squared will always look better than the model really is, because you compute it on the same data you used to fit the model, so it can't tell you whether the curve is overfitting. Also, after plotting your data, it doesn't look like a polynomial relationship, yet you're fitting a polynomial to it. Try a log relationship or some other functional form, or model your data using a non-parametric technique.
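Here's a rough sketch of both points, using the data from the question. It fits y = a * ln(x) + b with numpy.polyfit and compares the in-sample R squared against a leave-one-out R squared, where each point is predicted by a model that never saw it. The plain log form is only my assumption about what might fit; the readout barely moves between 4000 and 5000 ppm, so it may still not be the right shape. The helper names (r_squared, fit_log_model, leave_one_out_predictions) are just made up for this example.

```python
import numpy as np

# Data from the question: x is the sensor readout, y is salinity in ppm.
x = np.array([26500, 18550, 12600, 8000, 4760, 3800, 3140, 2810, 2580, 2550, 2540], dtype=float)
y = np.array([0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000], dtype=float)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_log_model(x_train, y_train):
    """Fit y = a * ln(x) + b (an assumed form, not a verified one)."""
    a, b = np.polyfit(np.log(x_train), y_train, 1)
    return lambda x_new: a * np.log(x_new) + b


def leave_one_out_predictions(x_all, y_all, fit):
    """Predict each point with a model fitted on all the other points."""
    preds = np.empty_like(y_all)
    for i in range(len(x_all)):
        keep = np.arange(len(x_all)) != i
        model = fit(x_all[keep], y_all[keep])
        preds[i] = model(x_all[i])
    return preds


log_model = fit_log_model(x, y)
in_sample_r2 = r_squared(y, log_model(x))
loo_r2 = r_squared(y, leave_one_out_predictions(x, y, fit_log_model))

print("in-sample R squared:    ", in_sample_r2)
print("leave-one-out R squared:", loo_r2)
```

The leave-one-out number is the more honest one for judging predictions. You can pass any other fitting function in place of fit_log_model (for example one that wraps numpy.polyfit with deg = 4 on the raw x) and compare the candidates on that number instead of the in-sample R squared.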