What do the data points on the linear regression graph represent? Is the distance of the data points the error? What is that distance used for?
CodePudding user response:
These data points are the data on which the model was trained. By the distance of the data points you probably mean the distance between each data point and the red line. That distance is the error (the residual) for that point, and the sum of the squared errors over all points is what is minimized in order to fit the line. The line is called the best-fit line because it is the line that minimizes the sum of the squared errors (or, equivalently, their mean).
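To make the idea concrete, here is a minimal sketch in plain Python. The data values and the helper names (`fit_line`, `residuals`, `sse`) are made up for illustration and are not from the question. It assumes ordinary least squares, where the error of a point is its vertical distance to the line. It fits the line with the closed-form formulas, computes the error of each point, and shows that nudging the line away from the fitted one makes the sum of squared errors larger.

```python
# Made-up data purely for illustration (not taken from the question).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Vertical distance (with sign) from each point to the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def sse(xs, ys, slope, intercept):
    """Sum of squared errors: the quantity the fit minimizes."""
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))


slope, intercept = fit_line(xs, ys)
best = sse(xs, ys, slope, intercept)

print("slope:", slope, "intercept:", intercept)
print("errors per point:", residuals(xs, ys, slope, intercept))
print("sum of squared errors (fitted line):", best)

# Any other line gives a larger sum of squared errors.
print("with a steeper line:", sse(xs, ys, slope + 0.1, intercept))
print("with a shifted line:", sse(xs, ys, slope, intercept + 0.1))
```

The per-point errors are the distances you see on the graph. Squaring them before summing keeps positive and negative errors from cancelling each other out.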
A more complete image of these distances is the following (for some data x and y, the distances are the dotted lines):