I'm trying to find y = mx + b for several different datasets. I've tried using:
```python
slope_1, intercept_1 = linregress(values_1)
```

where `values_1` is a pandas Series:


| bin_1 | values |
|---|---|
| 5th_per | 10 |
| 25th_per | 24 |
| 50th_per | 28 |
| 75th_per | 34 |
| 90th_per | 50 |
| 95th_per | 65 |

However, whenever I run the code I get `IndexError: tuple index out of range`. I roughly understand the error but am not sure how to fix it. Any suggestions on how to fix this, or other ways to find the linear regression?
Answer:
Assuming `values` are the y values and the row indices are the x values (e.g. `values = 10` has `x = 0`, `values = 24` has `x = 1`, etc.), you can do:
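```python
import numpy as np

# Take the "values" column as a plain Python list.
values_1 = df["values"].tolist()

# Convert values_1 to a NumPy array and use the row positions as x values.
y = np.array(values_1, dtype=float)
x = np.arange(0, len(y), 1)

# Fit a degree-1 polynomial; coefficients come back highest power first.
m, b = np.polyfit(x, y, 1)
```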
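As for the original error: as far as I can tell from SciPy's source, when `linregress` is called with a single argument it expects a 2-D array of shape (2, N) or (N, 2) that holds both x and y. A 1-D Series has only one dimension, so the check on its second dimension raises `IndexError: tuple index out of range`. Passing x and y as two separate arguments avoids this. The result also carries five values (slope, intercept, rvalue, pvalue, stderr), so unpacking it into just two names would fail next. Here is a minimal sketch with `linregress`, assuming the data sits in a DataFrame with `bin_1` and `values` columns rebuilt from the question's table, and using the same row-position x values as above:

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Data from the question's table (the DataFrame construction is an assumption)
df = pd.DataFrame({
    "bin_1": ["5th_per", "25th_per", "50th_per", "75th_per", "90th_per", "95th_per"],
    "values": [10, 24, 28, 34, 50, 65],
})

# Same assumption as above: x is the row position, y is the "values" column
y = df["values"].to_numpy(dtype=float)
x = np.arange(len(y))

# Pass x and y separately; read slope and intercept by attribute because the
# result has more than two fields
result = linregress(x, y)
m, b = result.slope, result.intercept
print(f"y = {m:.4f}x + {b:.4f}")
```

Both `linregress` and `np.polyfit(x, y, 1)` perform an ordinary least-squares fit, so they should give the same slope and intercept; `linregress` additionally reports the correlation coefficient and p-value.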
Let me know if you have any questions.
EDIT: If you want to use the `bin_1` values as the x values, replace the `x = np.arange` line in the code above with the following:
```python
# Split each string on "_", giving one list per row such as ["5th", "per"]:
# the 1st element holds the number, the 2nd is always "per".
bins = df["bin_1"].str.split("_")
# Take the 1st element of each list (row[0]) and drop its trailing "th"
# to keep just the number.
bins = [row[0][:-2] for row in bins]
x = np.array(bins, dtype=float)
```
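Putting the EDIT together, here is a minimal end-to-end sketch, assuming the data is in a DataFrame with `bin_1` and `values` columns as in the question's table. It parses the percentiles with pandas' `.str` accessor instead of the list comprehension (the result is the same), fits with `np.polyfit`, and cross-checks the fit against the closed-form least-squares formulas for the slope and intercept:

```python
import numpy as np
import pandas as pd

# Data from the question's table (the DataFrame construction is an assumption)
df = pd.DataFrame({
    "bin_1": ["5th_per", "25th_per", "50th_per", "75th_per", "90th_per", "95th_per"],
    "values": [10, 24, 28, 34, 50, 65],
})

# "5th_per" -> "5th" -> "5": take the text before "_" and drop the trailing "th"
x = df["bin_1"].str.split("_").str[0].str[:-2].astype(float).to_numpy()
y = df["values"].to_numpy(dtype=float)

m, b = np.polyfit(x, y, 1)

# Closed-form least squares:
# m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2), b = y_mean - m * x_mean
m_closed = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_closed = y.mean() - m_closed * x.mean()

print(f"polyfit:     y = {m:.4f}x + {b:.4f}")
print(f"closed form: y = {m_closed:.4f}x + {b_closed:.4f}")
```

The `.str[:-2]` slice assumes every label ends in a two-letter suffix such as "th"; that holds for this table but would need adjusting for other label formats.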