I'm trying to run three different regressions on the data I was given. The idea is to understand how sweetness is linked to bitterness, but separately for each of the three types of cider we have: {dry, semidry, sweet}.
What I had in mind was to first make a scatter plot with all the x values, regardless of the type of cider, and then fit three different regression models on three 'sliced' pandas DataFrames: x_dry, x_semidry and x_sweet.
I get an error on line 20 saying that I'm essentially multiplying an int by a NumPy array. To start solving the problem, I tried list(myarray). However, the error persists. Can someone point me in the right direction?
The error I get:
TypeError Traceback (most recent call last)
<ipython-input-138-fa4209c61f04> in <module>
     20     return slope*i + intercept
     21
---> 22 mymodeldry = list(map(myfunc, x_dry))
     23 mymodelsemidry = list(map(myfunc, x_semidry))
     24 mymodelsweet = list(map(myfunc, x_sweet))
<ipython-input-138-fa4209c61f04> in myfunc(i)
     18
     19 def myfunc(i):
---> 20     return slope*i + intercept
21
22 mymodeldry = list(map(myfunc, x_dry))
TypeError: can't multiply sequence by non-int of type 'numpy.float64'
My code:
# d) Represent the sweet flavor according to the bitter flavor and add the regression line, for each type of cider.
x = list(ciderdf["Sweetness"])
x_dry = ciderdf[ciderdf["Type"]=="Dry"]
x_dry = list(x_dry[["Sweetness"]])
x_semidry = ciderdf[ciderdf["Type"]=="Semi-dry"]
x_semidry = list(x_semidry[["Sweetness"]])
x_sweet = ciderdf[ciderdf["Type"]=="Sweet"]
x_sweet = list(x_sweet[["Sweetness"]])
y = list(ciderdf["Bitterness"])
slope, intercept, r, p, std_err = stats.linregress(x, y)
slope, intercept, r, p, std_err = stats.linregress(x, y)
slope, intercept, r, p, std_err = stats.linregress(x, y)
def myfunc(i):
    return slope*i + intercept
mymodeldry = list(map(myfunc, x_dry))
mymodelsemidry = list(map(myfunc, x_semidry))
mymodelsweet = list(map(myfunc, x_sweet))
plt.scatter(x,y)
plt.plot(x, mymodeldry, mymodelsemidry, mymodelsweet)
plt.show()
Answer:
The error is in x_dry = list(x_dry[["Sweetness"]]). Double brackets select a DataFrame, and calling list() on a DataFrame iterates over its column labels, so you end up with x_dry = ['Sweetness'].
So in the map call you are essentially multiplying the float slope by the string 'Sweetness', which raises TypeError: can't multiply sequence by non-int of type 'numpy.float64'.
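Both behaviours can be checked on a tiny DataFrame. The values below are made up and only mimic the question's Type and Sweetness columns; the exact wording of the error may differ between NumPy versions, so the sketch prints it instead of hard-coding it:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for ciderdf
df = pd.DataFrame({"Type": ["Dry", "Sweet", "Dry"], "Sweetness": [2.0, 7.5, 3.0]})
dry = df[df["Type"] == "Dry"]

# Double brackets give a DataFrame; iterating over a DataFrame yields its column labels
labels = list(dry[["Sweetness"]])
print(labels)  # ['Sweetness']

# Single brackets give a Series; tolist() returns the actual values
values = dry["Sweetness"].tolist()
print(values)  # [2.0, 3.0]

# A numpy.float64 slope times the label string fails, as in the question
slope = np.float64(0.5)
raised = False
try:
    slope * labels[0]
except TypeError as exc:
    raised = True
    print(exc)
```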
Do this instead (single brackets select the column as a Series): x_dry = x_dry["Sweetness"].tolist(). For example:
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
x_dry = df["a"].tolist()
print(x_dry)

def myfunc(i):
    return 3.14*i + 2.56

mymodeldry = list(map(myfunc, x_dry))
print(mymodeldry)
Output:
[1, 2, 3]
[5.7, 8.84, 11.98]
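Note also that in the question all three stats.linregress calls use the full x and y, so they return the same line three times. To get one line per cider type, each fit needs its own subset. A minimal sketch, assuming the column names Type, Sweetness and Bitterness from the question; fit_by_type and plot_fits are hypothetical helper names, the data are invented so each fitted line is easy to check, and DataFrame.groupby replaces the manual slicing:

```python
import numpy as np
import pandas as pd
from scipy import stats


def fit_by_type(df, x_col="Sweetness", y_col="Bitterness", type_col="Type"):
    """Fit a separate least-squares line of y_col on x_col for each type."""
    fits = {}
    for cider_type, group in df.groupby(type_col):
        res = stats.linregress(group[x_col].to_numpy(), group[y_col].to_numpy())
        fits[cider_type] = (res.slope, res.intercept)
    return fits


def plot_fits(df, fits, x_col="Sweetness", y_col="Bitterness"):
    """Scatter all points, then draw one regression line per type."""
    import matplotlib.pyplot as plt  # imported here so fitting works without matplotlib

    plt.scatter(df[x_col], df[y_col], label="all ciders")
    xs = np.linspace(df[x_col].min(), df[x_col].max(), 100)
    for cider_type, (slope, intercept) in fits.items():
        plt.plot(xs, slope * xs + intercept, label=cider_type)
    plt.xlabel(x_col)
    plt.ylabel(y_col)
    plt.legend()
    plt.show()


# Hypothetical data: Dry follows 2x + 1, Semi-dry -x + 10, Sweet 0.5x
ciderdf = pd.DataFrame({
    "Type": ["Dry"] * 3 + ["Semi-dry"] * 3 + ["Sweet"] * 3,
    "Sweetness": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "Bitterness": [3.0, 5.0, 7.0, 7.0, 6.0, 5.0, 3.0, 3.5, 4.0],
})

fits = fit_by_type(ciderdf)
```

With real data, calling plot_fits(ciderdf, fits) draws the scatter plot of all ciders with the three per-type lines on top.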
Answer:
You're actually multiplying a sequence by a float, which is not possible. It only works with an int, since multiplying a sequence such as a list by a number n repeats it n times.
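A quick illustration of that rule in plain Python, with no pandas involved:

```python
# Multiplying a sequence by an int repeats it
repeated = [1, 2] * 3
doubled = "ab" * 2
print(repeated)  # [1, 2, 1, 2, 1, 2]
print(doubled)   # abab

# Multiplying a sequence by a float is rejected with a TypeError
message = None
try:
    [1, 2] * 1.5
except TypeError as exc:
    message = str(exc)
print(message)  # can't multiply sequence by non-int of type 'float'
```

numpy.float64 is not an int either, which is why the message in the question names 'numpy.float64'.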
I think you wanted to multiply each element by a float value, which you can achieve by converting the column to a NumPy array instead of a list: x_dry = np.array(x_dry["Sweetness"])
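With a NumPy array the multiplication is element-wise, so the model values can be computed in one expression without map(). A sketch with made-up data; the slope and intercept here are invented, whereas in the question they come from stats.linregress. It also shows that the conversion has to start from the single-bracket Series, because np.array(list(df[["Sweetness"]])) only holds the column label:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for ciderdf
df = pd.DataFrame({"Type": ["Dry", "Sweet", "Dry"], "Sweetness": [2.0, 7.5, 3.0]})
slope, intercept = 0.5, 1.0  # made-up coefficients for illustration

# Series.to_numpy() (or np.array on the Series) gives a float array
x_dry = df[df["Type"] == "Dry"]["Sweetness"].to_numpy()

# Arithmetic on an array is applied to every element
mymodeldry = slope * x_dry + intercept
print(mymodeldry)  # [2.  2.5]

# Converting the double-bracket selection yields an array of strings instead
wrong = np.array(list(df[["Sweetness"]]))
print(wrong.dtype.kind)  # U
```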