Run three regression models based on a label

Time:02-10

I'm trying to run three different regressions on the data I was given. The idea is to understand how sweetness is linked to bitterness, separately for each of the three types of cider we have: {dry, semi-dry, sweet}.

What I had in mind was to first make a scatter plot with all the x values, regardless of the type of cider, and then fit three different regression models on three 'sliced' pandas DataFrames: x_dry, x_semidry and x_sweet.

I get an error on line 20 saying that I'm essentially multiplying an int with a numpy array. To start solving the problem, I tried converting with list(myarray), but the error persists. Can someone point me in the right direction?

The error I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-138-fa4209c61f04> in <module>
     20     return slope*i + intercept
     21 
---> 22 mymodeldry = list(map(myfunc, x_dry))
     23 mymodelsemidry = list(map(myfunc, x_semidry))
     24 mymodelsweet = list(map(myfunc, x_sweet))

<ipython-input-138-fa4209c61f04> in myfunc(i)
     18 
     19 def myfunc(i):
---> 20     return slope*i + intercept
     21 
     22 mymodeldry = list(map(myfunc, x_dry))

TypeError: can't multiply sequence by non-int of type 'numpy.float64'

My code

import matplotlib.pyplot as plt
from scipy import stats

# d) Represent the sweet flavor according to the bitter flavor and add the regression line, for each type of cider.
x = list(ciderdf["Sweetness"])

x_dry = ciderdf[ciderdf["Type"]=="Dry"]
x_dry = list(x_dry[["Sweetness"]])

x_semidry = ciderdf[ciderdf["Type"]=="Semi-dry"]
x_semidry = list(x_semidry[["Sweetness"]])

x_sweet = ciderdf[ciderdf["Type"]=="Sweet"]
x_sweet = list(x_sweet[["Sweetness"]])

y = list(ciderdf["Bitterness"])

slope, intercept, r, p, std_err = stats.linregress(x, y)
slope, intercept, r, p, std_err = stats.linregress(x, y)
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(i):
    return slope*i + intercept

mymodeldry = list(map(myfunc, x_dry))
mymodelsemidry = list(map(myfunc, x_semidry))
mymodelsweet = list(map(myfunc, x_sweet))

plt.scatter(x,y)
plt.plot(x, mymodeldry, mymodelsemidry, mymodelsweet)
plt.show()

Answer:

The error is in x_dry = list(x_dry[["Sweetness"]]). Iterating over a DataFrame yields its column labels, not its values, so this gives x_dry = ['Sweetness'].

So inside map you are multiplying the float slope by the string 'Sweetness', which raises TypeError: can't multiply sequence by non-int of type 'numpy.float64'.
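A minimal reproduction of this diagnosis is below. It uses a small made-up DataFrame in place of ciderdf (the real values are not shown in the question), and it only checks that a TypeError is raised, since the exact message may differ between numpy versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for ciderdf.
df = pd.DataFrame({"Type": ["Dry", "Sweet"], "Sweetness": [2.0, 7.5]})

# Iterating over a DataFrame yields its column labels, not its values.
labels = list(df[["Sweetness"]])
print(labels)  # ['Sweetness']

# map(myfunc, labels) therefore hands the string 'Sweetness' to slope * i.
slope = np.float64(0.5)
raised = False
try:
    slope * labels[0]
except TypeError as exc:  # the exact message depends on the numpy version
    raised = True
    print("TypeError:", exc)
```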

Do this instead: x_dry = x_dry["Sweetness"].tolist().

import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
x_dry = df["a"].tolist()
print(x_dry)

def myfunc(i):
    return 3.14*i + 2.56

mymodeldry = list(map(myfunc, x_dry))
print(mymodeldry)

Output:

[1, 2, 3]
[5.7, 8.84, 11.98]
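Applied to the original goal, the same idea (select the column with single brackets) gives one fit per cider type. The sketch below assumes ciderdf has the Type, Sweetness and Bitterness columns used in the question and substitutes made-up values for them. It uses DataFrame.groupby to slice the rows, a swapped-in convenience rather than the three manual slices from the question, and it fits each slice separately instead of calling stats.linregress(x, y) three times on the full data.

```python
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats

# Hypothetical stand-in for ciderdf; the real measurements are not shown.
ciderdf = pd.DataFrame({
    "Type": ["Dry", "Dry", "Dry", "Semi-dry", "Semi-dry", "Semi-dry",
             "Sweet", "Sweet", "Sweet"],
    "Sweetness": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "Bitterness": [6.0, 5.0, 4.0, 5.0, 4.5, 4.0, 3.0, 2.0, 1.0],
})

# One scatter with all points, regardless of cider type.
plt.scatter(ciderdf["Sweetness"], ciderdf["Bitterness"], color="grey")

fits = {}
for cider_type, group in ciderdf.groupby("Type"):
    x = group["Sweetness"].to_numpy()
    y = group["Bitterness"].to_numpy()
    # Fit this type on its own rows only.
    result = stats.linregress(x, y)
    fits[cider_type] = (result.slope, result.intercept)
    # x is a numpy array, so slope * x + intercept is element-wise.
    plt.plot(x, result.slope * x + result.intercept, label=cider_type)

plt.xlabel("Sweetness")
plt.ylabel("Bitterness")
plt.legend()
plt.show()
print(fits)
```

Drawing each line with plt.plot against that type's own x values replaces the single plt.plot(x, mymodeldry, mymodelsemidry, mymodelsweet) call, which does not pair each model with the x values it was computed from.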

Answer:

You're actually multiplying a sequence by a float, which is not possible. Python only allows multiplying a sequence by an int, because multiplying a list (or a string) by a number n repeats it n times.
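The repetition rule is easy to see with a plain list; the values here are arbitrary.

```python
values = [1.0, 2.0, 3.0]

# Multiplying a list by an int repeats the list; it does not scale the elements.
repeated = values * 2
print(repeated)  # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

# A float repeat count is meaningless, so Python raises TypeError.
raised = False
try:
    values * 2.5
except TypeError as exc:
    raised = True
    print("TypeError:", exc)  # can't multiply sequence by non-int of type 'float'

# Scaling each element needs an explicit loop (or a numpy array).
scaled = [2.5 * v for v in values]
print(scaled)  # [2.5, 5.0, 7.5]
```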

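I think you wanted to multiply each element of the list by a float value. You can do that by turning the Sweetness column into a numpy array with x_dry = x_dry["Sweetness"].to_numpy(). Wrapping the original expression as np.array(list(x_dry[["Sweetness"]])) is not enough, because list(x_dry[["Sweetness"]]) only contains the column label 'Sweetness'.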
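A minimal sketch of the numpy route, assuming the column names from the question and made-up data; slope and intercept are set to the example coefficients from the first answer rather than fitted values. What makes it work is selecting the column with single brackets before converting.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for ciderdf.
ciderdf = pd.DataFrame({
    "Type": ["Dry", "Dry", "Sweet"],
    "Sweetness": [1.0, 2.0, 8.0],
})
slope, intercept = 3.14, 2.56  # illustrative coefficients, not fitted values

dry_rows = ciderdf[ciderdf["Type"] == "Dry"]
# Single brackets select a Series; .to_numpy() turns it into a float array.
x_dry = dry_rows["Sweetness"].to_numpy()

# Arithmetic on a numpy array is element-wise, so map() is not needed.
mymodeldry = slope * x_dry + intercept
print(mymodeldry)

# Wrapping the double-bracket selection only captures the column label.
labels_arr = np.array(list(dry_rows[["Sweetness"]]))
print(labels_arr)  # ['Sweetness']
```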
