How to fix this for loop problem in Python?

Dataset:

import pandas as pd
df = pd.read_excel('sales.xlsx')
x = df['Temperature (°C)']
y = df['Ice Cream Sales ($)']
df

I made another table for the regression squared error and used a for loop to compute the error y - ŷᵢ with this code:

sse = pd.DataFrame()
sse['Temperature (x)'] = x
sse['Ice Cream Sales (y)'] = y
sse['Predicted Ice Cream Sales (ŷᵢ)'] = [i for i in re]
sse['Error (y - ŷᵢ)'] = [i - round(re, 4) for i in y]
sse

And the output looks like this:

Desired output should look like this:

The problem is that the for loop subtracts every predicted value from each value of y, as opposed to just the corresponding value in the same row or index.
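To illustrate what is going on, here is a minimal sketch with made-up numbers (these are not the values from the dataset above). In the list comprehension, `i` is a single scalar from `y`, but `re` is still the whole Series, so each list element ends up being a full Series rather than one number:

```python
import pandas as pd

# Hypothetical values for illustration only, not the real sales data.
y = pd.Series([10.0, 20.0, 30.0])
re = pd.Series([11.0, 19.0, 31.0])

# Each i is one scalar from y; re is the entire Series.
# So every element of this list is itself a Series of length 3.
looped = [i - re.round(4) for i in y]

# Plain Series subtraction pairs the values row by row (by index label).
vectorized = y - re.round(4)

print(type(looped[0]))      # a pandas Series, not a float
print(vectorized.tolist())
```

The second form is what the desired output needs: one error value per row.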

By the way, here's how I computed re (the regression equation), since the problem might be here and explain why the for loop included every y value:

re = [round(round(slope, 4) * i + round(itc, 4), 4) for i in x]
re = pd.Series(re)
re

My regression equation for this dataset is:

CodePudding user response：

Use vectorization, and avoid creating unnecessary intermediate objects (like [i for i in re]).

sse = pd.DataFrame(
    {
        'Temperature (x)': x,
        'Ice Cream Sales (y)': y,
        'Predicted Ice Cream Sales (ŷᵢ)': re,
        'Error (y - ŷᵢ)': y - re.round(4),
    }
)
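For reference, a self-contained sketch of the same idea, where the prediction is vectorized as well. The sample data and the `slope` and `itc` values are invented for illustration and are not taken from the question:

```python
import pandas as pd

# Hypothetical sample data and coefficients (assumptions, not the real values).
df = pd.DataFrame(
    {
        'Temperature (°C)': [10.0, 20.0, 30.0],
        'Ice Cream Sales ($)': [26.0, 44.0, 66.0],
    }
)
x = df['Temperature (°C)']
y = df['Ice Cream Sales ($)']
slope, itc = 2.0, 5.0

# Vectorized prediction: computed from x directly, so it keeps x's index.
re = (round(slope, 4) * x + round(itc, 4)).round(4)

sse = pd.DataFrame(
    {
        'Temperature (x)': x,
        'Ice Cream Sales (y)': y,
        'Predicted Ice Cream Sales (ŷᵢ)': re,
        'Error (y - ŷᵢ)': y - re,  # element-wise, aligned on the index
    }
)
print(sse)
```

One thing to watch: pandas aligns Series arithmetic on index labels, not on position. If `re` is built with `pd.Series(some_list)` it gets a fresh 0..n-1 index, which only matches `y` when `df` also uses the default index. Computing `re` from `x` as above, or passing `index=x.index` when building the Series, avoids that mismatch.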

CodePudding user response：

Did you try something like this:

sse['Predicted Ice Cream Sales (ŷᵢ)'] = pd.Series([round(round(slope, 4) * i + round(itc, 4), 4) for i in x])
sse['Error (y - ŷᵢ)'] = y - round(sse['Predicted Ice Cream Sales (ŷᵢ)'], 4)

Result looks like this: