Passing pandas DataFrame data to a function, and it's not outputting the results

Time:08-10

In my code, I am trying to extract data from a CSV file to use in a function, but it doesn't output anything and gives no error. My code works, because I tried it with plain NumPy arrays as inputs. I'm not sure why it doesn't work with pandas.

import numpy as np
import pandas as pd
import os

# change the current directory to the directory where the running script file is
os.chdir(os.path.dirname(os.path.abspath(__file__)))

# finding best fit line for y = m*x + b by iteration
def gradient_descent(x,y):
    m_iter = b_iter = 1 #starting point
    iteration = 10000
    n = len(x)
    learning_rate = 0.05

    last_mse = 10000

    #take baby steps to reach global minima
    for i in range(iteration):

        y_predicted = m_iter*x + b_iter

        #mse = 1/n*sum([value**2 for value in (y-y_predicted)]) # cost function to minimize
        mse = 1/n*sum((y-y_predicted)**2) # cost function to minimize
        if (last_mse - mse)/mse < 0.001:
            break

        # recall MSE formula is 1/n*sum((yi-y_predicted)^2), where y_predicted = m*x + b
        # using partial deriv of MSE formula, d/dm and d/db 
        dm = -(2/n)*sum(x*(y-y_predicted))
        db = -(2/n)*sum((y-y_predicted))

        # use current predicted value to get the next value for prediction
        # by using learning rate
        m_iter = m_iter - learning_rate*dm 
        b_iter = b_iter - learning_rate*db
        print('m is {}, b is {}, cost is {}, iteration {}'.format(m_iter,b_iter,mse,i))

        last_mse = mse

#x = np.array([1,2,3,4,5])
#y = np.array([5,7,8,10,13])
#gradient_descent(x,y)

df = pd.read_csv('Linear_Data.csv')
x = df['Area']
y = df['Price']
gradient_descent(x,y)

CodePudding user response:

"My code works, because I tried it with plain NumPy arrays as inputs. I'm not sure why it doesn't work with pandas."

Well no, your code also works with pandas dataframes:

df = pd.DataFrame({'Area': [1,2,3,4,5], 'Price': [5,7,8,10,13]})
x = df['Area']
y = df['Price']
gradient_descent(x,y)

The above gives the same output as with NumPy arrays.

Check what's in Linear_Data.csv and/or add some print statements in the gradient_descent function to test your assumptions. As a first step, I would suggest adding a print statement just before the condition with the break statement:

print(last_mse, mse)
if (last_mse - mse)/mse < 0.001:
    break
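
To see what that print statement is likely to reveal, here is a minimal sketch of the first loop iteration in isolation. The contents of Linear_Data.csv are not shown in the question, so the "large" values below are made up for illustration, and first_iteration_ratio is a hypothetical helper, not part of the original code. It uses the same starting point (m = b = 1) and the same initial last_mse = 10000 as the question.

```python
import pandas as pd

def first_iteration_ratio(x, y, m=1, b=1, last_mse=10000):
    """Return (mse, ratio) as computed in the first loop iteration."""
    y_predicted = m * x + b
    mse = ((y - y_predicted) ** 2).sum() / len(x)
    return mse, (last_mse - mse) / mse

# Small values from the answer above: the loop keeps running.
small = pd.DataFrame({'Area': [1, 2, 3, 4, 5], 'Price': [5, 7, 8, 10, 13]})
small_mse, small_ratio = first_iteration_ratio(small['Area'], small['Price'])
print(small_mse, small_ratio, small_ratio < 0.001)

# Hypothetical large values (NOT the real CSV contents): the first mse
# is far above 10000, so the ratio is negative and the loop breaks
# before the print call is ever reached.
large = pd.DataFrame({
    'Area': [2600, 3000, 3200, 3600, 4000],
    'Price': [550000, 565000, 610000, 680000, 725000],
})
large_mse, large_ratio = first_iteration_ratio(large['Area'], large['Price'])
print(large_mse, large_ratio, large_ratio < 0.001)
```

If your CSV holds values on that scale, the silent exit comes from the break condition, not from pandas. Initializing last_mse to float('inf') would avoid the premature break, although with unscaled inputs and a learning rate of 0.05 the updates may then diverge, so scaling the columns first is worth considering.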