Dataframe Is No Longer Accessible

Time:07-27

I am trying to tidy up my code by writing functions so that a single call does all the work, but it is not working as intended. I am currently pulling a table from a PDF into a pandas dataframe. From there I have four functions, each calling the next, with the last one returning the updated dataframe. I can see that it is fully updated when I print it in the last function. However, I am unable to access and use that updated dataframe, even after I return it.

My code is as follows:

def data_cleaner(dataFrame):
    #removing unneeded columns
    removed = dataFrame.drop(columns=['Unnamed: 1','Unnamed: 2','Unnamed: 4','Unnamed: 5','Unnamed: 7','Unnamed: 9','Unnamed: 11','Unnamed: 13','Unnamed: 15','Unnamed: 17','Unnamed: 19'])
    #call next method
    col_combiner(removed)

def col_combiner(dataFrame):
    
    #Grabbing first and second row of table to combine
    first_row = dataFrame.iloc[0]
    second_row = dataFrame.iloc[1]
    #List to combine columns
    newColNames = []
    #Run through each row and combine them into one name
    for i,j in zip(first_row,second_row):
        #Check to see if they are not strings, if they are not convert it
        if not isinstance(i,str):
            i = str(i)
        if not isinstance(j,str):
            j = str(j)
        newString = ''
        #Check for double NAN case and change it to Expenses
        if i == 'nan' and j == 'nan':
            i = 'Expenses'
            newString = newString + i
        #Check for leading NAN and remove it
        elif i == 'nan':
            newString = newString + j
        else:
            newString = newString + i + ' ' + j
            
    
        newColNames.append(newString)
    
    #Now update the dataframes column names
    dataFrame.columns = newColNames
    
    #Remove the name rows since they are now the column names
    dataFrame = dataFrame.iloc[2:,:]
    
    #Going to clean the values in the DF
    clean_numbers(dataFrame)


def clean_numbers(dataFrame):
    #Fill NAN values with 0
    noNan = dataFrame.fillna(0)
    
    #Pull each column, clean the values, then put it back
    for i in range(noNan.shape[1]):
        colList = noNan.iloc[:,i].tolist()
        #calling to clean the column so that it is all ints
        col_checker(colList)
        noNan.iloc[:,i] = colList
    
    
    return noNan

def col_checker(col):
    #Going through, checking and cleaning
    for i in range(len(col)):
        #print(type(colList[i]))
        if isinstance(col[i],str):
            col[i] = col[i].replace(',','')
            if col[i].isdigit():
                #print('not here')
                col[i] = int(col[i]) 
            #If it is not a number then make it 0
            else:
                col[i] = 0

Then when I run this:

doesThisWork = data_cleaner(cleaner)
type(doesThisWork)

I get NoneType. I might be doing this the long way as I am new to this, so any advice is much appreciated!

Answer:

You are getting NoneType because data_cleaner has no return statement, so when it finishes executing it automatically returns None. The same is true of col_combiner: only clean_numbers returns the dataframe, and the two functions above it call the next one without passing its result back up. It is the return value of a function that is assigned to a variable var in a statement like this:

var = fun(x)
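
To make this concrete, here is a minimal sketch of the same situation using plain lists instead of dataframes. The names inner, outer_without_return and outer_with_return are hypothetical stand-ins for a chain like data_cleaner -> col_combiner -> clean_numbers; they are not from the question's code.

```python
def inner(values):
    # The innermost helper builds a new list and returns it.
    return [v * 2 for v in values]

def outer_without_return(values):
    # Calls inner() but discards the result, so the caller receives None.
    inner(values)

def outer_with_return(values):
    # Passes inner()'s result back up to the caller.
    return inner(values)

broken = outer_without_return([1, 2, 3])
fixed = outer_with_return([1, 2, 3])

print(broken)  # None
print(fixed)   # [2, 4, 6]
```

Assuming the rest of your code is unchanged, the analogous fix would be to put return in front of the calls to col_combiner and clean_numbers, so that every function in the chain hands the result back.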

A separate question is whether your dataframe cleaner is changed by the function data_cleaner. That can happen because dataframes are mutable objects in Python.

In other words, a function can read your dataframe and modify it in place, so that after the call cleaner is different from before. Independently of that, a function can return a value (which yours does not), and that value is what gets assigned to doesThisWork.

Usually, you should prefer that a function does only one thing, so writing a function that both changes its argument and returns a value is usually bad practice.
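
The difference between changing an argument and returning a value can be sketched with plain lists, which are mutable in the same way. The function names add_total_in_place and with_total are made up for this illustration.

```python
def add_total_in_place(rows):
    # Mutates the caller's list and has no return statement,
    # so the call itself evaluates to None.
    rows.append("total")

def with_total(rows):
    # Leaves the caller's list untouched and returns a new list instead.
    return rows + ["total"]

data = ["a", "b"]
result = add_total_in_place(data)
print(data)      # ['a', 'b', 'total']
print(result)    # None

original = ["a", "b"]
updated = with_total(original)
print(original)  # ['a', 'b']
print(updated)   # ['a', 'b', 'total']
```

The second style is usually easier to chain, because each call produces a value you can assign or pass to the next function.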
