How to apply custom function to manipulate floats in a column in Pandas Dataframe?

Time:12-27

I'm a newbie in Python working on a project, and I have a dataset where I need to manipulate some of the numbers inside a column based on certain criteria defined in a few functions I made.

Given columns of floats in a Pandas DataFrame holding data like this, and functions I defined to apply the following logic to the values:

df = {'location x': [107.0, 254.0, 52.0, 640.0, 882.0],
        'location y': [252.0, 56.0, 250.0, 86.0, 318.0]}

def change_y(num):
    if num > 470:
        num = 470 - [(num) - 470]
        return num
    else:
        pass

def change_x(num):
    if num < 250:
        num = (250 - num) + 250
        return num
    elif num > 250:
        num = 250 - (num - 250)
        return num
    else:
        pass

Using:

for index in df.index:
    df['location y'][index].apply(change_y)
    df['location x'][index].apply(change_x)

Yields this error:

     22 for index in df.index:
---> 23     df['location y'][index].apply(change_y)
     24     df['location x'][index].apply(change_x)

AttributeError: 'numpy.float64' object has no attribute 'apply'

Looking for help on whether I am using .apply() wrong or if there is an alternative, thanks!

CodePudding user response:

You don't need a loop to use .apply(). Call it on the whole column instead:

import pandas as pd

df = pd.DataFrame({
    'location x': [107.0, 254.0, 52.0, 640.0, 882.0],
    'location y': [252.0, 56.0, 250.0, 86.0, 318.0],
})


def change_y(num):
    # Reflect values above 470 around 470
    if num > 470:
        return 470 - (num - 470)
    # Return num unchanged; falling through with `pass` would return None
    return num


def change_x(num):
    # Reflect values around 250; a value of exactly 250 stays as it is
    if num < 250:
        return (250 - num) + 250
    elif num > 250:
        return 250 - (num - 250)
    return num

# Apply each function to the whole column; no for loop is needed
df['location y'] = df['location y'].apply(change_y)
df['location x'] = df['location x'].apply(change_x)

print(df)
"""
   location x  location y
0       393.0       252.0
1       246.0        56.0
2       448.0       250.0
3      -140.0        86.0
4      -382.0       318.0
"""

I also changed the function change_y, from num = 470 - [(num) - 470] to num = 470 - (num - 470). Square brackets create a list, and subtracting a list from a number raises a TypeError.
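
As for the AttributeError in the question: indexing a column by a row label, as in df['location y'][index], returns a single numpy.float64 value rather than a Series, and only the Series has .apply(). A minimal sketch of the difference, reusing the sample data from the question (the variable names column and value are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location x': [107.0, 254.0, 52.0, 640.0, 882.0],
    'location y': [252.0, 56.0, 250.0, 86.0, 318.0],
})

column = df['location y']     # the whole column: a pandas Series
value = df['location y'][0]   # a single cell: a numpy.float64 scalar

print(isinstance(column, pd.Series))   # True
print(isinstance(value, np.float64))   # True
print(hasattr(column, 'apply'))        # True
print(hasattr(value, 'apply'))         # False
```

So the loop in the question ends up calling .apply() on a plain number, which is what the error message reports.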

CodePudding user response:

num = (250 - num) + 250 and num = 250 - (num - 250) are the same: both simplify to num = 500 - num, which also leaves 250 unchanged, so it works for all conditions.

You can just use vectorized code; apply is not needed here and is less efficient:

df['location x'] = df['location x'].rsub(500)
df.loc[df['location y']>470, 'location y'] = 940 - df['location y']

Output:

   location x  location y
0       393.0       252.0
1       246.0        56.0
2       448.0       250.0
3      -140.0        86.0
4      -382.0       318.0
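
As a quick sanity check, the sketch below compares the vectorized expressions against element-wise functions. The data is hypothetical and not the question's dataset: it includes 250.0 in location x and 470.0 and 600.0 in location y so that every branch is exercised. The change_x and change_y shown here are assumed variants that return num unchanged when no condition applies.

```python
import pandas as pd


def change_y(num):
    # Reflect values above 470 around 470; otherwise return num unchanged
    return 470 - (num - 470) if num > 470 else num


def change_x(num):
    # Reflect values around 250; exactly 250 is returned unchanged
    if num < 250:
        return (250 - num) + 250
    elif num > 250:
        return 250 - (num - 250)
    return num


# Hypothetical data chosen to hit every branch
df = pd.DataFrame({
    'location x': [107.0, 254.0, 250.0, 640.0],
    'location y': [252.0, 470.0, 600.0, 86.0],
})

# Element-wise results via apply
expected_x = df['location x'].apply(change_x)
expected_y = df['location y'].apply(change_y)

# Vectorized results on a copy of the same data
vectorized = df.copy()
vectorized['location x'] = vectorized['location x'].rsub(500)
mask = vectorized['location y'] > 470
vectorized.loc[mask, 'location y'] = 940 - vectorized['location y']

print(vectorized['location x'].equals(expected_x))  # True
print(vectorized['location y'].equals(expected_y))  # True
```

Only the rows selected by the mask are overwritten in location y, which is why values at or below 470 keep their original value.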