I'm a newbie in Python working on a project, and I have a dataset where I need to manipulate some of the numbers inside a column based on certain criteria defined in a few functions I made.
I have a pandas DataFrame with columns of floats like the one below, and two functions that transform the values according to my criteria:
import pandas as pd

df = pd.DataFrame({'location x': [107.0, 254.0, 52.0, 640.0, 882.0],
                   'location y': [252.0, 56.0, 250.0, 86.0, 318.0]})

def change_y(num):
    if num > 470:
        num = 470 - [(num) - 470]
        return num
    else:
        pass

def change_x(num):
    if num < 250:
        num = (250 - num) + 250
        return num
    elif num > 250:
        num = 250 - (num - 250)
        return num
    else:
        pass
Running this loop:
for index in df.index:
    df['location y'][index].apply(change_y)
    df['location x'][index].apply(change_x)
Yields this error:
22 for index in df.index:
---> 23 df['location y'][index].apply(change_y)
24 df['location x'][index].apply(change_x)
AttributeError: 'numpy.float64' object has no attribute 'apply'
Looking for help on whether I am using .apply() wrong or if there is an alternative, thanks!
Answer:
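To use .apply(), you don't need to loop. Inside your loop, df['location y'][index] is a single numpy.float64 value, and a scalar has no .apply() method, hence the AttributeError. .apply() is a Series method that already calls your function on every element of the column, so call it on the whole column instead: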
import pandas as pd

df = pd.DataFrame({
    'location x': [107.0, 254.0, 52.0, 640.0, 882.0],
    'location y': [252.0, 56.0, 250.0, 86.0, 318.0],
})

def change_y(num):
    if num > 470:
        num = 470 - (num - 470)
    # Return the value in every case: a branch that only does `pass`
    # returns None, and .apply() would then put None in those rows
    return num

def change_x(num):
    if num < 250:
        num = (250 - num) + 250
    elif num > 250:
        num = 250 - (num - 250)
    return num

# Call .apply() on the whole column, no for loop needed
df['location y'] = df['location y'].apply(change_y)
df['location x'] = df['location x'].apply(change_x)
print(df)
"""
   location x  location y
0       393.0       252.0
1       246.0        56.0
2       448.0       250.0
3      -140.0        86.0
4      -382.0       318.0
"""
I also changed change_y from num = 470 - [(num) - 470] to num = 470 - (num - 470). Square brackets build a list, and subtracting a list from a number raises a TypeError.
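To see where the AttributeError comes from, here is a small sketch using the question's data that compares one element of the column with the column itself. It also shows that an explicit loop is possible if you call the function on each scalar directly; the change_y here is the corrected version that returns unchanged values.

```python
import pandas as pd

df = pd.DataFrame({
    'location x': [107.0, 254.0, 52.0, 640.0, 882.0],
    'location y': [252.0, 56.0, 250.0, 86.0, 318.0],
})

def change_y(num):
    # Reflect values above 470 back below it; leave the rest unchanged
    if num > 470:
        return 470 - (num - 470)
    return num

# One element of the column is a NumPy scalar, which has no .apply()
single = df['location y'][0]
print(type(single))              # <class 'numpy.float64'>
print(hasattr(single, 'apply'))  # False

# The column itself is a Series, which does have .apply()
column = df['location y']
print(type(column))  # <class 'pandas.core.series.Series'>

# A loop works too, but then the function is called on each scalar directly
looped = [change_y(value) for value in df['location y']]
applied = df['location y'].apply(change_y)
print(looped == applied.tolist())  # True
```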
Answer:
num = (250 - num) + 250 and num = 250 - (num - 250) are the same expression, so a single formula, num = 500 - num, works for all conditions (including num == 250, which your function leaves unhandled).
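A quick numeric check of that algebra, as a sketch: branches_agree is a hypothetical helper name (not from either post) that tests both change_x formulas against 500 - num for the sample x values and for 250 itself.

```python
def branches_agree(num):
    """Return True if both change_x branches and 500 - num give the same value."""
    below = (250 - num) + 250   # formula from the num < 250 branch
    above = 250 - (num - 250)   # formula from the num > 250 branch
    return below == above == 500 - num

sample_x = [107.0, 254.0, 52.0, 640.0, 882.0, 250.0]
print(all(branches_agree(n) for n in sample_x))  # True
```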
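You can just use vectorized code; apply is not needed here and is inefficient, since it calls a Python function once per element. For location y, the same expansion turns 470 - (y - 470) into 940 - y, applied only where y > 470: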
df['location x'] = df['location x'].rsub(500)
df.loc[df['location y']>470, 'location y'] = 940 - df['location y']
Output:
   location x  location y
0       393.0       252.0
1       246.0        56.0
2       448.0       250.0
3      -140.0        86.0
4      -382.0       318.0
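The sample data never has location y above 470, so the conditional rule is never exercised. The sketch below adds one hypothetical row (y = 500, not in the question), writes the y rule with Series.mask as an alternative to the .loc assignment (numpy.where(cond, 940 - y, y) would work the same way), and checks the vectorized result against the element-wise .apply() version with pandas.testing.assert_frame_equal.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Question's data plus one hypothetical row (y = 500) so the "> 470" rule fires
data = {
    'location x': [107.0, 254.0, 52.0, 640.0, 882.0, 300.0],
    'location y': [252.0, 56.0, 250.0, 86.0, 318.0, 500.0],
}

def change_x(num):
    # Element-wise rule from the question (both branches equal 500 - num)
    if num < 250:
        return (250 - num) + 250
    elif num > 250:
        return 250 - (num - 250)
    return num  # num == 250 stays 250

def change_y(num):
    # Reflect values above 470; return everything else unchanged
    if num > 470:
        return 470 - (num - 470)
    return num

# Element-wise version with .apply()
applied = pd.DataFrame(data)
applied['location x'] = applied['location x'].apply(change_x)
applied['location y'] = applied['location y'].apply(change_y)

# Vectorized version: Series.mask replaces values where the condition is True
vectorized = pd.DataFrame(data)
y = vectorized['location y']
vectorized['location x'] = vectorized['location x'].rsub(500)
vectorized['location y'] = y.mask(y > 470, 940 - y)

# Raises AssertionError if the two frames differ
assert_frame_equal(applied, vectorized)
print(vectorized)
```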