Is there a way to reassign values in a pandas dataframe using the .apply() method?
I have this code:
import pandas as pd

df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON'],
                   'value': [10, 15, 20]})
print(df, '\n')

def myfunc(row):
    if row['switch'] == 'ON':
        row['value'] = 500
    elif row['switch'] == 'OFF':
        row['value'] = 0

df = df.apply(myfunc, axis=1)
print(df)
The code is not working. I am trying to achieve the following output after running the .apply() method:
  switch  value
0     ON    500
1    OFF      0
2     ON    500
Why is the "row['value'] = 500" assignment not working and how can I rewrite it to make it work?
Answer:
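It's not working because your function doesn't return anything. With axis=1, apply() calls the function once per row and collects the return values, so every row produces None, and df = df.apply(myfunc, axis=1) then replaces your whole DataFrame with a Series of None values. Assigning to row['value'] inside the function doesn't help either: row is a Series that pandas passes in, and changes to it are not reliably written back to df (the pandas documentation says functions that mutate the passed object are not supported). Instead, return the new value and assign the result back to the column.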
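To see this concretely, here is a minimal sketch that runs the question's myfunc unchanged but stores the result in a separate variable instead of overwriting df. Because myfunc has no return statement, apply() gets None back for every row.

```python
import pandas as pd

df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON'],
                   'value': [10, 15, 20]})

def myfunc(row):
    # Same logic as in the question: it modifies row but returns nothing (None)
    if row['switch'] == 'ON':
        row['value'] = 500
    elif row['switch'] == 'OFF':
        row['value'] = 0

# Keep the result separate so df itself is not replaced
result = df.apply(myfunc, axis=1)

print(type(result))  # a Series, not a DataFrame
print(result)        # every entry is missing, because myfunc returned None for each row
```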
def f(row):
    if row['switch'] == 'ON':
        return 500
    elif row['switch'] == 'OFF':
        return 0

df['value'] = df.apply(f, axis=1)
df now has the values:
  switch  value
0     ON    500
1    OFF      0
2     ON    500
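If you would rather keep the row-based style from the question, another option is to have the function return the whole (modified) row; apply() with axis=1 then assembles the returned rows into a new DataFrame, so df = df.apply(myfunc, axis=1) works as originally intended. A minimal sketch, which copies the row first so the function does not mutate the object pandas passes in:

```python
import pandas as pd

df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON'],
                   'value': [10, 15, 20]})

def myfunc(row):
    # Work on a copy instead of mutating the Series pandas passes in
    row = row.copy()
    if row['switch'] == 'ON':
        row['value'] = 500
    elif row['switch'] == 'OFF':
        row['value'] = 0
    # Returning a Series (the whole row) makes apply() build a DataFrame again
    return row

df = df.apply(myfunc, axis=1)
print(df)
```

When only one column changes, returning just the new value and assigning it to that column is simpler.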
One thing to note here is whether switch can have any values other than ON and OFF.
- If those are the only permitted values, you can replace the named function with a lambda expression containing a single if/else.
- If other values are present, f currently returns None for them, because your if/elif block does not handle them, so they end up as missing values in value. You would need to set a value for every kind of switch, or a default value, to end up with a DataFrame without missing entries in value.
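A minimal sketch of the default-value approach. The third state 'STANDBY' and the fallback of -1 are hypothetical, chosen only to exercise the branch that catches anything that is neither ON nor OFF:

```python
import pandas as pd

# 'STANDBY' is a hypothetical extra state used only to exercise the default branch
df = pd.DataFrame({'switch': ['ON', 'OFF', 'STANDBY'],
                   'value': [10, 15, 20]})

DEFAULT_VALUE = -1  # hypothetical fallback for unrecognised switch states

def f(row):
    if row['switch'] == 'ON':
        return 500
    elif row['switch'] == 'OFF':
        return 0
    # Every other switch value falls through to the default instead of None
    return DEFAULT_VALUE

df['value'] = df.apply(f, axis=1)
print(df)
```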
Answer:
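In addition to the missing return value, which is what breaks your code, I would suggest not using apply() here at all. Use a vectorized version with np.where() instead, which is typically much faster on large DataFrames because it evaluates the condition on the whole column at once rather than calling a Python function for every row: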
import numpy as np

# 500 where switch == "ON", 0 for every other value (not only "OFF")
df['value'] = np.where(df['switch'] == "ON", 500, 0)
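A related vectorized option that stays within pandas, swapped in here for comparison rather than taken from the answer above, is Series.map with a dict. One difference worth knowing: np.where(..., 500, 0) turns every value that is not "ON" into 0, while map() leaves values that are missing from the dict as NaN, so unexpected states stay visible. A minimal sketch, with 'STANDBY' as a hypothetical extra state:

```python
import pandas as pd

# 'STANDBY' is a hypothetical extra state, added to show what happens to unmapped values
df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON', 'STANDBY'],
                   'value': [10, 15, 20, 25]})

# Lookup table from switch state to the new value
mapping = {'ON': 500, 'OFF': 0}

# Series.map looks each element up in the dict; values not in the dict become NaN,
# so the column becomes float64 when anything is unmapped
df['value'] = df['switch'].map(mapping)
print(df)
```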
Answer:
You can write an if/else expression inside a lambda, like below:
>>> df['value'] = df['switch'].apply(lambda x: 500 if x == 'ON' else 0)
>>> df
  switch  value
0     ON    500
1    OFF      0
2     ON    500
If you want to write a named function instead, try this:
def myfunc(x):
    if x == 'ON':
        return 500
    elif x == 'OFF':
        return 0

df['value'] = df['switch'].apply(myfunc)
You can also use np.select to handle several conditions at once, like below:
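import numpy as np

# Conditions and choices are paired by position; rows matching no condition get np.select's default (0)
df['value'] = np.select([df['switch'] == 'ON', df['switch'] == 'OFF'], [500, 0])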
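np.select also accepts a default argument (0 unless you pass one) that is used for rows matching none of the conditions, which is a compact way to handle unexpected switch values. A minimal sketch, assuming a hypothetical 'STANDBY' state and a hypothetical fallback of -1:

```python
import numpy as np
import pandas as pd

# 'STANDBY' is a hypothetical extra state that matches neither condition
df = pd.DataFrame({'switch': ['ON', 'OFF', 'ON', 'STANDBY'],
                   'value': [10, 15, 20, 25]})

# Conditions and choices are matched by position; the first true condition wins
conditions = [df['switch'] == 'ON', df['switch'] == 'OFF']
choices = [500, 0]

# Rows matching no condition get the default (-1 here, a hypothetical fallback)
df['value'] = np.select(conditions, choices, default=-1)
print(df)
```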