I have a dataframe with two columns that hold classes: diamond, gold and silver.
import pandas as pd
class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})
I want to create a new column that shows whether the class was Upgraded or Downgraded.
What I have tried
I wrote the below function to set the rules
def status_desc(class_pd, old_class, new_class):
    if ((class_pd['old_class'] == 'gold') & (class_pd['new_class'] == 'diamond') or \
        (class_pd['old_class'] == 'silver') & (class_pd['new_class'] == 'diamond') or \
        (class_pd['old_class'] == 'silver') & (class_pd['new_class'] == 'gold')):
        val = 'Upgrade'
    elif ((class_pd['old_class'] == 'diamond') & (class_pd['new_class'] == 'gold') or \
          (class_pd['old_class'] == 'diamond') & (class_pd['new_class'] == 'silver') or \
          (class_pd['old_class'] == 'gold') & (class_pd['new_class'] == 'silver')):
        val = 'Downgrade'
    else:
        val = 'NA'
Then I tried to apply the function to my dataframe using the below method
class_pd['class_desc'] = class_pd.apply(lambda x: status_desc(class_pd['old_class'], class_pd['new_class']), axis=1)
Error
I get this error
TypeError: status_desc() missing 1 required positional argument: 'new_class'
Desired Output
class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver'],
                         'class_desc': ['Upgrade', 'Downgrade', 'NA']})
CodePudding user response:
Another solution uses pd.Categorical, which seems more elegant and more scalable to me:
categories = ['silver', 'gold', 'diamond']
class_pd = class_pd.apply(pd.Categorical, categories=categories, ordered=True)
class_pd['class_desc'] = 'NA'
class_pd.loc[class_pd.old_class > class_pd.new_class, 'class_desc'] = 'Downgrade'
class_pd.loc[class_pd.old_class < class_pd.new_class, 'class_desc'] = 'Upgrade'
We tell Pandas the inherent order, and can then use comparison operators.
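As a minimal, self-contained sketch of that idea: the snippet below builds two ordered categorical Series and compares them directly. The names tier, old and new are made up for illustration, and pd.CategoricalDtype is used here instead of the apply call above, so treat it as one possible variant rather than the answer's exact method.

```python
import pandas as pd

# Hypothetical name: an ordered dtype, lowest class first
tier = pd.CategoricalDtype(['silver', 'gold', 'diamond'], ordered=True)

old = pd.Series(['gold', 'gold', 'silver'], dtype=tier)
new = pd.Series(['diamond', 'silver', 'silver'], dtype=tier)

# Comparisons follow the declared category order, not alphabetical order
upgraded = (old < new).tolist()
downgraded = (old > new).tolist()
```

Because both Series share the same ordered dtype, < and > work element-wise, which is what the .loc assignments above rely on.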
Another way to do the last bit (after adding the categories), suggested by @jezrael, uses numpy.select:
import numpy as np
conditions = [
    class_pd.old_class < class_pd.new_class,
    class_pd.old_class > class_pd.new_class,
    class_pd.old_class == class_pd.new_class,
]
labels = ["Upgrade", "Downgrade", "NA"]
# Pass a string default: the implicit default of 0 can clash with string labels on recent NumPy versions
class_pd["class_desc"] = np.select(conditions, labels, default="NA")
CodePudding user response:
Your function status_desc takes 3 arguments (class_pd, old_class, new_class), but you are only passing 2: class_pd['old_class'] and class_pd['new_class']. You need to pass the first argument, class_pd, as well. You're also missing a few things:
- You need to return the values, not just assign them to val. So return "Upgrade", "Downgrade" and "NA".
- In your .apply you need to pass the x of the lambda function; if you pass class_pd you pass the whole dataframe. x contains a single row of the df, so you're looping through each row and the function looks at the old_class and new_class columns of that row for the logic.
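A minimal sketch of the second point, assuming the question's dataframe: it only inspects what the lambda receives when axis=1 is used. The names row_types and old_values are made up for illustration.

```python
import pandas as pd

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})

# With axis=1, the lambda is called once per row, and x is that row as a Series
row_types = class_pd.apply(lambda x: type(x).__name__, axis=1).tolist()

# Indexing x by column name yields a single scalar value for that row
old_values = class_pd.apply(lambda x: x['old_class'], axis=1).tolist()
```

So inside the function, x['old_class'] == 'gold' is a plain True/False for one row, which is what the if/elif logic needs.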
However, a simpler approach is to take only 1 argument (the row), since you're not even using old_class and new_class in your function. Define it like this:
def status_desc(class_pd):
    if ((class_pd['old_class'] == 'gold') & (class_pd['new_class'] == 'diamond') or \
        (class_pd['old_class'] == 'silver') & (class_pd['new_class'] == 'diamond') or \
        (class_pd['old_class'] == 'silver') & (class_pd['new_class'] == 'gold')):
        return 'Upgrade'
    elif ((class_pd['old_class'] == 'diamond') & (class_pd['new_class'] == 'gold') or \
          (class_pd['old_class'] == 'diamond') & (class_pd['new_class'] == 'silver') or \
          (class_pd['old_class'] == 'gold') & (class_pd['new_class'] == 'silver')):
        return 'Downgrade'
    else:
        return 'NA'
Then call it using:
class_pd['class_desc'] = class_pd.apply(lambda x: status_desc(x), axis=1)
Output using this code:
  old_class new_class class_desc
0      gold   diamond    Upgrade
1      gold    silver  Downgrade
2    silver    silver         NA
CodePudding user response:
Here, the main logic is to provide a rank list whose positions encode importance, and then to compare the positions of the old and new class using if/else.
Code:
rank = ['silver', 'gold', 'diamond']  # positions: silver=0, gold=1, diamond=2
class_pd['class_desc'] = class_pd.apply(
    lambda x: ('Upgrade' if rank.index(x.old_class) < rank.index(x.new_class) else 'Downgrade')
    if x.old_class != x.new_class else 'NA', axis=1)
class_pd
Output:
  old_class new_class class_desc
0      gold   diamond    Upgrade
1      gold    silver  Downgrade
2    silver    silver         NA
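The same rank idea can also be sketched without a row-wise apply. This is an illustrative variant, not part of the answer above: it assumes a dict named rank (instead of the list) and uses Series.map plus np.sign to turn the rank difference into a label.

```python
import numpy as np
import pandas as pd

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})

# Assumed mapping: higher number means higher class
rank = {'silver': 0, 'gold': 1, 'diamond': 2}

# Translate both columns to numeric ranks, then take the sign of the change
change = np.sign(class_pd['new_class'].map(rank) - class_pd['old_class'].map(rank))

# +1 means the class went up, -1 means it went down, 0 means no change
class_pd['class_desc'] = change.map({1: 'Upgrade', -1: 'Downgrade', 0: 'NA'})
```

This works on whole columns at once, which tends to matter more as the dataframe grows.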
CodePudding user response:
Firstly, you need to pass one more argument, class_pd, to your function. Also, if you pass the whole dataframe rather than a single row, class_pd['old_class'] is a whole column, so you need to index into it. For instance, instead of class_pd['old_class'] == 'gold' you need to write class_pd['old_class'][0] == 'gold'.