Home > Net >  Python || Pandas || How to put multiple conditions in Lambda
Python || Pandas || How to put multiple conditions in Lambda

Time:01-21

I have two columns called "Status" & "DELTA_TIME", I want to create a new column called "Past Minute" and I wish to add values based on the below conditions.

  • If the status is Open & Checkin and DELTA_TIME is below 30, the "Past_Minute" value should be blank
  • If the status is Open & Checkin and DELTA_TIME is between 30 & 60, the "Past_Minute" value should be "30 Mins Delay"
  • If the status is Open & Checkin and DELTA_TIME is between 60 & 90, the "Past_Minute" value should be "60 Mins Delay"
  • If the status is Open & Checkin and DELTA_TIME is above 90, the "Past_Minute" value should be "90 Mins Delay"

Here is my data frame

df = pd.DataFrame({'Status':['Closed','Open', 'Open','Cancel','Closed','Cancel', 'Checkin', 'Open', 'Open', 'Checkin', 'Open'], 
                   'DELTA_TIME':[-3, -1, 0, 5, 55, 40, 70, 80, 90, 100, 44]})
     Status  DELTA_TIME
0    Closed          -3
1      Open          -1
2      Open           0
3    Cancel           5
4    Closed          55
5    Cancel          40
6   Checkin          70
7      Open          80
8      Open          90
9   Checkin         100
10     Open          44

and here is my expectation of data.

     Status  DELTA_TIME    Past_Minute
0    Closed          -3               
1      Open          -1               
2      Open           0               
3    Cancel           5               
4    Closed          55  
5    Cancel          40  
6   Checkin          70  60 Mins Delay
7      Open          80  60 Mins Delay
8      Open          90  90 Mins Delay
9   Checkin         100  90 Mins Delay
10     Open          44  30 Mins Delay

1st Method:- I tried using loc to get the "Past_Minute" column values as below.

df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Checkin') & (df['DELTA_TIME'] <30.00), "Past_Minute"] = ''
df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Checkin') & (df['DELTA_TIME'] > 30.00) & (df['DELTA_TIME'] < 60.00), "Past_Minute"] = '30 Mins Delay'
df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Checkin') & (df['DELTA_TIME'] >=60.00) & (df['DELTA_TIME'] < 90.00), "Past_Minute"] = '60 Mins Delay'
df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Checkin') & (df['DELTA_TIME'] >=90.00), "Past_Minute"] = '90 Mins Delay'

print(df)

And here is the result.

     Status  DELTA_TIME    Past_Minute
0    Closed          -3            NaN
1      Open          -1  90 Mins Delay
2      Open           0  90 Mins Delay
3    Cancel           5            NaN
4    Closed          55            NaN
5    Cancel          40            NaN
6   Checkin          70  60 Mins Delay
7      Open          80  90 Mins Delay
8      Open          90  90 Mins Delay
9   Checkin         100  90 Mins Delay
10     Open          44  90 Mins Delay
[Finished in 0.9s]

You see here only 60 Mins delay & 90 mins delay is coming. 30 mins delay & Blank is not at all coming for less than 30 and between 30 to 60. though I have less than 30 and a few negative numbers as well.

2nd Method:- I tried to use Lambda and it worked pretty well based on DELTA_TiME column, however, I am not able to add the Status column in the equation to filter it further.

df['Past_Minute'] = df['DELTA_TIME'].apply(lambda x : '30 Mins Delay' if x>30 and x<60 else '60 Mins Delay' if x>60 and x<90 else '90 Mins Delay' if x>=90 else "")

here is 2nd result.

    Status  DELTA_TIME    Past_Minute
0    Closed          -3               
1      Open          -1               
2      Open           0               
3    Cancel           5               
4    Closed          55  30 Mins Delay
5    Cancel          40  30 Mins Delay
6   Checkin          70  60 Mins Delay
7      Open          80  60 Mins Delay
8      Open          90  90 Mins Delay
9   Checkin         100  90 Mins Delay
10     Open          44  30 Mins Delay
[Finished in 0.9s]

You see, here I am definitely getting precise values that I want, based on my requirement, except I am not able to fit Status in the equation. For example, lines 4 and 5 should not have the values in the "Past Minute" column, because the status is not "Open" or "checkin".

Can anyone help me in fixing the Lambda for me or guide me to the right way to meet the conditions?

CodePudding user response:

You are nearly there , but use apply on entire df and by passing axis=1 we mean we want to pass entire row as lambda parameter not just one column value

df['Past_Minute'] = df.apply(lambda row:  ('30 Mins Delay' if row['DELTA_TIME']>30 and row['DELTA_TIME']<60 else '60 Mins Delay' if row['DELTA_TIME']>60 and row['DELTA_TIME']<90 else '90 Mins Delay' if row['DELTA_TIME']>=90 else "") if row['Status'] in ['Open','Checkin'] else "",axis = 1)
df
Out[16]: 
     Status  DELTA_TIME    Past_Minute
0    Closed          -3               
1      Open          -1               
2      Open           0               
3    Cancel           5               
4    Closed          55               
5    Cancel          40               
6   Checkin          70  60 Mins Delay
7      Open          80  60 Mins Delay
8      Open          90  90 Mins Delay
9   Checkin         100  90 Mins Delay
10     Open          44  30 Mins Delay

CodePudding user response:

You aren't limited to lambda syntax for the apply method. For more complex tasks such as the one in the question, you can create a full function and pass it to apply.

Here's a mock DataFrame and an example use:

import pandas as pd

df = pd.util.testing.makeMixedDataFrame()
df.head()

     A    B     C          D
0  0.0  0.0  foo1 2009-01-01
1  1.0  1.0  foo2 2009-01-02
2  2.0  0.0  foo3 2009-01-05
3  3.0  1.0  foo4 2009-01-06
4  4.0  0.0  foo5 2009-01-07

Here's a partial function that checks several different conditions and returns a boolean value. You can modify the function however you want.

def apply_func(x):
    if x["A"] <= 2.0 and x["B"] <= 2.0:
        return True
    elif x["C"] == "foo4":
        return True
    else:
        return False

df["new_col"] = df.apply(apply_func, axis=1)
df.head()

     A    B     C          D  new_col
0  0.0  0.0  foo1 2009-01-01     True
1  1.0  1.0  foo2 2009-01-02     True
2  2.0  0.0  foo3 2009-01-05     True
3  3.0  1.0  foo4 2009-01-06     True
4  4.0  0.0  foo5 2009-01-07    False
  • Related