I have two columns called "Status" & "DELTA_TIME", I want to create a new column called "Past Minute" and I wish to add values based on the below conditions.
- If the status is Open & Checkin and DELTA_TIME is below 30, the "Past_Minute" value should be blank
- If the status is Open & Checkin and DELTA_TIME is between 30 & 60, the "Past_Minute" value should be "30 Mins Delay"
- If the status is Open & Checkin and DELTA_TIME is between 60 & 90, the "Past_Minute" value should be "60 Mins Delay"
- If the status is Open & Checkin and DELTA_TIME is above 90, the "Past_Minute" value should be "90 Mins Delay"
Here is my data frame
df = pd.DataFrame({'Status':['Closed','Open', 'Open','Cancel','Closed','Cancel', 'Checkin', 'Open', 'Open', 'Checkin', 'Open'],
'DELTA_TIME':[-3, -1, 0, 5, 55, 40, 70, 80, 90, 100, 44]})
Status DELTA_TIME
0 Closed -3
1 Open -1
2 Open 0
3 Cancel 5
4 Closed 55
5 Cancel 40
6 Checkin 70
7 Open 80
8 Open 90
9 Checkin 100
10 Open 44
and here is my expectation of data.
Status DELTA_TIME Past_Minute
0 Closed -3
1 Open -1
2 Open 0
3 Cancel 5
4 Closed 55
5 Cancel 40
6 Checkin 70 60 Mins Delay
7 Open 80 60 Mins Delay
8 Open 90 90 Mins Delay
9 Checkin 100 90 Mins Delay
10 Open 44 30 Mins Delay
1st Method:- I tried using loc to get the "Past_Minute" column values as below.
df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Checkin') & (df['DELTA_TIME'] <30.00), "Past_Minute"] = ''
df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Checkin') & (df['DELTA_TIME'] > 30.00) & (df['DELTA_TIME'] < 60.00), "Past_Minute"] = '30 Mins Delay'
df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Checkin') & (df['DELTA_TIME'] >=60.00) & (df['DELTA_TIME'] < 90.00), "Past_Minute"] = '60 Mins Delay'
df.loc[(df['Status'] == 'Open') | (df['Status'] == 'Checkin') & (df['DELTA_TIME'] >=90.00), "Past_Minute"] = '90 Mins Delay'
print(df)
And here is the result.
Status DELTA_TIME Past_Minute
0 Closed -3 NaN
1 Open -1 90 Mins Delay
2 Open 0 90 Mins Delay
3 Cancel 5 NaN
4 Closed 55 NaN
5 Cancel 40 NaN
6 Checkin 70 60 Mins Delay
7 Open 80 90 Mins Delay
8 Open 90 90 Mins Delay
9 Checkin 100 90 Mins Delay
10 Open 44 90 Mins Delay
[Finished in 0.9s]
You see here only 60 Mins delay & 90 mins delay is coming. 30 mins delay & Blank is not at all coming for less than 30 and between 30 to 60. though I have less than 30 and a few negative numbers as well.
2nd Method:- I tried to use Lambda and it worked pretty well based on DELTA_TiME column, however, I am not able to add the Status column in the equation to filter it further.
df['Past_Minute'] = df['DELTA_TIME'].apply(lambda x : '30 Mins Delay' if x>30 and x<60 else '60 Mins Delay' if x>60 and x<90 else '90 Mins Delay' if x>=90 else "")
here is 2nd result.
Status DELTA_TIME Past_Minute
0 Closed -3
1 Open -1
2 Open 0
3 Cancel 5
4 Closed 55 30 Mins Delay
5 Cancel 40 30 Mins Delay
6 Checkin 70 60 Mins Delay
7 Open 80 60 Mins Delay
8 Open 90 90 Mins Delay
9 Checkin 100 90 Mins Delay
10 Open 44 30 Mins Delay
[Finished in 0.9s]
You see, here I am definitely getting precise values that I want, based on my requirement, except I am not able to fit Status in the equation. For example, lines 4 and 5 should not have the values in the "Past Minute" column, because the status is not "Open" or "checkin".
Can anyone help me in fixing the Lambda for me or guide me to the right way to meet the conditions?
CodePudding user response:
You are nearly there , but use apply on entire df and by passing axis=1 we mean we want to pass entire row as lambda parameter not just one column value
df['Past_Minute'] = df.apply(lambda row: ('30 Mins Delay' if row['DELTA_TIME']>30 and row['DELTA_TIME']<60 else '60 Mins Delay' if row['DELTA_TIME']>60 and row['DELTA_TIME']<90 else '90 Mins Delay' if row['DELTA_TIME']>=90 else "") if row['Status'] in ['Open','Checkin'] else "",axis = 1)
df
Out[16]:
Status DELTA_TIME Past_Minute
0 Closed -3
1 Open -1
2 Open 0
3 Cancel 5
4 Closed 55
5 Cancel 40
6 Checkin 70 60 Mins Delay
7 Open 80 60 Mins Delay
8 Open 90 90 Mins Delay
9 Checkin 100 90 Mins Delay
10 Open 44 30 Mins Delay
CodePudding user response:
You aren't limited to lambda
syntax for the apply
method. For more complex tasks such as the one in the question, you can create a full function and pass it to apply
.
Here's a mock DataFrame and an example use:
import pandas as pd
df = pd.util.testing.makeMixedDataFrame()
df.head()
A B C D
0 0.0 0.0 foo1 2009-01-01
1 1.0 1.0 foo2 2009-01-02
2 2.0 0.0 foo3 2009-01-05
3 3.0 1.0 foo4 2009-01-06
4 4.0 0.0 foo5 2009-01-07
Here's a partial function that checks several different conditions and returns a boolean value. You can modify the function however you want.
def apply_func(x):
if x["A"] <= 2.0 and x["B"] <= 2.0:
return True
elif x["C"] == "foo4":
return True
else:
return False
df["new_col"] = df.apply(apply_func, axis=1)
df.head()
A B C D new_col
0 0.0 0.0 foo1 2009-01-01 True
1 1.0 1.0 foo2 2009-01-02 True
2 2.0 0.0 foo3 2009-01-05 True
3 3.0 1.0 foo4 2009-01-06 True
4 4.0 0.0 foo5 2009-01-07 False