I have a dataframe like the below:
Emp_code Leave_applied Leave_approved
0 15-Jan-2021 15-Jan-2021
2 18-Jan-2021 15-Jan-2021
3 20-Jan-2021 np.nan
4 15-Jan-2021 18-Jan-2021
I need to add a new column, leave type, based on the conditions below:
if leave_applied is greater than leave_approved, leave_type=unplanned
if leave_applied is less than leave_approved, leave_type=planned
if leave_applied == leave_approved, leave_type=planned
if leave_approved == np.nan, leave_type=missing data
Required output
Emp_code Leave_applied Leave_approved Leave type
0 15-Jan-2021 15-Jan-2021 Planned
2 18-Jan-2021 15-Jan-2021 unplanned
3 20-Jan-2021 np.nan missing data
4 15-Jan-2021 18-Jan-2021 planned
I tried:
df['leave_type'] = np.where(df['Leave_applied'] > df['Leave_approved'], 'unplanned',
    np.where(df['Leave_approved'] == np.nan, 'Missing_data', 'Planned'))
The code runs, but no rows in my dataframe end up labelled as missing data.
CodePudding user response:
You can try np.select. The idea is that comparing NaT with any date is False, so neither condition matches those rows and they can be left to the default.
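import numpy as np
import pandas as pd
# Parse both columns as datetimes; entries that cannot be parsed become NaT
df['Leave_applied'] = pd.to_datetime(df['Leave_applied'], errors='coerce')
df['Leave_approved'] = pd.to_datetime(df['Leave_approved'], errors='coerce')
# Rows where neither comparison is True (a date is NaT) get the default
df['Leave type'] = np.select(
    [
        df['Leave_applied'] > df['Leave_approved'],
        df['Leave_applied'] <= df['Leave_approved'],
    ],
    [
        'unplanned',
        'planned',
    ],
    default='missing data'
)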
print(df)
Emp_code Leave_applied Leave_approved Leave type
0 0 2021-01-15 2021-01-15 planned
1 2 2021-01-18 2021-01-15 unplanned
2 3 2021-01-20 NaT missing data
3 4 2021-01-15 2021-01-18 planned
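The point about NaT comparisons can be checked on its own. This is only a small sketch; the two Series below are made-up stand-ins for the question's columns, not the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for Leave_applied / Leave_approved (second approval is missing)
applied = pd.Series([pd.Timestamp("2021-01-15"), pd.Timestamp("2021-01-20")])
approved = pd.Series([pd.Timestamp("2021-01-15"), pd.NaT])

gt = applied > approved    # False wherever either side is NaT
le = applied <= approved   # also False wherever either side is NaT

# The row with NaT matches neither condition, so np.select uses the default
labels = np.select([gt, le], ["unplanned", "planned"], default="missing data")
print(gt.tolist(), le.tolist(), labels.tolist())
```

Because both comparisons are False for the row with NaT, that row is the only one that reaches the default label.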
CodePudding user response:
First convert the values to datetimes with to_datetime, then test for missing values with Series.isna:
df['Leave_applied'] = pd.to_datetime(df['Leave_applied'])
df['Leave_approved'] = pd.to_datetime(df['Leave_approved'])
df['leave_type'] = np.where(df['Leave_applied'] > df['Leave_approved'], 'unplanned',
                   np.where(df['Leave_approved'].isna(), 'Missing_data', 'Planned'))
print(df)
Emp_code Leave_applied Leave_approved leave_type
0 0 2021-01-15 2021-01-15 Planned
1 2 2021-01-18 2021-01-15 unplanned
2 3 2021-01-20 NaT Missing_data
3 4 2021-01-15 2021-01-18 Planned
Or use numpy.select:
df['leave_type'] = np.select([df['Leave_approved'].isna(),
                              df['Leave_applied'] > df['Leave_approved']],
                             ['Missing_data', 'unplanned'], 'Planned')
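As for why the attempt in the question found no missing rows: NaN compares unequal to everything, including itself, so a check with == np.nan is False for every row, whereas Series.isna flags the missing entries. A minimal sketch, using a made-up Series s rather than the question's dataframe:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan])  # hypothetical values, second entry is missing

eq_mask = s == np.nan    # elementwise equality: never True, even for the NaN entry
isna_mask = s.isna()     # True exactly where the value is missing

print(eq_mask.tolist(), isna_mask.tolist())
```

The same holds for NaT in a datetime column, which is why both versions above test with isna() and not with an equality comparison.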