How to fill a column based on the null values in another column in pandas

Time:05-25

I have a DataFrame like this:

Emp_code Leave_applied  Leave_approved
0         15-Jan-2021    15-Jan-2021
2         18-Jan-2021    15-Jan-2021
3         20-Jan-2021       np.nan
4         15-Jan-2021    18-Jan-2021

I need to add a new column, leave type, based on the following conditions:

if Leave_applied is later than Leave_approved, leave type = unplanned

if Leave_applied is earlier than Leave_approved, leave type = planned

if Leave_applied == Leave_approved, leave type = planned

if Leave_approved is NaN, leave type = missing data


Required output:

Emp_code Leave_applied  Leave_approved  Leave type
0         15-Jan-2021    15-Jan-2021     Planned
2         18-Jan-2021    15-Jan-2021     unplanned
3         20-Jan-2021       np.nan       missing data
4         15-Jan-2021    18-Jan-2021     planned

I tried:

df['leave_type'] = np.where(df['Leave_applied'] > df['Leave_approved'], 'unplanned',
                   (np.where(df['Leave_approved'] == np.nan, 'Missing_data', 'Planned')))

The code runs, but no rows in my DataFrame end up labelled as missing data.

CodePudding user response:

You can try np.select. Any comparison between NaT and a date evaluates to False, so rows with a missing date match neither condition and fall through to the default:

df['Leave_applied'] = pd.to_datetime(df['Leave_applied'], errors='coerce')
df['Leave_approved'] = pd.to_datetime(df['Leave_approved'], errors='coerce')

df['Leave type'] = np.select(
    [df['Leave_applied'] > df['Leave_approved'],
     df['Leave_applied'] <= df['Leave_approved'],
     ],
    ['unplanned',
     'planned',
     ],
    default='missing data'
)
print(df)

   Emp_code Leave_applied Leave_approved    Leave type
0         0    2021-01-15     2021-01-15       planned
1         2    2021-01-18     2021-01-15     unplanned
2         3    2021-01-20            NaT  missing data
3         4    2021-01-15     2021-01-18       planned
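
As a small illustration of why the original `== np.nan` check never matched, and why the default branch catches the missing rows here, the sketch below compares the two kinds of test on a two-row sample. The variable names and the explicit `format="%d-%b-%Y"` are assumptions made for this example and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, including itself.
nan_equals_nan = np.nan == np.nan  # False

# Raw string columns, as in the question, with one missing approval date.
applied_raw = pd.Series(["18-Jan-2021", "20-Jan-2021"])
approved_raw = pd.Series(["15-Jan-2021", np.nan])

# Equality against NaN is False for every row, so this mask never fires.
eq_mask = approved_raw == np.nan
# isna() is the reliable way to flag the missing row.
isna_mask = approved_raw.isna()

# After conversion the missing value becomes NaT.
applied = pd.to_datetime(applied_raw, format="%d-%b-%Y")
approved = pd.to_datetime(approved_raw, format="%d-%b-%Y")

# Both ordering comparisons are False wherever one side is NaT,
# which is why such rows end up in the np.select default.
later = applied > approved
not_later = applied <= approved
```

For the row with the missing approval date, both `later` and `not_later` are False, so neither condition claims it.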

CodePudding user response:

First convert the values to datetimes with to_datetime, then test for missing values with Series.isna:

df['Leave_applied'] = pd.to_datetime(df['Leave_applied'])
df['Leave_approved'] = pd.to_datetime(df['Leave_approved'])

df['leave_type'] = np.where(df['Leave_applied'] > df['Leave_approved'],'unplanned',
                   (np.where(df['Leave_approved'].isna(), 'Missing_data', 'Planned'))) 
          
print (df)
   Emp_code Leave_applied Leave_approved    leave_type
0         0    2021-01-15     2021-01-15       Planned
1         2    2021-01-18     2021-01-15     unplanned
2         3    2021-01-20            NaT  Missing_data
3         4    2021-01-15     2021-01-18       Planned

Or use numpy.select:

df['leave_type'] = np.select([df['Leave_approved'].isna(),
                              df['Leave_applied'] > df['Leave_approved']],
                             ['Missing_data', 'unplanned'], 'Planned') 
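
For anyone who wants to run the np.select variant end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question. The helper name `classify_leave`, the explicit date format, and the lowercase labels taken from the question's required output are assumptions for this example.

```python
import numpy as np
import pandas as pd


def classify_leave(df):
    """Return a leave-type label per row (hypothetical helper for this sketch)."""
    # Parse the date strings; unparseable or missing values become NaT.
    applied = pd.to_datetime(df["Leave_applied"], format="%d-%b-%Y", errors="coerce")
    approved = pd.to_datetime(df["Leave_approved"], format="%d-%b-%Y", errors="coerce")

    # np.select picks the first matching condition, so the missing check goes first.
    conditions = [approved.isna(), applied > approved]
    choices = ["missing data", "unplanned"]
    return np.select(conditions, choices, default="planned")


# Sample data copied from the question.
df = pd.DataFrame({
    "Emp_code": [0, 2, 3, 4],
    "Leave_applied": ["15-Jan-2021", "18-Jan-2021", "20-Jan-2021", "15-Jan-2021"],
    "Leave_approved": ["15-Jan-2021", "15-Jan-2021", np.nan, "18-Jan-2021"],
})
df["Leave type"] = classify_leave(df)
print(df)
```

Note that this sketch only checks Leave_approved for missing values, as the question asks. A row with a missing Leave_applied and a valid Leave_approved would fall through to 'planned', so add `applied.isna()` to the first condition if that case can occur in your data.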
          