Evaluate NaT in a lambda function for a pandas dataframe-CodePudding

I have the following dataset:

import pandas as pd
import time

the_df = pd.DataFrame(
    {
        "id": [21, 22, 23, 24, 15, 26, 27, 28, 29, 30],
        "amount": [10, 30, 12, 60, 0.5, 0.2, np.nan, 5, np.nan, 3],
        "date": [
            '2022-01-01','2022-08-01',None,'2022-12-01','2022-02-01',
            None,'2022-02-01','2022-01-01','2022-11-01','2022-01-01'
        ]
    }
)
the_df
    id  amount  date
0   21  10.0    2022-01-01
1   22  30.0    2022-08-01
2   23  12.0    None
3   24  60.0    2022-12-01
4   15  0.5     2022-02-01
5   26  0.2     None
6   27  NaN     2022-02-01
7   28  5.0     2022-01-01
8   29  NaN     2022-11-01
9   30  3.0     2022-01-01

And I want to convert the column date into a float. My approach was to convert the date column into a pandas datetime, and then convert column into a float using a lambda function. My problem was to evaluate a NaT to avoid the conversion in case is NaT. This is what I tried:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x.notnull() else x)

But I get this error:

AttributeError: 'Timestamp' object has no attribute 'notnull'

I tried this second option:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x is not None else x)

But I get this error:

ValueError: NaTType does not support timetuple

Please, could you point what I am doing wrong? Why if the value is not evaluated when is None, it seems to be evaluated as timetuple?

CodePudding user response：

After changing the column to datetime, you can use df.loc to get the values that is not NaT and then do the apply.

melt = ~pd.isna(the_df['date']),['date']

the_df.loc[melt] = the_df.loc[melt]['date'].apply(lambda x: float(time.mktime(x.timetuple())))

   id  amount          date
0  21    10.0  1640966400.0
1  22    30.0  1659283200.0
2  23    12.0           NaT
3  24    60.0  1669824000.0
4  15     0.5  1643644800.0
5  26     0.2           NaT
6  27     NaN  1643644800.0
7  28     5.0  1640966400.0
8  29     NaN  1667232000.0
9  30     3.0  1640966400.0

CodePudding user response：

The error is resolved if you use pd.notnull instead in your lambda, or if you check for the date value being NaT. Personally, I prefer the first option, as it works with all the different typed nulls.

the_df.date = the_df.date.apply(lambda x: float(time.mktime(x.timetuple())) if pd.notnull(x) else x)

Your 2nd attempt does not work because the values in column date (after conversion by pd.to_datetime) don't contain any None, and therefore the if x is not None is always Truedf

CodePudding user response：

You can test for the existence of an NaT value specifically:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x is not pd.NaT else x)