Home > Software design >  Evaluate NaT in a lambda function for a pandas dataframe
Evaluate NaT in a lambda function for a pandas dataframe

Time:10-03

I have the following dataset:

import pandas as pd
import time

the_df = pd.DataFrame(
    {
        "id": [21, 22, 23, 24, 15, 26, 27, 28, 29, 30],
        "amount": [10, 30, 12, 60, 0.5, 0.2, np.nan, 5, np.nan, 3],
        "date": [
            '2022-01-01','2022-08-01',None,'2022-12-01','2022-02-01',
            None,'2022-02-01','2022-01-01','2022-11-01','2022-01-01'
        ]
    }
)
the_df
    id  amount  date
0   21  10.0    2022-01-01
1   22  30.0    2022-08-01
2   23  12.0    None
3   24  60.0    2022-12-01
4   15  0.5     2022-02-01
5   26  0.2     None
6   27  NaN     2022-02-01
7   28  5.0     2022-01-01
8   29  NaN     2022-11-01
9   30  3.0     2022-01-01

And I want to convert the column date into a float. My approach was to convert the date column into a pandas datetime, and then convert column into a float using a lambda function. My problem was to evaluate a NaT to avoid the conversion in case is NaT. This is what I tried:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x.notnull() else x)

But I get this error:

AttributeError: 'Timestamp' object has no attribute 'notnull'

I tried this second option:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x is not None else x)

But I get this error:

ValueError: NaTType does not support timetuple

Please, could you point what I am doing wrong? Why if the value is not evaluated when is None, it seems to be evaluated as timetuple?

CodePudding user response:

After changing the column to datetime, you can use df.loc to get the values that is not NaT and then do the apply.

melt = ~pd.isna(the_df['date']),['date']

the_df.loc[melt] = the_df.loc[melt]['date'].apply(lambda x: float(time.mktime(x.timetuple())))

   id  amount          date
0  21    10.0  1640966400.0
1  22    30.0  1659283200.0
2  23    12.0           NaT
3  24    60.0  1669824000.0
4  15     0.5  1643644800.0
5  26     0.2           NaT
6  27     NaN  1643644800.0
7  28     5.0  1640966400.0
8  29     NaN  1667232000.0
9  30     3.0  1640966400.0

CodePudding user response:

The error is resolved if you use pd.notnull instead in your lambda, or if you check for the date value being NaT. Personally, I prefer the first option, as it works with all the different typed nulls.

the_df.date = the_df.date.apply(lambda x: float(time.mktime(x.timetuple())) if pd.notnull(x) else x)

Your 2nd attempt does not work because the values in column date (after conversion by pd.to_datetime) don't contain any None, and therefore the if x is not None is always Truedf

CodePudding user response:

You can test for the existence of an NaT value specifically:

the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
            apply(lambda x: float(time.mktime(x.timetuple())) if x is not pd.NaT else x)
  • Related