I have the following dataset:
import pandas as pd
import time
the_df = pd.DataFrame(
{
"id": [21, 22, 23, 24, 15, 26, 27, 28, 29, 30],
"amount": [10, 30, 12, 60, 0.5, 0.2, np.nan, 5, np.nan, 3],
"date": [
'2022-01-01','2022-08-01',None,'2022-12-01','2022-02-01',
None,'2022-02-01','2022-01-01','2022-11-01','2022-01-01'
]
}
)
the_df
id amount date
0 21 10.0 2022-01-01
1 22 30.0 2022-08-01
2 23 12.0 None
3 24 60.0 2022-12-01
4 15 0.5 2022-02-01
5 26 0.2 None
6 27 NaN 2022-02-01
7 28 5.0 2022-01-01
8 29 NaN 2022-11-01
9 30 3.0 2022-01-01
And I want to convert the column date
into a float
. My approach was to convert the date
column into a pandas datetime
, and then convert column into a float
using a lambda function. My problem was to evaluate a NaT
to avoid the conversion in case is NaT
. This is what I tried:
the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
apply(lambda x: float(time.mktime(x.timetuple())) if x.notnull() else x)
But I get this error:
AttributeError: 'Timestamp' object has no attribute 'notnull'
I tried this second option:
the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
apply(lambda x: float(time.mktime(x.timetuple())) if x is not None else x)
But I get this error:
ValueError: NaTType does not support timetuple
Please, could you point what I am doing wrong? Why if the value is not evaluated when is None
, it seems to be evaluated as timetuple
?
CodePudding user response:
After changing the column to datetime, you can use df.loc to get the values that is not NaT
and then do the apply.
melt = ~pd.isna(the_df['date']),['date']
the_df.loc[melt] = the_df.loc[melt]['date'].apply(lambda x: float(time.mktime(x.timetuple())))
id amount date
0 21 10.0 1640966400.0
1 22 30.0 1659283200.0
2 23 12.0 NaT
3 24 60.0 1669824000.0
4 15 0.5 1643644800.0
5 26 0.2 NaT
6 27 NaN 1643644800.0
7 28 5.0 1640966400.0
8 29 NaN 1667232000.0
9 30 3.0 1640966400.0
CodePudding user response:
The error is resolved if you use pd.notnull
instead in your lambda, or if you check for the date value being NaT
. Personally, I prefer the first option, as it works with all the different typed nulls.
the_df.date = the_df.date.apply(lambda x: float(time.mktime(x.timetuple())) if pd.notnull(x) else x)
Your 2nd attempt does not work because the values in column date (after conversion by pd.to_datetime
) don't contain any None
, and therefore the if x is not None
is always True
df
CodePudding user response:
You can test for the existence of an NaT
value specifically:
the_df['date'] = pd.to_datetime(the_df['date'])
the_df['date'] = the_df['date'].\
apply(lambda x: float(time.mktime(x.timetuple())) if x is not pd.NaT else x)