I have a dataframe with multiple rows like the examples below, with two separate object (string) columns:
Date         Time
18 Jun 2022  10:22
18 Jun 2022  10:03
15 Jun 2022  21:34
I need to create a new column holding an integer: the number of seconds since the epoch, computed from these two columns combined.
I used strptime in a lambda function to change the Date to a "YYYY-MM-DD" format, where MM is now a zero-padded month number such as "06" rather than "Jun", so the new dataframe looks like:
Date         Time   NewDate
18 Jun 2022  10:22  2022-06-18
18 Jun 2022  10:03  2022-06-18
15 Jun 2022  21:34  2022-06-15
where Date and Time are object columns but NewDate is a datetime column.
I am stuck trying to convert the NewDate datetime value back to a string with strftime, concatenate the Time string to it, and then turn the result into the final number of seconds since the epoch.
Of course, there may be a better overall strategy.
Thanks for any help.
CodePudding user response:
You can create a column of datetime dtype like this:
df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
df
          Date   Time            datetime
0  18 Jun 2022  10:22 2022-06-18 10:22:00
1  18 Jun 2022  10:03 2022-06-18 10:03:00
2  15 Jun 2022  21:34 2022-06-15 21:34:00
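If you would rather not rely on pandas inferring the layout of the combined string, you can pass an explicit format. The following is a minimal sketch that rebuilds the sample frame from the question; the format string is my reading of the sample data, and it assumes the month abbreviations are English (the `%b` directive depends on the locale).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["18 Jun 2022", "18 Jun 2022", "15 Jun 2022"],
    "Time": ["10:22", "10:03", "21:34"],
})

# %d = day of month, %b = abbreviated month name, %Y = four-digit year,
# %H:%M = 24-hour time. Rows that do not match raise an error instead of
# being silently misparsed.
df["datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"],
    format="%d %b %Y %H:%M",
)
```

With an explicit format there is no need for the intermediate NewDate column or a strftime round trip.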
pandas stores datetime values internally as Unix time, in nanoseconds since the epoch. You can therefore convert to seconds since the Unix epoch like this:
df['unix'] = df['datetime'].astype('int64')/1e9
df
          Date   Time            datetime          unix
0  18 Jun 2022  10:22 2022-06-18 10:22:00  1.655548e+09
1  18 Jun 2022  10:03 2022-06-18 10:03:00  1.655547e+09
2  15 Jun 2022  21:34 2022-06-15 21:34:00  1.655329e+09
Note that since we did not set a time zone, the 'datetime' values are treated as UTC.
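Since the question asks for an integer, and dividing by 1e9 produces floats, one alternative is to subtract the epoch and floor-divide by a one-second Timedelta. The sketch below also shows what changes if the times are local rather than UTC; "Europe/Berlin" and the column names unix_local and local are placeholders chosen purely for illustration, not something stated in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["18 Jun 2022", "18 Jun 2022", "15 Jun 2022"],
    "Time": ["10:22", "10:03", "21:34"],
})
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Treat the naive timestamps as UTC: subtract the epoch, then floor-divide
# by one second to get whole seconds as int64 rather than floats.
df["unix"] = (df["datetime"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

# Assumption: the wall-clock times were recorded in Europe/Berlin
# (a placeholder zone). Localize first, then measure from the UTC epoch.
local = df["datetime"].dt.tz_localize("Europe/Berlin")
df["unix_local"] = (local - pd.Timestamp("1970-01-01", tz="UTC")) // pd.Timedelta("1s")
```

For the first row, the UTC interpretation gives 1655547720, while the Berlin interpretation is 7200 seconds smaller because that zone is two hours ahead of UTC in June.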
To get time in seconds relative to the oldest date, you could use
df['unix']-df['unix'].min()
# or if you don't actually need Unix time:
(df['datetime']-df['datetime'].min()).dt.total_seconds()