Multiple date conversions in Pandas

Time:06-29

I have a dataframe with multiple rows and two separate object (string) columns, such as the following:

Date         Time
18 Jun 2022  10:22
18 Jun 2022  10:03
15 Jun 2022  21:34

I need to create a new column containing an integer: the number of seconds since the epoch, computed from these two columns combined.

I used strptime in a lambda function to convert Date to a "YYYY-MM-DD" format, where MM is now a zero-padded month number such as "06" rather than "Jun", so the new dataframe looks like:

Date         Time   NewDate
18 Jun 2022  10:22  2022-06-18
18 Jun 2022  10:03  2022-06-18
15 Jun 2022  21:34  2022-06-15

where Date and Time are objects but NewDate is DateTime.

I am stuck trying to convert the NewDate datetime value back to a string with strftime, concatenate the Time string, and then convert the result to the final number of seconds since the epoch.

Of course, maybe there's a better overall idea or strategy.

Thanks for any help.

CodePudding user response:

You can create a column of datetime data type like this:

df['datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

df 
          Date   Time            datetime
0  18 Jun 2022  10:22 2022-06-18 10:22:00
1  18 Jun 2022  10:03 2022-06-18 10:03:00
2  15 Jun 2022  21:34 2022-06-15 21:34:00
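If you would rather not rely on pandas guessing the layout of the strings, you can pass an explicit format. This is only a sketch: the dataframe is rebuilt from the rows in the question, and it assumes an English locale so that %b matches "Jun".

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["18 Jun 2022", "18 Jun 2022", "15 Jun 2022"],
    "Time": ["10:22", "10:03", "21:34"],
})

# %d = day of month, %b = abbreviated month name, %Y = four-digit year,
# %H:%M = 24-hour clock time
df["datetime"] = pd.to_datetime(
    df["Date"] + " " + df["Time"],
    format="%d %b %Y %H:%M",
)
print(df)
```

With an explicit format, a row that does not match raises an error instead of being parsed in some unexpected way.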

pandas datetime uses Unix time internally, but as nanoseconds since the epoch. You can therefore convert to seconds since the Unix epoch like this:

df['unix'] = df['datetime'].astype('int64')/1e9

df 
          Date   Time            datetime          unix
0  18 Jun 2022  10:22 2022-06-18 10:22:00  1.655548e+09
1  18 Jun 2022  10:03 2022-06-18 10:03:00  1.655547e+09
2  15 Jun 2022  21:34 2022-06-15 21:34:00  1.655329e+09
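Dividing by 1e9 gives floats, while the question asks for an integer. One possible way to get whole seconds is to subtract the epoch and floor-divide by a one-second Timedelta. A minimal sketch, with the dataframe rebuilt from the question's rows; it does not depend on the column's internal resolution, which matters because pandas 2.0 and later can also store datetimes at resolutions other than nanoseconds.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["18 Jun 2022", "18 Jun 2022", "15 Jun 2022"],
    "Time": ["10:22", "10:03", "21:34"],
})
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Timedelta since the epoch, floor-divided by one second -> int64 seconds
epoch = pd.Timestamp("1970-01-01")
df["unix"] = (df["datetime"] - epoch) // pd.Timedelta(seconds=1)

print(df["unix"].tolist())
# [1655547720, 1655546580, 1655328840]
```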

Note that since we did not set a time zone, the 'datetime' values are naive and will be treated as UTC in this conversion.
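If the times were actually recorded in some local zone, you could localize them before converting. The zone "Europe/Berlin" and the column name unix_local below are only assumptions for illustration; the question does not say which zone applies.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["18 Jun 2022", "18 Jun 2022", "15 Jun 2022"],
    "Time": ["10:22", "10:03", "21:34"],
})
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Attach the (assumed) local zone to the naive timestamps, then convert to UTC
local = df["datetime"].dt.tz_localize("Europe/Berlin")
utc = local.dt.tz_convert("UTC")

# Whole seconds since the Unix epoch, which is itself defined in UTC
epoch_utc = pd.Timestamp("1970-01-01", tz="UTC")
df["unix_local"] = (utc - epoch_utc) // pd.Timedelta(seconds=1)

print(df["unix_local"].tolist())
# [1655540520, 1655539380, 1655321640]
```

Berlin is two hours ahead of UTC in June, so each value is 7200 seconds smaller than when the same strings are treated as UTC.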

To get time in seconds relative to the oldest date, you could use

df['unix']-df['unix'].min()

# or if you don't actually need Unix time:
(df['datetime']-df['datetime'].min()).dt.total_seconds()
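For the sample rows, the relative version works out as follows. This is just a worked sketch of the line above, with the dataframe rebuilt from the question; the column name elapsed is made up for the example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["18 Jun 2022", "18 Jun 2022", "15 Jun 2022"],
    "Time": ["10:22", "10:03", "21:34"],
})
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# Seconds elapsed since the earliest timestamp in the column (float result)
df["elapsed"] = (df["datetime"] - df["datetime"].min()).dt.total_seconds()

print(df["elapsed"].tolist())
# [218880.0, 217740.0, 0.0]
```

Because this is a difference between two timestamps from the same column, the result is the same whichever time zone the values are assumed to be in, as long as no daylight saving change falls between them.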