Is there a better way to increment a timestamp column in a pandas dataframe?

Time:04-23

I'm working with a large pandas dataframe and want to add a timestamp column which correlates to the value of another column. For example, the current dataframe looks like this:

Server      Hour
server1     0
server2     0
server1000  0
server1     1
server2     1

and so on, with the hour column ranging from 0 to 167, since the values correspond to the hourly timestamps of the following week.

I have the following code which establishes the weekly timestamps:

from datetime import datetime, timedelta

today = datetime.today()
start = (today - timedelta(days=today.weekday())).replace(hour=0, minute=0, second=0, microsecond=0)
end = (start + timedelta(days=6)).replace(hour=0, minute=0, second=0, microsecond=0)
print("end: " + str(end))
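For reference, the date arithmetic above can be isolated into a small function and checked against the question's date. This is only a sketch: the name `week_bounds` is hypothetical, and it assumes weeks start on Monday, as `datetime.weekday()` does.

```python
from datetime import date, datetime, timedelta


def week_bounds(today: date) -> tuple[datetime, datetime]:
    """Hypothetical helper: midnight on the Monday of today's week, and midnight six days later (Sunday)."""
    # weekday() is 0 for Monday ... 6 for Sunday
    monday = today - timedelta(days=today.weekday())
    start = datetime(monday.year, monday.month, monday.day)
    end = start + timedelta(days=6)
    return start, end


# 2022-04-23 is the Saturday the question was posted
start, end = week_bounds(date(2022, 4, 23))
print(start, end)  # 2022-04-18 00:00:00 2022-04-24 00:00:00
```

Building the `datetime` from the date's year, month, and day gives midnight directly, so the `.replace(hour=0, ...)` calls are not needed.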

From there, I try to create the new "time" column arithmetically:

end = end.timestamp()  # seconds since the epoch; a naive datetime is interpreted as local time
total_df['time'] = end
total_df['time'] = total_df['time'].astype(float)  # to convert to a float so I can add seconds to it

total_df['time'] = total_df['time'] + 3600 * total_df['hour']  # offset each row from the Sunday timestamp by its hour value
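Adding a number of hours per row can also be done without going through float epoch seconds: `pd.to_timedelta` turns the integer column into durations that add directly to a timestamp. A minimal sketch, assuming a small stand-in for `total_df` with `server`/`hour` columns and the Sunday date from the expected output below:

```python
import pandas as pd

# Small stand-in for total_df (assumed layout: one row per server per hour)
total_df = pd.DataFrame({
    "server": ["server1", "server2", "server1", "server2"],
    "hour": [0, 0, 1, 1],
})

# Midnight on the Sunday the hours are counted from (assumed, taken from the expected output)
end = pd.Timestamp("2022-04-24")

# Convert the integer hours to Timedelta values and add them to the base timestamp
total_df["time"] = end + pd.to_timedelta(total_df["hour"], unit="h")
print(total_df)
```

The resulting column is time-zone naive; attaching a zone is a separate `.dt.tz_localize(...)` step.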

Then I convert the Unix timestamps in the time column to datetimes:

import pandas as pd

total_df['time'] = pd.to_datetime(total_df['time'], unit='s', utc=True)

Unfortunately, this method doesn't use my current timezone and standardizes to UTC, so the finalized hourly timestamps are 4 hours ahead of where they should be. I can account for this by subtracting 4 hours before conversion, but I feel like there must be a cleaner way to do this using datetime. My solution seems like such a roundabout way to say "add however many hours are in the hour column."

My expected output should look like this:

Server      Hour  Time
server1     0     2022-04-24 00:00:00-04:00
server2     0     2022-04-24 00:00:00-04:00
serverx     0     2022-04-24 00:00:00-04:00
server1000  0     2022-04-24 00:00:00-04:00
server1     1     2022-04-24 01:00:00-04:00
server2     1     2022-04-24 01:00:00-04:00
serverx     1     2022-04-24 01:00:00-04:00
server1000  1     2022-04-24 01:00:00-04:00
x           x     x
server1000  167   2022-04-30 23:00:00-04:00

where "serverx" stands for every server number between 1 and 1000, and the "x" row stands for all of the remaining hour values up to 167.

Alternatively, is there an easy way to convert between time zones? My current output column looks like it should, except it's in UTC, and I'd like it in EST.
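If the column already holds UTC-aware timestamps from `pd.to_datetime(..., unit='s', utc=True)`, `Series.dt.tz_convert` changes the displayed time zone without changing the instant. The sketch below assumes the IANA zone name `America/New_York`, which observes daylight saving time (so April dates show `-04:00`, and a hard-coded 4-hour shift would be wrong in winter), and uses sample epoch values for 2022-04-24 04:00 and 05:00 UTC:

```python
import pandas as pd

# Sample Unix timestamps (seconds): 2022-04-24 04:00 and 05:00 UTC
seconds = pd.Series([1650772800, 1650776400])

utc_times = pd.to_datetime(seconds, unit="s", utc=True)    # tz-aware, UTC
local_times = utc_times.dt.tz_convert("America/New_York")  # same instants, Eastern wall-clock time
print(local_times)
```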

Answer:

Do I understand correctly that you start out with a dataframe that has an hour column, for example:

df = pd.DataFrame({'hour': range(5)})
   hour
0     0
1     1
2     2
3     3
4     4

In this case you could try the following:

from datetime import date, datetime, timedelta

import pandas as pd

start = date.today()
df['time'] = (
    datetime(start.year, start.month, start.day)
    + timedelta(days=6 - start.weekday())    # move forward to this week's Sunday
    + pd.to_timedelta(df['hour'], unit='h')  # add the per-row hour offsets
).dt.tz_localize('EST')

Result:

   hour                      time
0     0 2022-04-24 00:00:00-05:00
1     1 2022-04-24 01:00:00-05:00
2     2 2022-04-24 02:00:00-05:00
3     3 2022-04-24 03:00:00-05:00
4     4 2022-04-24 04:00:00-05:00
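`'EST'` is a fixed UTC-05:00 zone, which is why this result shows `-05:00`, while the question's expected output uses `-04:00` (Eastern daylight time in April). Localizing to a DST-aware zone name such as `America/New_York` instead gives `-04:00` for these dates. A sketch of the same approach, with the Sunday hard-coded (an assumption, so the output does not depend on today's date):

```python
import pandas as pd

df = pd.DataFrame({"hour": range(3)})

# Midnight on the Sunday used in the question (hard-coded for reproducibility)
naive = pd.Timestamp("2022-04-24") + pd.to_timedelta(df["hour"], unit="h")

fixed = naive.dt.tz_localize("EST")                 # fixed UTC-05:00, no daylight saving time
eastern = naive.dt.tz_localize("America/New_York")  # DST-aware: UTC-04:00 in April

print(pd.DataFrame({"hour": df["hour"], "EST": fixed, "America/New_York": eastern}))
```

For a week that spans a DST change, some naive wall-clock times are ambiguous or do not exist, and `tz_localize` may need its `ambiguous` or `nonexistent` arguments.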

Or use an explicit timezone offset:

from datetime import date, datetime, timedelta, timezone

import pandas as pd

start = date.today()
df['time'] = (
    datetime(
        start.year, start.month, start.day, tzinfo=timezone(timedelta(hours=-5))
    )
    + timedelta(days=6 - start.weekday())    # move forward to this week's Sunday
    + pd.to_timedelta(df['hour'], unit='h')  # add the per-row hour offsets
)