I'm working with a large pandas dataframe and want to add a timestamp column which correlates to the value of another column. For example, the current dataframe looks like this:
Server | Hour |
---|---|
server1 | 0 |
server2 | 0 |
server1000 | 0 |
server1 | 1 |
server2 | 1 |
and so on, with the Hour column ranging from 0 to 167, corresponding to the hourly timestamps of the following week.
I have the following code which establishes the weekly timestamps:
today = datetime.today()
start = (today - timedelta(days=today.weekday())).replace(hour=0, minute=0, second=0, microsecond=0)
end = (start + timedelta(days=6)).replace(hour=0, minute=0, second=0, microsecond=0)
print("end: " + str(end))
From there, I try to create the new "time" column arithmetically:
end=end.timestamp()
total_df['time']=end
total_df['time'] = total_df['time'].astype(float) #to convert to a float so I can multiply it with the time column
total_df['time']=total_df['time'] + 3600*total_df['hour'] #standardize timestamp to Sunday since the initial "end" was monday
Then I convert the time column back to a string and convert the unix timestamp to datetime
total_df['time'] = total_df['time'].astype(str)
total_df['time']=pd.to_datetime(total_df['time'],unit='s', utc=True)
Unfortunately, this method doesn't use my current timezone and standardizes to UTC, so the finalized hourly timestamps are 4 hours ahead of where they should be. I can account for this by subtracting 4 hours before conversion, but I feel like there must be a cleaner way to do this using datetime. My solution seems like such a roundabout way to say "add however many hours are in the hour column."
My expected output should look like this:
Server | Hour | Time |
---|---|---|
server1 | 0 | 2022-04-24 00:00:00-04:00 |
server2 | 0 | 2022-04-24 00:00:00-04:00 |
serverx | 0 | 2022-04-24 00:00:00-04:00 |
server1000 | 0 | 2022-04-24 00:00:00-04:00 |
server1 | 1 | 2022-04-24 01:00:00-04:00 |
server2 | 1 | 2022-04-24 01:00:00-04:00 |
serverx | 1 | 2022-04-24 01:00:00-04:00 |
server1000 | 1 | 2022-04-24 01:00:00-04:00 |
x | x | x |
server1000 | 167 | 2022-04-30 23:00:00-04:00 |
with the "x" and "serverx" covering all of the server and hour values between 1 and 1000 and 1 and 167, respectively.
Alternatively, is there an easy way to convert between time zones? My current output column looks like it should, except it's in UTC, and I'd like it in EST.
CodePudding user response:
Do I understand correctly that you start out with a dataframe that has an hour column, for example:
import pandas as pd
df = pd.DataFrame({'hour': range(5)})
hour
0 0
1 1
2 2
3 3
4 4
In this case you could try the following:
from datetime import date, datetime, timedelta
start = date.today()
df['time'] = (
    datetime(start.year, start.month, start.day)
    + timedelta(days=6 - start.weekday())
    + df['hour'].astype('timedelta64[h]')
).dt.tz_localize('EST')
Result:
hour time
0 0 2022-04-24 00:00:00-05:00
1 1 2022-04-24 01:00:00-05:00
2 2 2022-04-24 02:00:00-05:00
3 3 2022-04-24 03:00:00-05:00
4 4 2022-04-24 04:00:00-05:00
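The result above carries a fixed -05:00 offset, because 'EST' does not observe daylight saving time, while the expected output in the question shows -04:00. A minimal sketch of one way to get the daylight-saving offset, assuming the servers are on US Eastern time: localize with the IANA zone name 'America/New_York' instead. The fixed reference date and the sample frame are assumptions made so the output is reproducible, and `pd.to_timedelta` stands in for the `astype('timedelta64[h]')` cast, which newer pandas releases may reject for the hour unit.

```python
import pandas as pd

# Hypothetical sample frame; the real one has one row per server per hour.
df = pd.DataFrame({"hour": range(5)})

# Assumed fixed start of the week (the Sunday from the question's expected
# output), used here instead of date.today() so the result is reproducible.
week_start = pd.Timestamp("2022-04-24")

# Convert the integer hours to timedeltas, add them to the start of the week,
# then attach the local zone. An IANA zone name picks the offset that applies
# on each date, including daylight saving time.
df["time"] = (
    week_start + pd.to_timedelta(df["hour"], unit="h")
).dt.tz_localize("America/New_York")

print(df)
```

For dates in late April this should yield timestamps with a -04:00 offset, matching the format in the question.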
Or use an explicit timezone offset:
from datetime import date, datetime, timedelta, timezone
start = date.today()
df['time'] = (
    datetime(
        start.year, start.month, start.day, tzinfo=timezone(timedelta(hours=-5))
    )
    + timedelta(days=6 - start.weekday())
    + df['hour'].astype('timedelta64[h]')
)
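As for the second part of the question, converting a column that is already in UTC: a tz-aware column can usually be shifted to another zone with `.dt.tz_convert`. The sketch below assumes the column holds Unix timestamps in seconds; the `epoch` column name and its sample values are hypothetical.

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds, standing in for the float column
# built in the question. 1650772800 is 2022-04-24 04:00:00 UTC.
df = pd.DataFrame({"epoch": [1650772800, 1650776400]})

# Parse the values as UTC first, then convert the tz-aware result to local
# time. tz_convert changes only the displayed offset, not the instant itself.
df["time"] = pd.to_datetime(df["epoch"], unit="s", utc=True).dt.tz_convert(
    "America/New_York"
)

print(df)
```

This avoids subtracting four hours by hand, and the offset follows daylight saving changes for dates where it differs.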