I found a weird situation in my randomly generated timestamps. I have an application where I generate artificial log data and I would like to be able to define the time range. Therefore I wrote a function like this:
# imports
from datetime import datetime
import time
from random import choice
timestamps = []
timerange_in_days = 14 # how many days back from today should my timestamps cover?
entries = 10000 # how many timestamps?
for _ in range(entries):
    last_midnight = (int(time.time() // 86400)) * 86400 # find date border
    days = range(1, timerange_in_days + 1) # set the range
    timestamp = last_midnight - (choice(days) * choice(range(1, 25)) * 3600) # create the timestamp
    timestamp = datetime.fromtimestamp(timestamp).isoformat(timespec='milliseconds') # format it
    timestamps.append(timestamp)
I then wrote the timestamps to a file and plotted them in R, as I couldn't get them visualized quickly in Python. I plotted one histogram by day and one by hour. The little bar for October 8 comes from the timezone not being adjusted, meaning the data runs until 2 am of the next day.
with open(r'/path/to/file/dates.txt', 'w') as myfile:
    for item in timestamps:
        myfile.write("%s\n" % item)
# in R
path <- "path/to/file"
dates <- data.table::fread(file.path(path, "dates.txt")) # recognizes as POSIXct automatically
hist(dates$V1, "days")
hist(dates$V1, "hours")
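For a rough look at the distribution without leaving Python, the per-day counts can also be tallied directly. This is only a minimal sketch, assuming the timestamps are ISO strings like the ones produced above; the three sample values are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample; in practice this would be the `timestamps` list built above.
sample = [
    "2022-10-07T14:00:00.000",
    "2022-10-07T02:00:00.000",
    "2022-10-05T23:00:00.000",
]

# Count entries per calendar day.
per_day = Counter(datetime.fromisoformat(ts).date() for ts in sample)

# Print a crude text histogram, one row per day.
for day in sorted(per_day):
    print(day, "#" * per_day[day])
```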
But my question is: why are the timestamps more frequent around "now"? I want them to be spread evenly across the days.
CodePudding user response:
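Rethink your logic. choice(days) * choice(range(1, 25)) picks a random day and then multiplies it by a random number of hours between 1 and 24, so the offset is the product of the two, not a day plus an hour within that day. Products of two uniform picks are skewed toward small values: the average offset is 7.5 * 12.5 ≈ 94 hours, which is under 4 days of the 14-day range, so most timestamps land much closer to last_midnight.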
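The skew can be checked without any sampling by enumerating every (day, hour) pair the original expression can produce. The following is a minimal sketch, assuming the 14-day range from the question; the names offsets, mean_offset and per_day are illustrative only.

```python
from collections import Counter

timerange_in_days = 14  # same range as in the question

# Every offset (in hours) that choice(days) * choice(range(1, 25)) can produce.
# All 14 * 24 (day, hour) pairs are equally likely.
offsets = [d * h for d in range(1, timerange_in_days + 1) for h in range(1, 25)]

mean_offset = sum(offsets) / len(offsets)

# Bucket each offset by how many whole days before last_midnight it lands:
# offsets of 1..24 hours fall in bucket 0 (the most recent day), 25..48 in bucket 1, ...
per_day = Counter((off - 1) // 24 for off in offsets)

print(f"mean offset: {mean_offset} hours")
for day in sorted(per_day):
    print(day, "#" * per_day[day])
```

With evenly spread timestamps every bucket would hold the same number of pairs; here the most recent day holds far more than the oldest one.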
A much better approach is
from random import random # random() has to be imported as well
timestamp = last_midnight - (random() * timerange_in_days * 24 * 3600) # create the timestamp
Since random() gives a float between 0 and 1, you get the entire range between the earliest and latest period.
Also, you don't need to calculate last_midnight inside the loop; just do it once before entering the loop.
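Putting both suggestions together, the loop from the question could look roughly like this. It is a sketch rather than a drop-in replacement: span_seconds is a name introduced here for readability, and last_midnight is still midnight UTC, as in the original code, so the timezone shift mentioned in the question remains.

```python
import time
from datetime import datetime
from random import random

timerange_in_days = 14  # how many days back the timestamps should cover
entries = 10000         # how many timestamps to generate

# Computed once, before the loop (midnight UTC, as in the question).
last_midnight = int(time.time() // 86400) * 86400
span_seconds = timerange_in_days * 24 * 3600  # illustrative helper name

timestamps = []
for _ in range(entries):
    # random() is uniform on [0, 1), so the offset is uniform on [0, span_seconds).
    timestamp = last_midnight - random() * span_seconds
    timestamps.append(
        datetime.fromtimestamp(timestamp).isoformat(timespec='milliseconds')
    )
```

Because the offset is drawn uniformly over the whole span, each of the 14 days should receive roughly the same share of the 10000 entries, up to random noise.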