This is a mix of these two questions:
Pandas is a Timestamp within a Period (because it adds a time period in pandas)
Generate a random date between two other dates (but I need multiple dates: at least 1 million, which I specify with a variable LIMIT)
How can I generate a given number of random dates, each WITH a random time, within a given date range?
Performance is rather important for me, which is why I chose pandas. Any performance boost is appreciated, even if it means using another library.
My approach so far would be the following:
import pandas as pd
tstamp = pd.to_datetime(['2010-01-01', '2020-12-31'])
# ???
But I don't know how to pick random values between the two dates. I was thinking of using randint
to draw a random Unix epoch time and then converting it, but I expect that to slow things down A LOT.
CodePudding user response:
All I had to do was add str(fake.date_time_between(start_date='-10y', end_date='now'))
to my pandas DataFrame append logic. I'm not even sure the str()
there is necessary.
P.S. You initialize it like this:
from faker import Faker
# initialize Faker
fake = Faker()
CodePudding user response:
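You can try this; it is very fast:
import numpy as np

start = np.datetime64('2017-01-01')
end = np.datetime64('2018-01-01')
limit = 1000000
# Every day in [start, end); day-precision bounds give a one-day step,
# so the sampled values carry no time-of-day component
delta = np.arange(start, end)
# Draw `limit` random positions (with replacement) and index into the days
indices = np.random.choice(len(delta), limit)
delta[indices]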
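Because the bounds above have day precision, the sampled values are dates without a time of day. If you also need a random time, one possible variation is to draw random second offsets and add them to the start. This is only a sketch: the helper name random_timestamps and its seed parameter are made up for illustration, and second resolution is an assumption about what "random time" should mean here.

```python
import numpy as np

def random_timestamps(start, end, n, seed=None):
    """Return n datetime64[s] values drawn uniformly from [start, end)."""
    # Hypothetical helper, not part of NumPy or pandas.
    start = np.datetime64(start, "s")
    end = np.datetime64(end, "s")
    # Length of the range in whole seconds
    span = int((end - start) / np.timedelta64(1, "s"))
    rng = np.random.default_rng(seed)
    # One random second offset per requested timestamp (with replacement)
    offsets = rng.integers(0, span, size=n)
    return start + offsets.astype("timedelta64[s]")

LIMIT = 1_000_000
stamps = random_timestamps("2010-01-01", "2020-12-31", LIMIT, seed=0)
```

This avoids materializing every possible value in the range, so memory use depends on LIMIT rather than on the length of the period. The result is a plain NumPy array, which can be passed to pd.Series or pd.DatetimeIndex if you want to continue in pandas.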