I would like to create a random dataset consisting of 10 million rows. Unfortunately, I could not find a way to create a date column within a specific range (for example, from 01.01.2021 to 31.12.2021).
I tried Oracle SQL but could not find a way to do it there. I can do it in Excel, but Excel cannot handle 10 million rows of data. I therefore thought Python would be the best way to do this, but I could not figure it out.
CodePudding user response:
Use pandas.date_range combined with numpy.random.choice:
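import numpy as np
import pandas as pd

# Draw 10 million dates, with replacement, from every day of 2021
df = pd.DataFrame(
    {
        'date': np.random.choice(
            pd.date_range('2021-01-01', '2021-12-31', freq='D'), size=10_000_000
        )
    }
)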
Example output:
date
0 2021-04-05
1 2021-02-01
2 2021-09-22
3 2021-10-17
4 2021-04-28
... ...
9999995 2021-07-24
9999996 2021-03-15
9999997 2021-07-28
9999998 2021-11-01
9999999 2021-03-20
[10000000 rows x 1 columns]
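If reproducible results are needed, the same idea can be sketched with a seeded generator that draws integer day offsets from the start date. This is only an illustration under assumptions: the helper name random_dates and its parameters are made up for this example and are not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def random_dates(start, end, n_rows, seed=None):
    """Return a DataFrame with one 'date' column of random days in [start, end]."""
    rng = np.random.default_rng(seed)  # seeded generator for repeatable draws
    start = pd.Timestamp(start)
    n_days = (pd.Timestamp(end) - start).days + 1  # +1 so the end date is included
    offsets = rng.integers(0, n_days, size=n_rows)  # day offsets in [0, n_days)
    return pd.DataFrame({"date": start + pd.to_timedelta(offsets, unit="D")})


# Small sample here; pass n_rows=10_000_000 for the full dataset
df = random_dates("2021-01-01", "2021-12-31", n_rows=1_000, seed=0)
```

Calling the function again with the same seed should give the same dates, which can help when the dataset has to be regenerated later.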
CodePudding user response:
Python has no built-in random function, but the random module is part of the standard library, so you only need to import it.
To get 10,000,000 rows of data, a loop like the one below will probably work.
# Import the random module from the standard library
import random
# Loop 10 million times
for i in range(10_000_000):
    # Print a random integer between 0 and 10 (both inclusive) on each new row
    print(random.randint(0, 10))
It will take a while, but it will work, if this is what you are after.
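Since the question asks for dates rather than integers, the same standard-library approach could be adapted roughly as follows. This is a minimal sketch, and the helper name random_date is hypothetical, chosen only for this example.

```python
import random
from datetime import date, timedelta


def random_date(start, end):
    """Return a random date between start and end, both inclusive."""
    n_days = (end - start).days
    # randint includes both endpoints, so start and end can both be drawn
    return start + timedelta(days=random.randint(0, n_days))


start, end = date(2021, 1, 1), date(2021, 12, 31)
# Five dates as a demonstration; use range(10_000_000) for the full dataset
sample = [random_date(start, end) for _ in range(5)]
```

For 10 million rows, writing each value to a file instead of printing it would keep the console from being flooded.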