Is there a way to create a random dataset of 10 million rows in Python?

Time:11-18

I would like to create a random dataset consisting of 10 million rows. Unfortunately, I could not find a way to create a date column within a specific range (for example, from 01.01.2021 to 31.12.2021).

I tried with Oracle SQL, but could not find a way to do it. There is a way to do it in Excel, but Excel cannot handle 10 million rows of data. Therefore, I thought Python might be the best way to do this, but I could not figure it out.

CodePudding user response:

Use pandas.date_range combined with numpy.random.choice:

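import numpy as np
import pandas as pd

# Sample 10 million dates (with replacement) from every day of 2021
df = pd.DataFrame(
    {
        'date': np.random.choice(
            pd.date_range('2021-01-01', '2021-12-31', freq='D'), size=10_000_000
        )
    }
)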

Example:

              date
0       2021-04-05
1       2021-02-01
2       2021-09-22
3       2021-10-17
4       2021-04-28
...            ...
9999995 2021-07-24
9999996 2021-03-15
9999997 2021-07-28
9999998 2021-11-01
9999999 2021-03-20

[10000000 rows x 1 columns]
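
If reproducible output matters, the same idea can be sketched with a seeded generator. This is only an illustrative variant, not the method from the answer above: instead of choosing from a prebuilt date range, it draws random integer day offsets and adds them to the start date. The function name `random_dates` and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def random_dates(n_rows, start="2021-01-01", end="2021-12-31", seed=None):
    """Return a DataFrame with one 'date' column of n_rows random days in [start, end].

    Hypothetical helper for illustration; both endpoints are inclusive.
    """
    # A seeded generator gives the same dates on every run with the same seed
    rng = np.random.default_rng(seed)

    start_day = np.datetime64(start, "D")
    end_day = np.datetime64(end, "D")

    # Number of distinct days in the range, counting both endpoints
    n_days = int((end_day - start_day).astype(int)) + 1

    # Draw integer offsets in [0, n_days) and add them to the start date
    offsets = rng.integers(0, n_days, size=n_rows)
    dates = start_day + offsets.astype("timedelta64[D]")

    return pd.DataFrame({"date": dates.astype("datetime64[ns]")})
```

Calling `random_dates(10_000_000, seed=42)` should produce a frame of the same shape as the example above, and further random columns can be added to the dictionary in the same way.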

CodePudding user response:

The random module is part of Python's standard library, so nothing extra needs to be installed. It is not available by default in a script, though, so you have to import it before use.

To get 10,000,000 rows of data, a loop like the one below will probably work.

# Import the random module from the standard library
import random

# Loop 10 million times
for i in range(10_000_000):
    # Print a random integer from 0 to 10 (both inclusive) on each new row
    print(random.randint(0, 10))

It will take a while to run, but it should work if this is what you are after.
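
Since the question asks for dates rather than plain integers, here is a minimal sketch of the same loop approach adapted to random dates, using only the standard library and writing to a CSV file instead of printing. The function name `write_random_dates`, its parameters, and the output file name in the usage note are assumptions made for illustration.

```python
import csv
import random
from datetime import date, timedelta


def write_random_dates(path, n_rows, start=date(2021, 1, 1),
                       end=date(2021, 12, 31), seed=None):
    """Write n_rows random dates in [start, end] to a one-column CSV file.

    Hypothetical helper for illustration; both endpoints are inclusive.
    """
    # A private Random instance keeps the seed local to this function
    rng = random.Random(seed)

    # Largest offset (in days) that still lands on or before the end date
    max_offset = (end - start).days

    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date"])  # header row
        for _ in range(n_rows):
            # randint includes both bounds, so 'end' itself can be drawn
            day = start + timedelta(days=rng.randint(0, max_offset))
            writer.writerow([day.isoformat()])
```

For example, `write_random_dates("random_dates.csv", 10_000_000)` would write one ISO-formatted date per row. A pure Python loop of this size is likely to be noticeably slower than the vectorized pandas approach.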
