I'm trying to create a fake column of dates in a Pandas dataframe with the format year.month (e.g. 2022.01 for January 2022). The dataframe has ~200,000 rows, and I would like to randomly assign each row a date between 2010.01 and 2020.12. How can I do this using Pandas? Ideally the dtype of the new column would be float (I am trying to recreate a training example I found, and this is how its dates are formatted).
CodePudding user response:
Combine pandas.date_range and numpy.random.choice:
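import numpy as np
import pandas as pd

# All months from 2010-01 through 2020-12, encoded as year.month floats.
# freq='MS' (month start) keeps December 2020 in the range; with freq='M'
# (month end) the last generated date would be 2020-11-30.
dates = (pd.date_range('2010-01', '2020-12', freq='MS')
           .strftime('%Y.%m').astype(float)
        )
N = 1000  # number of rows to generate
# Draw N dates uniformly at random, with replacement
df = pd.DataFrame({'date': np.random.choice(dates, size=N)})
print(df)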
NB. Using floats is a tricky choice, as you cannot control the trailing zeros: a float does not store them, so October 2010 could appear as 2010.1 when printed or exported.
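To illustrate the trailing-zero caveat, here is a minimal sketch. The sample values and the names `raw` and `formatted` are made up for the illustration. It assumes that fixed two-decimal string formatting is acceptable whenever the year.month look has to be restored for display or export.

```python
import pandas as pd

# Three hand-picked sample values; October 2010 is the problematic one.
df = pd.DataFrame({'date': [2010.10, 2010.01, 2015.03]})

# Converting each float to a string directly drops the trailing zero.
raw = [str(float(x)) for x in df['date']]

# Fixed two-decimal formatting restores the year.month appearance.
formatted = df['date'].map('{:.2f}'.format)

print(raw)
print(formatted.tolist())
```

The column itself stays float; only the string representation changes, so this is a display-time fix rather than a change to the stored data.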
Example output:
date
0 2015.03
1 2014.01
2 2014.06
3 2011.10
4 2010.11
.. ...
995 2018.07
996 2019.01
997 2015.05
998 2017.09
999 2016.03
[1000 rows x 1 columns]
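As a possible alternative (a sketch, not part of the approach above), the floats can also be built directly from random integers without going through strings. The seed value and the variable names `rng`, `years` and `months` are assumptions chosen for the illustration; N is set to the row count mentioned in the question.

```python
import numpy as np
import pandas as pd

N = 200_000  # row count from the question
rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

# integers() excludes the upper bound: years 2010..2020, months 1..12
years = rng.integers(2010, 2021, size=N)
months = rng.integers(1, 13, size=N)

# Build the integer YYYYMM first, then divide once, so each value is the
# closest float to the intended year.month number.
df = pd.DataFrame({'date': (years * 100 + months) / 100})
print(df.head())
```

Drawing the year and the month independently gives every one of the 132 months the same probability, which matches sampling uniformly from the date range.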