Home > Software design >  Add random number of microseconds to each index in a dataframe's datetime index
Add random number of microseconds to each index in a dataframe's datetime index

Time:10-05

I have a dataframe that uses a DateTime index. Due to the nature of the data, there are duplicate rows in the raw data that produce duplicate indices. I would like to add a random number of microseconds to each index to make them unique.

So, my index is currently:

2021-06-01 08:58:47 00:00

and I want it to be:

2021-06-01 08:58:47.011356 00:00

I have a column named TimeUTC which I used to create the index. I tried adding a random number of microseconds to it:

    df.index = df.index   datetime.timedelta(microseconds= random.randrange(0,1000000,1))

but that added the same number of microseconds to all rows. I had the same result when trying to use apply lambda.

  1. How can I add a random number of microseconds to each row's TimeUTC?
  2. Can I do this directly to the index instead of updating TimeUTC and then reindexing?

Thank you.

CodePudding user response:

you need to use to_timedelta with unit='us' for microseconds, and an array of random number of the same length than the dataframe

# dummy data
df = pd.DataFrame({
    'a':range(10), 
    'TimeUTC':pd.to_datetime(['2021-10-01 00:00:00']*3 ['2021-10-01 00:00:01']*5
                              ['2021-10-01 00:00:02']*2, utc=True)}
)

# set seed for reproducibility
np.random.seed(10)

# set the index from TimeUTC and add microseconds with to_timedelta
df.index = df['TimeUTC'] pd.to_timedelta(np.random.randint(1,1000000,len(df)),unit='us')

print(df)
                                  a                   TimeUTC
2021-10-01 00:00:00.345354 00:00  0 2021-10-01 00:00:00 00:00
2021-10-01 00:00:00.760958 00:00  1 2021-10-01 00:00:00 00:00
2021-10-01 00:00:00.881168 00:00  2 2021-10-01 00:00:00 00:00
2021-10-01 00:00:01.443713 00:00  3 2021-10-01 00:00:01 00:00
2021-10-01 00:00:01.617842 00:00  4 2021-10-01 00:00:01 00:00
2021-10-01 00:00:01.105596 00:00  5 2021-10-01 00:00:01 00:00 
2021-10-01 00:00:01.533661 00:00  6 2021-10-01 00:00:01 00:00
2021-10-01 00:00:01.927706 00:00  7 2021-10-01 00:00:01 00:00
2021-10-01 00:00:02.299742 00:00  8 2021-10-01 00:00:02 00:00
2021-10-01 00:00:02.804337 00:00  9 2021-10-01 00:00:02 00:00

Note that adding random number may not conserve the time order of your dataframe, see in this case for the value a=5, the datetime index is lower than the previous two rows

  • Related