Random timestamps and larger 'weight' to a specific range

Time:11-05

I have a pandas function that takes a day column and assigns/concatenates a random time (hour:minute:seconds) to each day.

pd.to_datetime(d['day']) + pd.to_timedelta(np.random.randint(0, 24*3600, size=len(d)), unit='s')

Example output: 1/1/2021 19:00:22, 1/1/2021 3:21:34

This works well and generates random datetimes on a given day. What I want, however, is to have more random timestamps between two times; in my case between 9:00AM and 7:00PM. So anything outside that time range will ultimately have fewer randomized values.

Answer:

Use np.random.choice and give each second of the day its own probability.

import numpy as np
import pandas as pd

# per-second probability inside and outside the range 07:00-19:00
p_in = 0.8 / ((19-7)*3600)
p_out = 0.2 / (24*3600 - (19-7)*3600)

# array of probabilities
p = np.full(24*3600, p_out)
p[7*3600:19*3600] = p_in

# seconds in a day
t = np.arange(0, 24*3600)
>>> df['day'] + pd.to_timedelta(np.random.choice(t, len(df), p=p), unit='s')
0     2021-11-03 18:18:30
1     2021-11-03 22:25:47
2     2021-11-03 15:04:09
3     2021-11-04 01:08:31
4     2021-11-03 17:51:53
              ...        
117   2021-11-04 15:05:33
118   2021-11-04 07:12:58
119   2021-11-04 09:09:38
120   2021-11-05 00:17:58
121   2021-11-04 23:53:20
Length: 122, dtype: datetime64[ns]

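You can adjust the probabilities (0.8 / 0.2) and the range according to your needs. Note that the code above uses the range 07:00 to 19:00; for the 9:00 AM to 7:00 PM window from the question, replace each 7 with 9. The probabilities must still sum to 1: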

>>> np.sum(p)
0.9999999999999999

>>> np.isclose(np.sum(p), 1)
True 
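
To make the range and the weights easier to change, the steps above could be wrapped in a small helper. This is only a sketch: the function name `second_probabilities` and its parameters `start_hour`, `end_hour`, and `weight_in` are made up for illustration and are not part of NumPy or pandas. It also uses `np.random.default_rng` with a seed so the example is repeatable, which the original answer does not do.

```python
import numpy as np
import pandas as pd

SECONDS_PER_DAY = 24 * 3600

def second_probabilities(start_hour=7, end_hour=19, weight_in=0.8):
    """Hypothetical helper: per-second probabilities for one day.

    `weight_in` of the total probability is spread evenly over the seconds
    in [start_hour, end_hour); the rest is spread over the other seconds.
    """
    if not 0 <= start_hour < end_hour <= 24:
        raise ValueError("need 0 <= start_hour < end_hour <= 24")
    if end_hour - start_hour == 24:
        raise ValueError("the range must not cover the whole day")
    if not 0 <= weight_in <= 1:
        raise ValueError("weight_in must be between 0 and 1")

    n_in = (end_hour - start_hour) * 3600
    n_out = SECONDS_PER_DAY - n_in
    p = np.full(SECONDS_PER_DAY, (1 - weight_in) / n_out)
    p[start_hour * 3600:end_hour * 3600] = weight_in / n_in
    return p

# Example: the 9:00-19:00 window from the question, 80% of timestamps inside
p = second_probabilities(start_hour=9, end_hour=19, weight_in=0.8)

rng = np.random.default_rng(0)  # seeded only so the example is repeatable
df = pd.DataFrame({'day': pd.date_range("2021-01-01", "2021-01-31", freq='D')})
seconds = rng.choice(SECONDS_PER_DAY, size=len(df), p=p)
df['timestamp'] = df['day'] + pd.to_timedelta(seconds, unit='s')
```

Passing an integer as the first argument to `rng.choice` samples from `0 .. SECONDS_PER_DAY - 1`, so the separate `t = np.arange(...)` array is not needed here.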

Demo

df = pd.DataFrame({'day': pd.date_range("2021-01-01", "2021-01-31", freq='D')})

df['day'] + pd.to_timedelta(np.random.choice(t, len(df), p=p), unit='s')

# Output:
0    2021-01-01 08:03:53
1    2021-01-02 02:48:28  # outside
2    2021-01-03 06:37:24  # outside
3    2021-01-04 18:15:01
4    2021-01-05 10:36:53
5    2021-01-06 06:41:23  # outside
6    2021-01-07 10:33:09
7    2021-01-08 13:23:46
8    2021-01-09 08:47:57
9    2021-01-10 07:37:35
10   2021-01-11 04:57:13  # outside
11   2021-01-12 17:01:39
12   2021-01-13 13:58:16
13   2021-01-14 08:57:05
14   2021-01-15 08:04:10
15   2021-01-16 20:07:45  # outside
16   2021-01-17 02:42:26  # outside
17   2021-01-18 17:10:00
18   2021-01-19 08:22:52
19   2021-01-20 18:07:02
20   2021-01-21 14:40:18
21   2021-01-22 08:39:55
22   2021-01-23 18:54:33
23   2021-01-24 06:39:38  # outside
24   2021-01-25 14:41:48
25   2021-01-26 07:54:33
26   2021-01-27 05:34:36  # outside
27   2021-01-28 18:55:51
28   2021-01-29 09:37:26
29   2021-01-30 22:07:28  # outside
30   2021-01-31 10:39:51
dtype: datetime64[ns]

Here, 9 values are outside the 07:00-19:00 range and 22 are inside, so the observed distribution is 0.290 and 0.710 (= 1.0). For a sample of only 31 rows, that is reasonably close to the initial probabilities of 0.2 / 0.8.
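
With only 31 rows the observed split is noisy, so a larger sample gives a better sense of whether the weights behave as intended. The sketch below assumes the same 07:00-19:00 range and 0.8 / 0.2 split as above; the sample size and the seed are arbitrary choices made for this illustration.

```python
import numpy as np

# Same per-second probabilities as in the answer (range 07:00-19:00)
p_in = 0.8 / ((19 - 7) * 3600)
p_out = 0.2 / (24 * 3600 - (19 - 7) * 3600)
p = np.full(24 * 3600, p_out)
p[7 * 3600:19 * 3600] = p_in

# Draw many seconds-of-day and measure the share that lands inside the range
rng = np.random.default_rng(42)  # arbitrary seed, only for repeatability
seconds = rng.choice(24 * 3600, size=100_000, p=p)
inside = (seconds >= 7 * 3600) & (seconds < 19 * 3600)
frac_inside = inside.mean()

print(f"inside: {frac_inside:.3f}, outside: {1 - frac_inside:.3f}")
```

The printed fractions should settle near 0.8 and 0.2 as the sample size grows.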
