Home > OS >  How would I generate a random data series in Python with events?
How would I generate a random data series in Python with events?

Time:06-05

I'm trying to generate a random data series (or a time series) for anomaly detection, with events spanning a few consecutive data points. They could be values above/below a certain threshold, or anomaly types with different known probabilities.

e.g. in a case where 1 is normal and event types are within [2, 3, 4]: 11112221113333111111112211111

I looked through the np.random and random methods, but couldn't find any that generate these events. My current solution is picking random points, adding random durations to them to generate event start and end positions, labeling each event with a random event type, and joining back to the dataset, something like:

import numpy as np
num_events = np.random.randint(1, 10)
number_series = [1]*60
first_pos = 0 
event_starts = sorted([first_pos   i for i in np.random.randint(50, size = num_events)])
event_ends = [sum(i) for i in list(zip(event_starts, np.random.randint(8, size = num_events)))]
for c in list(zip(event_starts, event_ends)):
    rand_event_type  = np.random.choice(a = [2, 3, 4], p = [0.5, 0.3, 0.2])
    number_series[c[0]:c[1]] = [rand_event_type]*len(number_series[c[0]:c[1]])
print(number_series)

[1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

But I'm wondering if there is a simpler way to just generate a series of numbers with events, based on a set of probabilities.

CodePudding user response:

It all depends on how you model your process (the underlying process you want to simulate). You can read more about some of the usual models on Wikipedia.

In the following, we use a very simple model (slightly different than yours): events each have a probability (like in your question) and a random duration that is independent of the event itself. 1 ("normal") is an event like any others (unlike your sample code). We could change that, but right now this is one of the simplest models you can think of.

def gen_events(n):
    events = np.random.choice(a=[1, 2, 3, 4], p=[0.6, 0.2, 0.12, 0.08], size=n)
    durations = np.random.randint(1, 8, size=n)
    return np.repeat(events, durations)

np.random.seed(0)  # repeatable example
number_series = gen_events(10)  # for example

>>> number_series
array([1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
       1, 2, 2, 1, 1, 1, 1, 1, 1, 3, 4, 4, 1, 1, 1, 1, 1])

Note, this is very fast:

%timeit gen_events(1_000_000)
# 44.9 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Another model (easier to parameterize, a bit more complex to implement) would be a Markov model. The simplest of them would be a Markov chain.

CodePudding user response:

A less verbose way would be to generate your list of events on the go.

Set, for example, a probability for an occurrence of an anomaly (say, 5%). Then,

events = []
for i in range(60):
  if random() <= 0.95:
    events.append(1)
  else:
    events.extend([choice(a = [2, 3, 4], p = [0.5, 0.3, 0.2])] * randint(8))
  • Related