Monte Carlo continuation of multi-column pandas time series

I have a time series of data points in a pandas DataFrame. The columns are assumed to be independent of one another. I want to set up a Monte Carlo process to calculate expected values for each column. My assumption is that the underlying data follows Brownian motion, so I need to fit a normal distribution to the relative differences between consecutive points in time and sample from it.

I transform my data like this:

diffs = (data.diff() / data.shift(1))

This is what I have at the moment:

data = diffs.describe()

This gives the following output:

                 A            B            C
count  4986.000000  4963.000000  1861.000000
mean      0.000285     0.000109     0.000421
std       0.015759     0.015426     0.014676
...

I process it like this to generate more samples:

import numpy as np
desired_samples = 1000
random = np.random.default_rng().normal(loc=[data.loc[["mean"]].to_numpy()], scale=[data.loc[["std"]].to_numpy()], size=[len(data.columns), desired_samples])

However this gives me an error:

ValueError: shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (441, 1000) and arg 1 with shape (1, 1, 441).

What I want is a matrix of random values whose columns have the same mean and std as the corresponding columns of the sample. That is, calling describe() on the result should give something like:

              A         B         C
count    1000.0    1000.0    1000.0
mean   0.000285  0.000109  0.000421
std    0.015759  0.015426  0.014676
...

What would be the correct way to generate those samples?

CodePudding user response：

You could use apply() to create a DataFrame of random normal values, drawing each column from a normal distribution with the mean and std of the corresponding source column.

Generate Test Data

import numpy as np
import pandas as pd

nv = 50  # number of test rows
d = {'A': np.random.normal(1, 1, nv), 'B': np.random.normal(2, 2, nv), 'C': np.random.normal(3, 3, nv)}
df = pd.DataFrame(d)
print(df)

           A         B         C
0   0.276252 -2.833479  5.746740
1   1.562030  1.497242  2.557416
2   0.883105 -0.861824  3.106192
3   0.352372  0.014653  4.006219
4   1.475524  3.151062 -1.392998
5   2.011649 -2.289844  4.371251
6   3.230964  3.578058  0.610422
7   0.366506  3.391327  0.812932
8   1.669673 -1.021665  4.262500
9   1.835547  4.292063  6.983015
10  1.768208  4.029970  3.971751
...
45  0.501706  0.926860  7.008008
46  1.759266 -0.215047  4.560403
47  1.899167  0.690204 -0.538415
48  1.460267  1.506934  1.306303
49  1.641662  1.066182  0.049233

df.describe()

               A          B          C
count  50.000000  50.000000  50.000000
mean    0.962083   1.522234   2.992492
std     1.073733   1.848754   2.838976

Generate Random Values with Approximately the Same (Calculated) Mean and Std

mat = df.apply(lambda x: np.random.normal(x.mean(),x.std(),100))
print(mat)
           A         B         C
0   0.234955  2.201961  1.910073
1   1.973203  3.528576  5.925673
2  -0.858201  2.234295  1.741338
3   2.245650  2.805498  0.135784
4   1.913691  2.134813  2.246989
..       ...       ...       ...
95  2.996207  2.248727  2.792658
96  0.663609  4.533541  1.518872
97  0.848259 -0.348086  2.271724
98  3.672370  1.706185 -0.862440
99  0.392051  0.832358 -0.354981

[100 rows x 3 columns]

mat.describe()
                A           B           C
count  100.000000  100.000000  100.000000
mean     0.877725    1.332039    2.673327
std      1.148153    1.749699    2.447532
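
The shape mismatch in the question is a broadcasting problem: data.loc[["mean"]].to_numpy() wrapped in a list has shape (1, 1, n_columns), while size asks for (n_columns, desired_samples), so the trailing axes do not line up. A vectorized alternative to apply() might look like the sketch below. It assumes the statistics come from describe() and that rows are samples and columns are series. The toy diffs frame, the seed, and the names stats, rng and sampled are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

# Toy stand-in for the question's `diffs` frame (hypothetical data).
diffs = pd.DataFrame(
    rng.normal([0.0, 0.1, 0.2], [1.0, 2.0, 3.0], size=(500, 3)),
    columns=["A", "B", "C"],
)

stats = diffs.describe()
desired_samples = 1000

# Single brackets give 1-D arrays of shape (n_columns,), which broadcast
# against the trailing axis of size=(desired_samples, n_columns).
samples = rng.normal(
    loc=stats.loc["mean"].to_numpy(),
    scale=stats.loc["std"].to_numpy(),
    size=(desired_samples, len(stats.columns)),
)

# Wrap in a DataFrame so that describe() is available on the result.
sampled = pd.DataFrame(samples, columns=stats.columns)
```

Calling sampled.describe() should then report a count of 1000 per column, with mean and std close to (but not exactly equal to) those in stats, since the values are random draws.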

If you want the matrix as a NumPy array:

mat.to_numpy()
array([[ 0.78881292,  3.09428714, -1.22757096],
       [ 0.13044099, -1.02564025,  2.6566989 ],
       [ 0.06090083,  1.50629474,  3.61487469],
       [ 0.71418932,  1.88441111,  5.84979454],
       [ 2.34287411,  2.58478867, -4.04433653],
       [ 1.41846256,  0.36414635,  8.47482082],
       [ 0.46765842,  1.37188986,  3.28011085],
       [ 0.87433273,  3.45735286,  1.13351138],
       [ 1.59029413,  4.0227165 ,  3.58282534],
       [ 2.23663894,  2.75007385, -0.36242541],
       [ 1.80967311,  1.29206572,  1.73277577],
       [ 1.20787923,  2.75529187,  4.64721489],
       [ 2.33466341,  6.43830387,  4.31354348],
       [ 0.87379125,  3.00658046,  4.94270155],
       etc ...
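
To go from sampled differences to an actual continuation of the series, the simulated relative changes can be compounded forward from the last observed value. The following is only a rough sketch under the question's own assumptions (independent columns, normally distributed relative differences that are independent over time). It compounds simple returns, which approximates rather than exactly reproduces geometric Brownian motion. The toy data frame, the horizon n_steps, the number of paths n_paths, and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

# Hypothetical history; in the question this would be `data`.
data = pd.DataFrame({"A": [100.0, 101.0, 99.5, 100.5],
                     "B": [50.0, 50.5, 50.2, 51.0]})

returns = data.diff() / data.shift(1)  # same transform as in the question
mu = returns.mean().to_numpy()         # NaN in the first row is skipped
sigma = returns.std().to_numpy()

n_steps, n_paths = 30, 1000  # illustrative horizon and number of simulations

# Shape (n_paths, n_steps, n_columns): mu and sigma broadcast over the last axis.
simulated = rng.normal(mu, sigma, size=(n_paths, n_steps, len(data.columns)))

# Compound the simulated returns forward from the last observed value.
last = data.iloc[-1].to_numpy()
paths = last * np.cumprod(1.0 + simulated, axis=1)

# Monte Carlo estimate of the expected value at the end of the horizon.
expected_final = pd.Series(paths[:, -1, :].mean(axis=0), index=data.columns)
```

Averaging over the first axis of paths gives one expected trajectory per column, and percentiles over the same axis give a rough uncertainty band.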