Monte Carlo continuation of multi-column pandas time series

Time:07-09

I have a bunch of data points in a time series stored in a pandas DataFrame. The columns are assumed to be independent of one another. I want to build a Monte Carlo process to calculate expected values for each column. My assumption is that the underlying data follows a Brownian motion pattern, so I need to draw from a normal distribution fitted to the differences between consecutive points in time.

I transform my data like this:

diffs = (data.diff() / data.shift(1))

This is what I have at the moment:

data = diffs.describe()

This gives the following output:

                 A            B            C
count  4986.000000  4963.000000  1861.000000
mean      0.000285     0.000109     0.000421
std       0.015759     0.015426     0.014676
...

I process it like this to generate more samples:

import numpy as np
desired_samples = 1000
random = np.random.default_rng().normal(loc=[data.loc[["mean"]].to_numpy()], scale=[data.loc[["std"]].to_numpy()], size=[len(data.columns), desired_samples])

However this gives me an error:

ValueError: shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (441, 1000) and arg 1 with shape (1, 1, 441).

What I want is simply a matrix of random values whose columns have the same mean and std as the corresponding columns of the sample. That is, if I call describe() on the result, I'd expect something like:

                A           B           C
count      1000.0      1000.0      1000.0
mean     0.000285    0.000109    0.000421
std      0.015759    0.015426    0.014676
...

What'd be the correct way to generate those samples?

CodePudding user response:

You can use DataFrame.apply() to build a data frame of normally distributed random values, drawing each column with the mean and standard deviation of the corresponding source column.

Generate Test Data

import numpy as np
import pandas as pd

nv = 50  # number of test rows
d = {'A':np.random.normal(1,1,nv),'B':np.random.normal(2,2,nv),'C':np.random.normal(3,3,nv)}
df = pd.DataFrame(d)
print(df)

           A         B         C
0   0.276252 -2.833479  5.746740
1   1.562030  1.497242  2.557416
2   0.883105 -0.861824  3.106192
3   0.352372  0.014653  4.006219
4   1.475524  3.151062 -1.392998
5   2.011649 -2.289844  4.371251
6   3.230964  3.578058  0.610422
7   0.366506  3.391327  0.812932
8   1.669673 -1.021665  4.262500
9   1.835547  4.292063  6.983015
10  1.768208  4.029970  3.971751
...
45  0.501706  0.926860  7.008008
46  1.759266 -0.215047  4.560403
47  1.899167  0.690204 -0.538415
48  1.460267  1.506934  1.306303
49  1.641662  1.066182  0.049233

df.describe()

               A          B          C
count  50.000000  50.000000  50.000000
mean    0.962083   1.522234   2.992492
std     1.073733   1.848754   2.838976

Generate Random Values with Approximately the Same (Calculated) Mean and STD

mat = df.apply(lambda x: np.random.normal(x.mean(),x.std(),100))
print(mat)
           A         B         C
0   0.234955  2.201961  1.910073
1   1.973203  3.528576  5.925673
2  -0.858201  2.234295  1.741338
3   2.245650  2.805498  0.135784
4   1.913691  2.134813  2.246989
..       ...       ...       ...
95  2.996207  2.248727  2.792658
96  0.663609  4.533541  1.518872
97  0.848259 -0.348086  2.271724
98  3.672370  1.706185 -0.862440
99  0.392051  0.832358 -0.354981

[100 rows x 3 columns]

mat.describe()
                A           B           C
count  100.000000  100.000000  100.000000
mean     0.877725    1.332039    2.673327
std      1.148153    1.749699    2.447532
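
The original call can also be made to work without apply(). The shape mismatch appears to come from two things: data.loc[["mean"]] (double brackets) returns a one-row DataFrame, which the extra list wrapper turns into shape (1, 1, n_columns), and size puts the columns on the first axis rather than the last. A minimal sketch of a broadcast-friendly version follows; the small diffs frame and the names stats and samples are hypothetical stand-ins, not part of the question's data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the question's `diffs` frame.
diffs = pd.DataFrame({
    "A": rng.normal(0.0003, 0.016, 500),
    "B": rng.normal(0.0001, 0.015, 500),
    "C": rng.normal(0.0004, 0.015, 500),
})

stats = diffs.describe()
desired_samples = 1000

# .loc["mean"] with single brackets returns a Series, so to_numpy() is 1-D
# with shape (n_columns,). That broadcasts against the trailing axis of
# size=(desired_samples, n_columns): one mean/std pair per column.
samples = pd.DataFrame(
    rng.normal(
        loc=stats.loc["mean"].to_numpy(),
        scale=stats.loc["std"].to_numpy(),
        size=(desired_samples, len(stats.columns)),
    ),
    columns=stats.columns,
)
print(samples.describe().loc[["count", "mean", "std"]])
```

As with the apply() version, the sample mean and std only approximate the targets; they get closer as desired_samples grows.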

If you want the matrix as a NumPy array:

mat.to_numpy()
array([[ 0.78881292,  3.09428714, -1.22757096],
       [ 0.13044099, -1.02564025,  2.6566989 ],
       [ 0.06090083,  1.50629474,  3.61487469],
       [ 0.71418932,  1.88441111,  5.84979454],
       [ 2.34287411,  2.58478867, -4.04433653],
       [ 1.41846256,  0.36414635,  8.47482082],
       [ 0.46765842,  1.37188986,  3.28011085],
       [ 0.87433273,  3.45735286,  1.13351138],
       [ 1.59029413,  4.0227165 ,  3.58282534],
       [ 2.23663894,  2.75007385, -0.36242541],
       [ 1.80967311,  1.29206572,  1.73277577],
       [ 1.20787923,  2.75529187,  4.64721489],
       [ 2.33466341,  6.43830387,  4.31354348],
       [ 0.87379125,  3.00658046,  4.94270155],
       etc ...
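
To go from sampled steps to a continuation of the series, one option is to compound the steps from the last observed value and average over many simulated paths. The sketch below assumes the steps are relative changes, as produced by data.diff() / data.shift(1) in the question, and that the last row of data has no missing values. The synthetic data frame and the names n_steps, n_paths, paths and expected are hypothetical and chosen for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical price-like data standing in for the question's `data`.
data = pd.DataFrame(
    100 * np.cumprod(1 + rng.normal(0.0003, 0.015, size=(500, 3)), axis=0),
    columns=["A", "B", "C"],
)
diffs = data.diff() / data.shift(1)  # relative change per step, as in the question

n_steps, n_paths = 250, 2000
mu = diffs.mean().to_numpy()     # per-column mean, NaN in the first row is skipped
sigma = diffs.std().to_numpy()   # per-column sample standard deviation

# Shape (n_paths, n_steps, n_columns): independent normal draws per step,
# with mu and sigma broadcast along the last (column) axis.
steps = rng.normal(mu, sigma, size=(n_paths, n_steps, len(data.columns)))

# Continue each path from the last observed value by compounding the steps.
# Assumes the last row contains no NaN.
last = data.iloc[-1].to_numpy()
paths = last * np.cumprod(1 + steps, axis=1)

# Monte Carlo estimate of the expected value at the final step, per column.
expected = pd.Series(paths[:, -1, :].mean(axis=0), index=data.columns)
print(expected)
```

Because the columns are treated as independent, each one is drawn separately; correlated columns would need a multivariate draw instead.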