How to apply a recursive digital filter in a pandas dataframe?

I have a dataframe like:

import pandas as pd

days1 = pd.date_range('2020-01-01 01:00:00', '2020-01-01 01:19:00', freq='60s')

DF = pd.DataFrame({'Time': days1,
                   'TimeSeries1': [10, 10, 10, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20],
                   'TimeSeries2': [11, 12, 13, 12, 11, 14, 15, 16, 21, 20, 20, 23, 15, 15, 15, 15, 15, 15, 15, 15]})

And I would like to get the following:

For each of the TimeSeries columns (TimeSeries1 and TimeSeries2), I would like to create a corresponding "_Filtered" column, defined as: TimeSeries1_Filtered[i] = (1-A)*TimeSeries1_Filtered[i-1] + A*TimeSeries1[i]

where A is a filter factor between 0 and 1, and the first filtered value equals the first raw value (TimeSeries1_Filtered[0] = TimeSeries1[0]).

Each column needs a different A factor, for example A1=0.5 for TimeSeries1 and A2=0.8 for TimeSeries2.
I have more than 100 "TimeSeriesN" columns, so it would be good to pass the A factors as a tuple or a list.
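As a baseline, here is a minimal plain-Python sketch of the recursion described above, assuming the first filtered value equals the first raw value (which is what the example table below shows). The helper name `recursive_filter` and the `factors` dict are illustrative, not from the question; keying the factors by column name avoids having to keep 100+ factors in the same order as the columns.

```python
import pandas as pd

def recursive_filter(values, A):
    """First-order recursive filter, assuming y[0] = x[0]:
    y[i] = (1 - A) * y[i-1] + A * x[i]."""
    out = []
    prev = None
    for x in values:
        prev = float(x) if prev is None else (1 - A) * prev + A * x
        out.append(prev)
    return out

days1 = pd.date_range('2020-01-01 01:00:00', '2020-01-01 01:19:00', freq='60s')
DF = pd.DataFrame({'Time': days1,
                   'TimeSeries1': [10, 10, 10] + [20] * 17,
                   'TimeSeries2': [11, 12, 13, 12, 11, 14, 15, 16, 21, 20, 20, 23] + [15] * 8})

# Hypothetical column -> A mapping; with 100+ columns it could be built from
# a list of factors, e.g. dict(zip(column_names, factor_list)) (both names hypothetical)
factors = {'TimeSeries1': 0.5, 'TimeSeries2': 0.8}
for col, A in factors.items():
    DF[f'{col}_Filtered'] = recursive_filter(DF[col], A)

print(DF.head(11))
```

The loop is O(n) per column but runs in pure Python, so it is mainly useful as a reference to check faster versions against.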

Example with A1=0.5

                      Time  TimeSeries1  TimeSeries1_Filtered
0  2020-01-01 01:00:00           10           10
1  2020-01-01 01:01:00           10           10
2  2020-01-01 01:02:00           10           10
3  2020-01-01 01:03:00           20           15
4  2020-01-01 01:04:00           20           17.5
5  2020-01-01 01:05:00           20           18.75
6  2020-01-01 01:06:00           20           19.375
7  2020-01-01 01:07:00           20           19.6875
8  2020-01-01 01:08:00           20           19.84375
9  2020-01-01 01:09:00           20           19.92188
10 2020-01-01 01:10:00           20           19.96094
11 ...                           ...          ...

thanks!

EDIT: correction on the filter notation and equation. Thanks @not_speshal for the heads-up.

Answer:

For the nth data point, and assuming filtered[0] = x[0], the recursive formula unrolls to:

filtered[n] = A*(x[n] + (1-A)*x[n-1] + (1-A)**2 * x[n-2] + ... + (1-A)**(n-1) * x[1]) + (1-A)**n * x[0]
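A quick numerical check that the unrolled expression agrees with the recursion, assuming filtered[0] = x[0]. `recursive` and `closed_form` are hypothetical helper names used only for this comparison; the data is the first ten values of TimeSeries2 with A = 0.2.

```python
import numpy as np

def recursive(x, A):
    # y[0] = x[0]; y[n] = (1 - A) * y[n-1] + A * x[n]
    y = np.empty(len(x))
    y[0] = x[0]
    for n in range(1, len(x)):
        y[n] = (1 - A) * y[n - 1] + A * x[n]
    return y

def closed_form(x, A, n):
    # A * (x[n] + (1-A)*x[n-1] + ... + (1-A)**(n-1)*x[1]) + (1-A)**n * x[0]
    weights = (1 - A) ** np.arange(n)                    # (1-A)**0 ... (1-A)**(n-1)
    recent = np.asarray(x[1:n + 1], dtype=float)[::-1]   # x[n], x[n-1], ..., x[1]
    return A * np.dot(weights, recent) + (1 - A) ** n * x[0]

x = [11, 12, 13, 12, 11, 14, 15, 16, 21, 20]
A = 0.2
y = recursive(x, A)
print(all(np.isclose(y[n], closed_form(x, A, n)) for n in range(len(x))))  # True
```

For n = 0 the sum is empty, so the expression reduces to x[0], matching the assumed initial condition.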

You can now write a custom function that evaluates this closed-form expression on every expanding window and apply it to each TimeSeries column of your DataFrame:

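import numpy as np

def ts_filter(srs, A):
    # Weight each point of the expanding window x by (1-A)**age; x[0] also gets an
    # extra (1-A)**len(x) term so that its total weight becomes (1-A)**n
    return srs.expanding().apply(lambda x: A*(x*((1-A)**np.arange(len(x))[::-1])).sum() + (1-A)**x.size*x.iat[0])

factors = {"TimeSeries1": 0.5, "TimeSeries2": 0.2}
filtered = DF.filter(like="TimeSeries").apply(lambda x: ts_filter(x, A=factors[x.name]))

output = DF.join(filtered, rsuffix="_filtered")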

Output:

>>> output
                  Time  TimeSeries1  ...  TimeSeries1_filtered  TimeSeries2_filtered
0  2020-01-01 01:00:00           10  ...             10.000000             11.000000
1  2020-01-01 01:01:00           10  ...             10.000000             11.200000
2  2020-01-01 01:02:00           10  ...             10.000000             11.560000
3  2020-01-01 01:03:00           20  ...             15.000000             11.648000
4  2020-01-01 01:04:00           20  ...             17.500000             11.518400
5  2020-01-01 01:05:00           20  ...             18.750000             12.014720
6  2020-01-01 01:06:00           20  ...             19.375000             12.611776
7  2020-01-01 01:07:00           20  ...             19.687500             13.289421
8  2020-01-01 01:08:00           20  ...             19.843750             14.831537
9  2020-01-01 01:09:00           20  ...             19.921875             15.865229
10 2020-01-01 01:10:00           20  ...             19.960938             16.692183
11 2020-01-01 01:11:00           20  ...             19.980469             17.953747
12 2020-01-01 01:12:00           20  ...             19.990234             17.362997
13 2020-01-01 01:13:00           20  ...             19.995117             16.890398
14 2020-01-01 01:14:00           20  ...             19.997559             16.512318
15 2020-01-01 01:15:00           20  ...             19.998779             16.209855
16 2020-01-01 01:16:00           20  ...             19.999390             15.967884
17 2020-01-01 01:17:00           20  ...             19.999695             15.774307
18 2020-01-01 01:18:00           20  ...             19.999847             15.619446
19 2020-01-01 01:19:00           20  ...             19.999924             15.495556
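Because ts_filter recomputes a weighted sum over the whole expanding window at every row, its cost grows quadratically with the number of rows. A possible alternative, swapped in here rather than taken from the answer: pandas documents that `Series.ewm(alpha=A, adjust=False).mean()` is computed recursively as y[0] = x[0], y[t] = (1-A)*y[t-1] + A*x[t], which is the same filter in a single pass. Note that `ewm` requires 0 < alpha <= 1, so A = 0 is rejected.

```python
import pandas as pd

days1 = pd.date_range('2020-01-01 01:00:00', '2020-01-01 01:19:00', freq='60s')
DF = pd.DataFrame({'Time': days1,
                   'TimeSeries1': [10, 10, 10] + [20] * 17,
                   'TimeSeries2': [11, 12, 13, 12, 11, 14, 15, 16, 21, 20, 20, 23] + [15] * 8})

factors = {"TimeSeries1": 0.5, "TimeSeries2": 0.2}

# adjust=False: y[0] = x[0], y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
filtered = pd.DataFrame({
    col: DF[col].ewm(alpha=A, adjust=False).mean() for col, A in factors.items()
}).add_suffix("_filtered")

# No overlapping column names after add_suffix, so no rsuffix is needed
output = DF.join(filtered)
print(output)
```

With the same factors this should reproduce the output table above.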

Answer:

Why not use a time series filtering package such as scipy.signal?

This is how I would do filtering with scipy.signal.lfilter:

(Thanks @not_speshal for pointing out the mistake in the OP's difference equation)

from scipy.signal import lfilter

# Numerator [a] and denominator [1, a-1] encode y[n] = (1-a)*y[n-1] + a*x[n]
coeffs = {'TimeSeries1': 0.5, 'TimeSeries2': 0.8}
for label, a in coeffs.items():
    DF[f"{label}_Filtered"] = lfilter([a], [1, a-1], DF[label])
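To see what the snippet above does at the start of the series, here are the first few outputs for a = 0.5 on the beginning of TimeSeries1. Without a `zi` argument, `lfilter` starts from a zero state, so the output ramps up from a*x[0] instead of starting at x[0].

```python
import numpy as np
from scipy.signal import lfilter

a = 0.5
x = np.array([10, 10, 10, 20, 20], dtype=float)

# Zero initial state: y[0] = a * x[0], then y[n] = (1 - a) * y[n-1] + a * x[n]
y = lfilter([a], [1, a - 1], x)
print(y.tolist())  # [5.0, 7.5, 8.75, 14.375, 17.1875]
```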

However, it looks as though you are assuming that each filter is already at steady state at i=0, i.e. TimeSeriesN_Filtered[0] = TimeSeriesN[0]. Setting the initial condition accordingly produces the results you wanted:

from scipy.signal import lfilter, lfiltic

coeffs = {'TimeSeries1': 0.5, 'TimeSeries2': 0.8}
for label, a in coeffs.items():
    y_prev = DF[label].iloc[0]  # previous filtered value
    zi = lfiltic([a], [1, a-1], [y_prev])  # initial condition
    DF[f"{label}_Filtered"] = lfilter([a], [1, a-1], DF[label], zi=zi)[0]
print(DF)

Output:

                  Time  TimeSeries1  TimeSeries2  TimeSeries1_Filtered  TimeSeries2_Filtered
0  2020-01-01 01:00:00           10           11             10.000000             11.000000
1  2020-01-01 01:01:00           10           12             10.000000             11.800000
2  2020-01-01 01:02:00           10           13             10.000000             12.760000
3  2020-01-01 01:03:00           20           12             15.000000             12.152000
4  2020-01-01 01:04:00           20           11             17.500000             11.230400
5  2020-01-01 01:05:00           20           14             18.750000             13.446080
...
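Another way to express the steady-state assumption, sketched here as an alternative to `lfiltic`: `scipy.signal.lfilter_zi(b, a)` returns the filter state for the steady state of a unit-step input, and scaling it by x[0] assumes the input had been constant at x[0] before i=0. For this first-order filter it gives the same initial state, (1-a)*x[0], as the `lfiltic` call above, so the first values should match the TimeSeries2_Filtered column. `steady_state_filter` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import lfilter, lfilter_zi

def steady_state_filter(x, a):
    b, den = [a], [1, a - 1]
    # Unit-step steady-state state, scaled as if x had been constant at x[0]
    zi = lfilter_zi(b, den) * x[0]
    y, _ = lfilter(b, den, x, zi=zi)
    return y

x = np.array([11, 12, 13, 12, 11, 14], dtype=float)
print(np.round(steady_state_filter(x, 0.8), 6))
```

Applied per column it would slot into the same `for label, a in coeffs.items()` loop.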