I have a DataFrame containing approximately 7000 rows and 2 columns which looks like this:
    Time    Voltage
0    0.0  32.965541
1    0.5  32.914965
2    1.0  32.904850
3    1.5  32.864389
4   14.0  31.680907
5   24.0  31.023417
6   24.5  31.003186
7   25.0  30.982956
8   25.5  30.942495
9   26.0  30.952610
10  50.0  30.062469
11  50.5  30.022009
12  56.0  29.961317
13  56.5  29.941087
14  57.0  29.930971
15  57.5  29.910741
16  58.0  29.890511
17  73.0  21.211641
18  73.5  21.181296
19  74.0  21.201526
20  87.5  21.120604
21  88.0  21.080143
22  88.5  21.110489
I want to "compress" the dataframe down to only the time steps at which the voltage differs by at least one volt in magnitude from the voltage at the previously kept time step.
For example, starting at time 0.0, the next voltage whose difference in magnitude is at least one volt is at time 14.0. Then, from time 14.0, the next voltage whose difference in magnitude is at least one volt is at time 50.0.
CodePudding user response:
New answer
Okay so after some time, I think I've finally come to understand what you're asking. It seems that you want to essentially "compress" the data so that each chronological time step has a difference in voltage that is at least 1V in magnitude.
For example, starting with the voltage at time 0.0, the next voltage whose difference is at least 1V in magnitude is the voltage at time 14.0. Then, starting from the voltage at time 14.0, the next voltage difference above the magnitude threshold is at time 50.0. Then you start looking from time 50.0, and so on.
This can be achieved using what's known as a two-pointer algorithm. You essentially track -- no surprise -- two pointers: one which is fixed at a certain index, and one that increments one step at a time from the first pointer. Then when some condition is met, the first pointer is updated to the second pointer's location, and the second pointer then starts incrementing again. Here's a basic implementation:
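def compress(x, thresh=1):
    # i: index of the last kept sample; j: candidate index; idxs: kept indices
    i, j, idxs = 0, 1, [0]
    while j < len(x):
        # Keep j once it differs from the reference sample by at least thresh
        if abs(x[i] - x[j]) >= thresh:
            idxs.append(j)
            i = j  # the kept sample becomes the new reference
        j += 1
    return idxs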
Which, when passed the Voltage column from the dataframe, produces this result:
In [26]: df.iloc[compress(df.Voltage, 1), :]
Out[26]:
    Time    Voltage
0    0.0  32.965541
4   14.0  31.680907
10  50.0  30.062469
17  73.0  21.211641
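One caveat with the function above: `x[i]` on a pandas Series looks values up by label, so it only behaves positionally when the index is the default 0..n-1 (as it is in the question). Below is a minimal sketch of a position-based variant, assuming the column can be converted to a NumPy array. The name `compress_positional` and the shifted index are hypothetical, chosen only to illustrate the point; the data is the first 11 rows from the question.

```python
import numpy as np
import pandas as pd


def compress_positional(values, thresh=1.0):
    """Return positions whose value differs from the last kept value by >= thresh."""
    arr = np.asarray(values, dtype=float)  # plain array, so indexing is positional
    if arr.size == 0:
        return []
    idxs = [0]    # always keep the first sample as the initial reference
    ref = arr[0]
    for j in range(1, arr.size):
        if abs(ref - arr[j]) >= thresh:
            idxs.append(j)
            ref = arr[j]  # the kept sample becomes the new reference
    return idxs


# First 11 rows of the question's data, with a non-default index on purpose
df = pd.DataFrame(
    {
        "Time": [0.0, 0.5, 1.0, 1.5, 14.0, 24.0, 24.5, 25.0, 25.5, 26.0, 50.0],
        "Voltage": [32.965541, 32.914965, 32.904850, 32.864389, 31.680907,
                    31.023417, 31.003186, 30.982956, 30.942495, 30.952610,
                    30.062469],
    },
    index=range(100, 111),
)
kept = compress_positional(df["Voltage"])
print(df.iloc[kept])  # rows at times 0.0, 14.0 and 50.0
```

Because the returned values are positions, they should be used with `.iloc` rather than `.loc`.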
Old answer
I'll leave this old answer up so that future readers may still benefit from it.
You can get the change from the row above with .diff():
In [7]: df["deltaVoltage"] = df["Voltage"].diff()
In [8]: df
Out[8]:
    Time    Voltage  deltaVoltage
0    0.0  32.965541           NaN
1    0.5  32.914965     -0.050576
2    1.0  32.904850     -0.010115
3    1.5  32.864389     -0.040461
4   14.0  31.680907     -1.183482
5   24.0  31.023417     -0.657490
6   24.5  31.003186     -0.020230
7   25.0  30.982956     -0.020230
8   25.5  30.942495     -0.040461
9   26.0  30.952610      0.010115
10  50.0  30.062469     -0.890140
11  50.5  30.022009     -0.040461
12  56.0  29.961317     -0.060691
13  56.5  29.941087     -0.020230
14  57.0  29.930971     -0.010115
15  57.5  29.910741     -0.020230
16  58.0  29.890511     -0.020230
17  73.0  21.211641     -8.678869
18  73.5  21.181296     -0.030346
19  74.0  21.201526      0.020230
20  87.5  21.120604     -0.080922
21  88.0  21.080143     -0.040461
22  88.5  21.110489      0.030346
Then, you can select the rows where the absolute value of the change in voltage is >= 1:
In [9]: df[df["deltaVoltage"].abs() >= 1]
Out[9]:
    Time    Voltage  deltaVoltage
4   14.0  31.680907     -1.183482
17  73.0  21.211641     -8.678869
Or, if you don't actually want the change in voltage saved as a column:
In [10]: df[df["Voltage"].diff().abs() >= 1]
Out[10]:
    Time    Voltage
4   14.0  31.680907
17  73.0  21.211641
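Note that this selects different rows than the new answer (row 10 is missing, for instance), because .diff() compares each row only with the row immediately before it, whereas the two-pointer approach compares against the last kept row. A small sketch of the difference, using hypothetical data (not from the question) in which the voltage drifts by 0.4 V per step:

```python
import pandas as pd

# Hypothetical data: the voltage drifts down by 0.4 V per step
s = pd.Series([10.0, 9.6, 9.2, 8.8, 8.4])

# Row-to-row comparison: no single step reaches 1 V, so nothing is selected
step_hits = s[s.diff().abs() >= 1].index.tolist()

# Comparison against the last kept value: the drift accumulates past 1 V
kept, ref = [0], s.iloc[0]
for pos in range(1, len(s)):
    if abs(s.iloc[pos] - ref) >= 1:
        kept.append(pos)
        ref = s.iloc[pos]  # the kept sample becomes the new reference

print(step_hits)  # []
print(kept)       # [0, 3]
```

So the .diff() approach is only appropriate when the goal is to find abrupt jumps between consecutive samples, not accumulated change.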