I've got a dataset with an insanely high sampling rate and would like to remove excess data where a column's value changes by less than a predefined amount from row to row. However, some intermediate points need to be kept so that not all of the data is lost.
e.g.
t V
0 1.0 1.0
1 2.0 1.2
2 3.0 2.0
3 3.3 3.0
4 3.4 4.0
5 3.7 4.2
6 3.8 4.6
7 4.4 5.4
8 5.1 6.0
9 6.0 7.0
10 7.0 10.0
Now I want to delete all rows where the change in V from one row to the next is less than dV AND the change in t is less than dt, while still keeping enough data points that there is data at roughly every interval of dV or dt.
Let's say dV = 1 and dt = 1; the wanted output would then be:
t V
0 1.0 1.0
1 2.0 1.2
2 3.0 2.0
3 3.3 3.0
4 3.4 4.0
7 4.4 5.4
9 6.0 7.0
10 7.0 10.0
Meaning rows 5, 6 and 8 were deleted since they were within the change thresholds, but row 7 remains since its change is above dt and dV in both directions.
The easy solution is iterating over the rows in the dataframe, but a faster (and more proper) solution is wanted.
EDIT: The question was edited to reflect the point that intermediary points must be kept in order to not delete too much.
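For reference, the row-by-row baseline mentioned above could look roughly like the sketch below. It assumes that "change" means the absolute difference from the last row that was kept (not from the immediately preceding row), and the function name thin_rows is hypothetical. The loop runs over NumPy arrays rather than DataFrame rows, which avoids most of the per-row pandas overhead.

```python
import numpy as np
import pandas as pd

def thin_rows(df, dt, dV):
    """Keep a row only if it differs from the last KEPT row
    by at least dt in t or at least dV in V (assumed rule)."""
    if len(df) == 0:
        return df
    t = df["t"].to_numpy()
    v = df["V"].to_numpy()
    keep = np.zeros(len(df), dtype=bool)
    keep[0] = True   # always keep the first row as the initial reference
    last = 0         # position of the most recently kept row
    for i in range(1, len(df)):
        if abs(t[i] - t[last]) >= dt or abs(v[i] - v[last]) >= dV:
            keep[i] = True
            last = i
    return df[keep]

df = pd.DataFrame({
    "t": [1.0, 2.0, 3.0, 3.3, 3.4, 3.7, 3.8, 4.4, 5.1, 6.0, 7.0],
    "V": [1.0, 1.2, 2.0, 3.0, 4.0, 4.2, 4.6, 5.4, 6.0, 7.0, 10.0],
})
result = thin_rows(df, dt=1, dV=1)
print(result.index.tolist())  # [0, 1, 2, 3, 4, 7, 9, 10]
```

On the example data this reproduces the wanted output, because rows 5, 6 and 8 are each compared against the last kept row instead of their direct neighbour.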
CodePudding user response:
Use DataFrame.diff with boolean indexing:
dV = 1
dt = 1
df = df[~(df['t'].diff().lt(dt) & df['V'].diff().lt(dV))]
print (df)
t V
0 1.0 1.0
1 2.0 1.2
2 3.0 2.0
3 3.3 3.0
4 3.4 4.0
7 5.0 6.0
8 5.1 8.0
9 6.0 9.0
10 7.0 10.0
Or:
dV = 1
dt = 1
df1 = df.diff()
df = df[df1['t'].fillna(dt).ge(dt) | df1['V'].fillna(dV).ge(dV)]
print (df)
t V
0 1.0 1.0
1 2.0 1.2
2 3.0 2.0
3 3.3 3.0
4 3.4 4.0
7 5.0 6.0
8 5.1 8.0
9 6.0 9.0
10 7.0 10.0
CodePudding user response:
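You might want to use the shift() method:
diff_df = df - df.shift()
and then filter the rows of df with loc, keeping every row whose difference reaches the threshold in either column. Each comparison needs its own parentheses, because & and | bind more tightly than >=:
df = df.loc[(diff_df['V'] >= 1.0) | (diff_df['t'] >= 1.0)]
Note that the first row has NaN differences, so this mask drops it; add | diff_df['t'].isna() to the condition if it should be kept.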
CodePudding user response:
You can use loc for boolean indexing and compare the values of consecutive rows within each column using shift():
# Thresholds
dv = 1
dt = 1
# Filter out
print(df.loc[~((df.V.sub(df.V.shift()) < dv) & (df.t.sub(df.t.shift()) < dt))])
t V
0 1.0 1.0
1 2.0 1.2
2 3.0 2.0
3 3.3 3.0
4 3.4 4.0
7 5.0 6.0
8 5.1 8.0
9 6.0 9.0
10 7.0 10.0
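Note that diff() and shift() compare each row only with its immediate predecessor, so a long run of small steps is removed entirely even when the values drift by more than dV or dt overall. One possible vectorized approximation, sketched here as an assumption rather than an exact answer to the question, is to assign every row to a bin of width dt in t and dV in V and keep the first row of each new bin. The function name thin_rows_binned is hypothetical.

```python
import numpy as np
import pandas as pd

def thin_rows_binned(df, dt, dV):
    """Keep a row whenever its (t, V) bin differs from the previous row's bin.
    Bins are floor(t / dt) and floor(V / dV); this is an approximation."""
    t_bin = np.floor(df["t"] / dt)
    v_bin = np.floor(df["V"] / dV)
    # shift() gives NaN for the first row, so ne() is True there and row 0 is kept
    changed = t_bin.ne(t_bin.shift()) | v_bin.ne(v_bin.shift())
    return df[changed]

df = pd.DataFrame({
    "t": [1.0, 2.0, 3.0, 3.3, 3.4, 3.7, 3.8, 4.4, 5.1, 6.0, 7.0],
    "V": [1.0, 1.2, 2.0, 3.0, 4.0, 4.2, 4.6, 5.4, 6.0, 7.0, 10.0],
})
binned = thin_rows_binned(df, dt=1, dV=1)
print(binned.index.tolist())  # [0, 1, 2, 3, 4, 7, 8, 9, 10]
```

On the example data this keeps row 8 as well, so it does not match the wanted output exactly. It only guarantees roughly one point per dt-by-dV cell, and the result depends on where the bin edges fall.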