I need to overwrite a spurious noise value every time it occurs in a Pandas data frame column. I need to overwrite it with a clean value from the previous row. If multiple adjacent noise values are encountered, all should be overwritten by the same recent good value.
The following code works but is too slow. Is there a better, non-iterative, Pandas-esque solution?
def cleanData(df):
    lastGoodValue = 0
    for row in df.itertuples():
        if df.at[row.Index, 'Barometric Altitude'] == 16383.997535000002:
            df.at[row.Index, 'Barometric Altitude'] = lastGoodValue
        else:
            lastGoodValue = df.at[row.Index, 'Barometric Altitude']
    return df
CodePudding user response:
This might provide an alternative to your iterating process. It follows the aforementioned suggestion of using the ffill method:
import pandas as pd
noise_value = 16383.997535000002
# Sample dataframe
df = pd.DataFrame({'row': [1, 2, noise_value, noise_value, 4, 5, noise_value, 7, 8, 9]})
# Replace the bad value (noise_value) with the previous good value using the ffill method
df = df.replace(noise_value, method="ffill")
# Print the updated dataframe
print(df)
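A possible variant, sketched here under a few assumptions: the noise value is an exact float match as in the question, and only the 'Barometric Altitude' column needs cleaning. It masks the noise entries to NaN and forward-fills them, which avoids the `method` argument of `replace` (deprecated in recent pandas versions). The helper name `clean_column` and the sample data are hypothetical.

```python
import pandas as pd

NOISE_VALUE = 16383.997535000002

def clean_column(df, column, noise_value=NOISE_VALUE):
    """Hypothetical helper: overwrite noise entries with the previous good value."""
    cleaned = df.copy()
    # mask() turns matching entries into NaN; ffill() carries the last good value forward
    cleaned[column] = cleaned[column].mask(cleaned[column] == noise_value).ffill()
    return cleaned

# Made-up sample data for illustration
df = pd.DataFrame({'Barometric Altitude': [100.0, 101.0, NOISE_VALUE, NOISE_VALUE,
                                           104.0, NOISE_VALUE, 107.0]})
result = clean_column(df, 'Barometric Altitude')
print(result)
```

Two caveats with this sketch: genuine NaN values already in the column would be forward-filled as well, and a noise value in the very first row stays NaN because there is no earlier good value to copy.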
CodePudding user response:
To replace all noise values including any at the beginning, do a forward-fill AND then a back-fill.
df = df.replace(noiseValue, method="ffill").replace(noiseValue, method="bfill")
Even with replacement in both directions, this method is still 6 times faster than the iterative solution.
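As a small illustration of why the back-fill matters, here is a sketch on a made-up Series whose first rows are noise. It uses `mask` followed by `ffill` and `bfill` rather than `replace(..., method=...)`, since that argument is deprecated in recent pandas versions; the sample values are assumptions for demonstration only.

```python
import pandas as pd

noiseValue = 16383.997535000002

# Hypothetical sample with noise in the first two rows
s = pd.Series([noiseValue, noiseValue, 10.0, 11.0, noiseValue, 13.0])

# Forward-fill handles noise after a good value; back-fill handles leading noise
cleaned = s.mask(s == noiseValue).ffill().bfill()
print(cleaned.tolist())  # [10.0, 10.0, 10.0, 11.0, 11.0, 13.0]
```

The leading noise values take the first good value that follows them, while the later one still takes the previous good value.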