Good Morning, I have a Series like the following.
Time Temperature
2019-01-02 02:00:00 14.95
2019-01-02 03:00:00 15.0
2019-01-02 04:00:00 37.0
2019-01-02 05:00:00 15.0
2019-01-02 06:00:00 15.5
I would like to replace all values that do not follow the trend with NaN (e.g. the value 37). I was thinking of adding a condition that compares each value with the one in the previous row, but I don't know if there is a faster way.
CodePudding user response:
You could use find_peaks from scipy.signal to get the values that do not follow the trend (= peaks). find_peaks offers a variety of parameters to define what counts as a peak.
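import numpy as np
from scipy.signal import find_peaks

# Work on a float copy so that assigning NaN never writes into a read-only view
temp = df.Temperature.to_numpy(dtype=float, copy=True)
# threshold=5: a peak must exceed both of its neighbours by at least 5
idx, _ = find_peaks(temp, threshold=5)
temp[idx] = np.nan
df.Temperature = temp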
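Note that find_peaks only detects upward spikes. If downward dips should be removed as well, one option is to run it a second time on the negated signal. Below is a minimal sketch on the sample data from the question; the variable name `jump` and its value of 5 are assumptions for illustration, not something taken from your data.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

# Sample data from the question
df = pd.DataFrame(
    {"Temperature": [14.95, 15.0, 37.0, 15.0, 15.5]},
    index=pd.date_range("2019-01-02 02:00:00", periods=5, freq=pd.Timedelta(hours=1)),
)

temp = df["Temperature"].to_numpy(dtype=float, copy=True)
jump = 5  # assumed minimum difference to BOTH neighbours; tune for your data

spikes, _ = find_peaks(temp, threshold=jump)   # upward outliers
dips, _ = find_peaks(-temp, threshold=jump)    # downward outliers

# Blank out every detected position
temp[np.concatenate([spikes, dips])] = np.nan
df["Temperature"] = temp
print(df)
```

On the sample data only the 37.0 reading is detected; the first and last rows can never be reported because find_peaks needs a neighbour on each side.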
CodePudding user response:
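A simple option is to set to NaN every value that is larger than the one that follows it. Note that this flags any drop, however small, not only large spikes: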
df.loc[df.Temperature - df.Temperature.shift(-1) > 0, 'Temperature'] = np.nan
df:
Time Temperature
2019-01-02 02:00:00 14.95
2019-01-02 03:00:00 15.00
2019-01-02 04:00:00 NaN
2019-01-02 05:00:00 15.00
2019-01-02 06:00:00 15.50
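Because the one-liner above compares each value only with the next one, any decrease gets flagged. If that is too aggressive, a stricter variant is to require a large jump relative to both the previous and the next row. This is only a sketch: `max_jump` and its value of 5.0 are assumptions that you would need to tune.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {"Temperature": [14.95, 15.0, 37.0, 15.0, 15.5]},
    index=pd.date_range("2019-01-02 02:00:00", periods=5, freq=pd.Timedelta(hours=1)),
)
df.index.name = "Time"

max_jump = 5.0  # assumed threshold; tune for your data

temperature = df["Temperature"]
jump_from_prev = (temperature - temperature.shift(1)).abs()
jump_to_next = (temperature - temperature.shift(-1)).abs()

# The first and last rows compare against NaN, which evaluates to False,
# so they are never flagged
is_spike = (jump_from_prev > max_jump) & (jump_to_next > max_jump)
df.loc[is_spike, "Temperature"] = np.nan
print(df)
```

This catches isolated spikes in either direction, but it will miss an outlier that spans two or more consecutive rows.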
CodePudding user response:
You might have to define more tightly what you mean by "follow the trend", but I'll give an example for, say, a point that is more than 1.5 times the mean of points within a 5 timeslot window.
You could use pandas Series.rolling() to get a local rolling mean and then use boolean indexing on the Series to replace only the values that meet the condition.
import numpy as np
import pandas as pd

# Make some random data with an outlier
data_points = 48
random_data = np.random.random(data_points)
temps = random_data * 2 + 14  # values between 14 and 16
temps[6] = 37.0
times = pd.date_range(start="2019-01-02 02:00:00", freq="H", periods=data_points)
s = pd.Series(data=temps, index=times)
# See data with the outlier
print(s)
# Use pandas Series.rolling() to find the local (centred) rolling mean
rolling_mean = s.rolling(5, min_periods=1, center=True).mean()
# Use boolean indexing to alter only values > 1.5 times the rolling mean
s[s > rolling_mean * 1.5] = float("NaN")
# Outlier replaced with NaN
print(s)
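One caveat with the rolling mean is that the outlier itself pulls the mean upwards. A variant that may be more robust is to compare each point with the rolling median and an absolute tolerance instead. The sketch below applies this to the five values from the question; `max_deviation` and its value of 5.0 are assumptions, not something derived from your data.

```python
import numpy as np
import pandas as pd

# Sample data from the question
s = pd.Series(
    [14.95, 15.0, 37.0, 15.0, 15.5],
    index=pd.date_range("2019-01-02 02:00:00", periods=5, freq=pd.Timedelta(hours=1)),
)

max_deviation = 5.0  # assumed tolerance in degrees; tune for your data

# Centred rolling median: a single outlier in the window barely moves it
rolling_median = s.rolling(5, min_periods=1, center=True).median()

# Series.mask() replaces values with NaN wherever the condition is True
cleaned = s.mask((s - rolling_median).abs() > max_deviation)
print(cleaned)
```

Series.mask() returns a new Series, so the original `s` is left untouched and the two can be compared afterwards.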