My task is to detect outliers using the z-score and replace each outlier with the previous valid value.
signal = ['229.84', '227.8', '221.16', '220.6', '217.52', '225.2', '221.68', '221.68', '225.24', '218.6', '218.6', '222.08', '219.96', '219.52', '223.8', '223.72', '222.6', '222.68', '228.2', '221.84', '229.36', '227.48', '227.48', '226.56', '226.24', '215.32', '220.76', '222.44', '234.12', '226.56', '228.04', '236.64', '228.32', '236.72', '236.84', '237.64', '213.92', '235.52', '238.0', '239.12', '237.12', '217.24', '229.4', '229.4', '239.56', '236.2', '236.2', '220.04', '232.24', '223.92', '220.6', '242.96', '220.4', '242.2', '243.28', '241.72', '241.12', '241.8', '236.6', '234.24', '233.84', '234.8', '236.88', '244.8', '236.0', '230.84', '229.6', '229.84', '214.8', '231.48', '239.6', '239.56', '222.88', '238.24', '238.92', '235.36', '217.48', '217.2', '217.12', '218.08', '222.04', '89.48', '88.8', '223.2', '213.6', '239.6', '214.52', '95.8', '210.8', '209.92', '210.4', '215.76', '210.28', '211.76', '210.64', '211.36', '210.84', '201.84', '211.16', '242.16', '233.28', '212.8', '207.44', '209.0', '208.52', '207.44', '212.08', '210.96', '203.12', '207.76', '202.8', '203.16', '208.36', '209.76', '211.24', '211.24', '211.24', '206.04', '209.76', '210.2', '195.96', '195.84', '207.2', '201.92', '203.8', '199.96', '206.24', '204.12', '233.92', '230.68', '226.4', '221.6', '226.68', '226.56', '225.6', '223.72', '220.44', '223.64', '225.52', '223.96', '228.0', '227.44', '224.4', '223.32', '220.08', '220.2', '221.8', '218.08', '218.08', '216.96']
import numpy as np
results = [float(s) for s in signal]  # convert the strings to floats
mean = np.mean(results)
std = np.std(results)
threshold = -1.5
outlier = []
new_list = []
for i in results:
    z = (i-mean)/std
    if z < threshold:
        outlier.append(i)
When I change it to:
for i in results:
    z = (i-mean)/std
    if z < threshold:
        outlier.append(i)
        results[i] = results[i-1]
It gives the error: TypeError: list indices must be integers or slices, not float
The outliers in the dataset are [89.48, 88.8, 95.8].
The final list should have these values replaced with the previous value (only if the previous value's z-score does not satisfy the condition z < threshold).
CodePudding user response:
You need to convert the float value into an integer value for each "i" in your for loop, using the built-in function int().
CodePudding user response:
i in your code is the value in the list. You are using it both as a value when computing your z value and as an index when assigning the value of the previous result.
Use enumerate to get both the index and the value of each element of your list, like this:
for i, value in enumerate(results):
    z = (value-mean)/std
    if z < threshold:
        outlier.append(value)
        results[i] = results[i-1]
If I understood your code correctly, this version should give you the expected result.
import numpy as np
signal = ['229.84', '227.8', '221.16', '220.6', '217.52', '225.2', '221.68', '221.68', '225.24', '218.6', '218.6', '222.08', '219.96', '219.52', '223.8', '223.72', '222.6', '222.68', '228.2', '221.84', '229.36', '227.48', '227.48', '226.56', '226.24', '215.32', '220.76', '222.44', '234.12', '226.56', '228.04', '236.64', '228.32', '236.72', '236.84', '237.64', '213.92', '235.52', '238.0', '239.12', '237.12', '217.24', '229.4', '229.4', '239.56', '236.2', '236.2', '220.04', '232.24', '223.92', '220.6', '242.96', '220.4', '242.2', '243.28', '241.72', '241.12', '241.8', '236.6', '234.24', '233.84', '234.8', '236.88', '244.8', '236.0', '230.84', '229.6', '229.84', '214.8', '231.48', '239.6', '239.56', '222.88', '238.24', '238.92', '235.36', '217.48', '217.2', '217.12', '218.08', '222.04', '89.48', '88.8', '223.2', '213.6', '239.6', '214.52', '95.8', '210.8', '209.92', '210.4', '215.76', '210.28', '211.76', '210.64', '211.36', '210.84', '201.84', '211.16', '242.16', '233.28', '212.8', '207.44', '209.0', '208.52', '207.44', '212.08', '210.96', '203.12', '207.76', '202.8', '203.16', '208.36', '209.76', '211.24', '211.24', '211.24', '206.04', '209.76', '210.2', '195.96', '195.84', '207.2', '201.92', '203.8', '199.96', '206.24', '204.12', '233.92', '230.68', '226.4', '221.6', '226.68', '226.56', '225.6', '223.72', '220.44', '223.64', '225.52', '223.96', '228.0', '227.44', '224.4', '223.32', '220.08', '220.2', '221.8', '218.08', '218.08', '216.96']
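# Convert the strings to floats
results = [float(s) for s in signal]
mean = np.mean(results)
std = np.std(results)
threshold = -1.5
outlier = []
new_list = [0 for k in range(len(results))]
for i, value in enumerate(results):
    z = (value - mean) / std
    if z < threshold:
        # Outlier: record it and reuse the previous entry of new_list,
        # which already holds a valid (possibly replaced) value.
        # If the very first value is an outlier, new_list[i-1] is new_list[-1], which is still 0.
        outlier.append(value)
        new_list[i] = new_list[i-1]
    else:
        new_list[i] = value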
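To check the behaviour on a smaller input, the same loop can be wrapped in a function. This is only a sketch: the name replace_low_outliers and the short sample list are made up for illustration and are not part of the original code. It assumes that a leading outlier, which has no previous value, should simply be kept as it is.

```python
import numpy as np

def replace_low_outliers(values, threshold=-1.5):
    """Return (cleaned, outliers) for a list of numbers or numeric strings.

    A value whose z-score is below `threshold` is replaced with the last
    entry already placed in `cleaned`, so runs of consecutive outliers all
    take the most recent valid value.
    """
    values = [float(v) for v in values]
    mean = np.mean(values)
    std = np.std(values)
    cleaned = []
    outliers = []
    for value in values:
        z = (value - mean) / std
        # `cleaned` is empty only for the first element; in that case there
        # is no previous value, so the element is kept unchanged (assumption).
        if z < threshold and cleaned:
            outliers.append(value)
            cleaned.append(cleaned[-1])
        else:
            cleaned.append(value)
    return cleaned, outliers

# Hypothetical sample with two consecutive low outliers
sample = ['220.0', '221.0', '219.0', '222.0', '220.0',
          '221.0', '90.0', '89.0', '223.0', '220.0']
cleaned, outliers = replace_low_outliers(sample)
print(outliers)  # [90.0, 89.0]
print(cleaned)   # both outliers become 221.0, the last valid value before them
```

Appending to cleaned and reading cleaned[-1] does the same job as new_list[i-1] above, without needing to pre-fill the list with zeros.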