How to replace the previous value with the current value based on a condition?

Time:10-08

My task is to detect outliers using the Z score and replace their value with the previous valid value.

signal = ['229.84', '227.8', '221.16', '220.6', '217.52', '225.2', '221.68', '221.68', '225.24', '218.6', '218.6', '222.08', '219.96', '219.52', '223.8', '223.72', '222.6', '222.68', '228.2', '221.84', '229.36', '227.48', '227.48', '226.56', '226.24', '215.32', '220.76', '222.44', '234.12', '226.56', '228.04', '236.64', '228.32', '236.72', '236.84', '237.64', '213.92', '235.52', '238.0', '239.12', '237.12', '217.24', '229.4', '229.4', '239.56', '236.2', '236.2', '220.04', '232.24', '223.92', '220.6', '242.96', '220.4', '242.2', '243.28', '241.72', '241.12', '241.8', '236.6', '234.24', '233.84', '234.8', '236.88', '244.8', '236.0', '230.84', '229.6', '229.84', '214.8', '231.48', '239.6', '239.56', '222.88', '238.24', '238.92', '235.36', '217.48', '217.2', '217.12', '218.08', '222.04', '89.48', '88.8', '223.2', '213.6', '239.6', '214.52', '95.8', '210.8', '209.92', '210.4', '215.76', '210.28', '211.76', '210.64', '211.36', '210.84', '201.84', '211.16', '242.16', '233.28', '212.8', '207.44', '209.0', '208.52', '207.44', '212.08', '210.96', '203.12', '207.76', '202.8', '203.16', '208.36', '209.76', '211.24', '211.24', '211.24', '206.04', '209.76', '210.2', '195.96', '195.84', '207.2', '201.92', '203.8', '199.96', '206.24', '204.12', '233.92', '230.68', '226.4', '221.6', '226.68', '226.56', '225.6', '223.72', '220.44', '223.64', '225.52', '223.96', '228.0', '227.44', '224.4', '223.32', '220.08', '220.2', '221.8', '218.08', '218.08', '216.96']

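import numpy as np

# Convert the string readings to floats before computing statistics
results = [float(s) for s in signal]
mean = np.mean(results)
std = np.std(results)

threshold = -1.5
outlier = []
new_list = []

for i in results:
    z = (i - mean) / std
    if z < threshold:
        outlier.append(i)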

When I change it to:

for i in results:
    z = (i - mean) / std
    if z < threshold:
        outlier.append(i)
        results[i] = results[i-1]

It raises an error: TypeError: list indices must be integers or slices, not float

The outliers in the dataset are [89.48, 88.8, 95.8].

The final list should have these values replaced with the previous value, but only if that previous value is itself valid (that is, its z-score does not satisfy z < threshold).

CodePudding user response:

You need to convert the float value into an integer value for each "i" in your for loop, using the built-in function int().

CodePudding user response:

In your code, i is a value from the list, not a position in it. You are using it both as a value when computing z and as an index when assigning the previous result.
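To make the distinction concrete, here is a minimal sketch that reproduces the error on a tiny list. The three-element `values` list is made up for illustration and is not the full dataset from the question.

```python
# Hypothetical three-element list standing in for `results`
values = [220.6, 89.48, 223.2]

# Iterating directly yields the elements themselves (floats), not positions
items = [item for item in values]

# Using an element as an index reproduces the error from the question
try:
    values[values[1]]
except TypeError as exc:
    message = str(exc)
    print(message)  # list indices must be integers or slices, not float
```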

Use enumerate to get both the index and the value of each element of your list, like this:

for i, value in enumerate(results):
    z = (value - mean) / std
    if z < threshold:
        outlier.append(value)
        results[i] = results[i-1]

If I have understood your code correctly, this version should give you the expected result.

import numpy as np


signal = ['229.84', '227.8', '221.16', '220.6', '217.52', '225.2', '221.68', '221.68', '225.24', '218.6', '218.6', '222.08', '219.96', '219.52', '223.8', '223.72', '222.6', '222.68', '228.2', '221.84', '229.36', '227.48', '227.48', '226.56', '226.24', '215.32', '220.76', '222.44', '234.12', '226.56', '228.04', '236.64', '228.32', '236.72', '236.84', '237.64', '213.92', '235.52', '238.0', '239.12', '237.12', '217.24', '229.4', '229.4', '239.56', '236.2', '236.2', '220.04', '232.24', '223.92', '220.6', '242.96', '220.4', '242.2', '243.28', '241.72', '241.12', '241.8', '236.6', '234.24', '233.84', '234.8', '236.88', '244.8', '236.0', '230.84', '229.6', '229.84', '214.8', '231.48', '239.6', '239.56', '222.88', '238.24', '238.92', '235.36', '217.48', '217.2', '217.12', '218.08', '222.04', '89.48', '88.8', '223.2', '213.6', '239.6', '214.52', '95.8', '210.8', '209.92', '210.4', '215.76', '210.28', '211.76', '210.64', '211.36', '210.84', '201.84', '211.16', '242.16', '233.28', '212.8', '207.44', '209.0', '208.52', '207.44', '212.08', '210.96', '203.12', '207.76', '202.8', '203.16', '208.36', '209.76', '211.24', '211.24', '211.24', '206.04', '209.76', '210.2', '195.96', '195.84', '207.2', '201.92', '203.8', '199.96', '206.24', '204.12', '233.92', '230.68', '226.4', '221.6', '226.68', '226.56', '225.6', '223.72', '220.44', '223.64', '225.52', '223.96', '228.0', '227.44', '224.4', '223.32', '220.08', '220.2', '221.8', '218.08', '218.08', '216.96']

# Convert the strings to floats
results = [float(s) for s in signal]
mean = np.mean(results)
std = np.std(results)

threshold = -1.5
outlier = []
new_list = [0 for k in range(len(results))]

for i, value in enumerate(results):
    z = (value - mean) / std

    if z < threshold:
        # Outlier: carry forward the previous (already cleaned) value
        outlier.append(value)
        new_list[i] = new_list[i-1]
    else:
        new_list[i] = value
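
The same idea can be wrapped in a reusable function. The following is only a sketch under a few assumptions: the function name `replace_low_outliers`, the `last_valid` bookkeeping, and the short `sample` list are illustrative and not part of the original code. Unlike the loop above, it leaves a leading outlier unchanged when there is no earlier valid value to copy, instead of falling back to `new_list[-1]`.

```python
import numpy as np


def replace_low_outliers(values, threshold=-1.5):
    """Replace values whose z-score is below `threshold` with the last valid value."""
    data = np.asarray(values, dtype=float)
    z = (data - data.mean()) / data.std()
    is_outlier = z < threshold

    cleaned = []
    last_valid = None
    for value, flagged in zip(data.tolist(), is_outlier):
        if flagged and last_valid is not None:
            # Outlier with an earlier valid value available: reuse that value
            cleaned.append(last_valid)
        else:
            # Valid value, or a leading outlier with nothing to copy from
            cleaned.append(value)
            if not flagged:
                last_valid = value
    return cleaned, data[is_outlier].tolist()


# Made-up sample with two consecutive low outliers
sample = [220, 221, 89, 88, 223, 219, 222, 221, 220, 218, 224, 222]
cleaned, outliers = replace_low_outliers(sample)
print(outliers)  # [89.0, 88.0]
print(cleaned)
```

Because `last_valid` is only updated for values that pass the check, a run of consecutive outliers is replaced by the same earlier valid value.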