Replace outlier values with NaN in numpy? (preserve length of array)

Time:10-19

I have an array of magnetometer data with artifacts every two hours due to power cycling.

[Figure: Magnetometer data, produced using plotly.express, with outliers due to power cycling.]

I'd like to replace those indices with NaN so that the length of the array is preserved.

Here's a code example, adapted from ...

[Figure: Scatter plot indicating that outliers are successfully being dropped.]

...but that's looking at the culled y vector relative to the index, rather than the datetime vector x as in the above plot. As the debugging text indicates, the vector is shortened because the outlier values are dropped rather than replaced.

How can I edit my `reject_outliers()` function to set those values to NaN, or to adjacent values, so that the array keeps the same length and I can plot my data?

CodePudding user response:

Use else in the list comprehension along the lines of:

[x if x_condition else other_value for x in y]
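
As a minimal sketch of that pattern applied to the question's function: the threshold parameter `m`, the absolute-value form of the test, and the sample values are assumptions for illustration, not taken from the original post.

```python
import numpy as np

def reject_outliers(y, m=5.0):
    """Return a copy of y with values more than m standard deviations
    from the mean replaced by NaN (length is preserved)."""
    y = np.asarray(y, dtype=float)
    mean = np.mean(y)
    sd = np.std(y)
    # The else branch substitutes NaN instead of dropping the element,
    # so the result has the same length as the input.
    return np.array([v if abs(v - mean) <= m * sd else np.nan for v in y])

# Illustrative data (hypothetical): 99 quiet samples and one large spike.
sample = np.append(np.zeros(99), 1000.0)
cleaned = reject_outliers(sample)
print(len(sample), len(cleaned))
```

The key point is that a filter clause (`for x in y if cond`) drops elements, while a conditional expression before the `for` keeps one output element per input element.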

CodePudding user response:

Got a less compact version to work. Full code:

import numpy as np
import plotly.express as px

# For pulling data from CDAweb:
from ai import cdas
import datetime

# Import data:
start = datetime.datetime(2016, 1, 24, 0, 0, 0)
end = datetime.datetime(2016, 1, 25, 0, 0, 0)
data = cdas.get_data(
                    'sp_phys',
                    'THG_L2_MAG_' + 'PG2',
                    start,
                    end,
                    ['thg_mag_' + 'pg2']
                )

x = data['UT']
y = data['VERTICAL_DOWN_-_Z']


def reject_outliers(y):   # y is the data in a 1D numpy array
    mean = np.mean(y)
    sd = np.std(y)
    final_list = np.copy(y)
    for n in range(len(y)):
        # Replace values more than 5 standard deviations from the mean with NaN,
        # so the returned array has the same length as the input.
        final_list[n] = y[n] if y[n] > mean - 5 * sd else np.nan
        final_list[n] = final_list[n] if final_list[n] < mean + 5 * sd else np.nan
    return final_list

px.scatter(reject_outliers(y))

print('Length of y: ')
print(len(y))
print('Length of y with outliers removed (should be the same): ')
print(len(reject_outliers(y)))
# px.line(y=y, x=x)

px.line(y=reject_outliers(y), x=x)   # This is the line I wanted to get working - check!

[Figure: Magnetometer results with outliers eliminated, plotted successfully with respect to time.]
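
For what it's worth, the same masking can probably be done without the Python loop by using `np.where`. This is only a sketch that mirrors the loop's strict comparisons: the function name `reject_outliers_vectorized`, the parameter `m`, and the sample data are hypothetical, and it assumes `y` can be converted to a float array.

```python
import numpy as np

def reject_outliers_vectorized(y, m=5.0):
    """Vectorized variant: keep values strictly within m standard
    deviations of the mean, replace everything else with NaN."""
    y = np.asarray(y, dtype=float)  # NaN requires a float dtype
    mean = np.mean(y)
    sd = np.std(y)
    # Boolean mask of the samples to keep (same strict bounds as the loop).
    keep = (y > mean - m * sd) & (y < mean + m * sd)
    # np.where picks y where keep is True and NaN elsewhere,
    # so the output has the same shape as the input.
    return np.where(keep, y, np.nan)

# Illustrative data (hypothetical): 99 steady samples and one negative spike.
sample = np.append(np.ones(99), -500.0)
result = reject_outliers_vectorized(sample)
print(result.shape == sample.shape)
```

Because the output has one element per input element, it can be passed straight to `px.line(y=..., x=x)` alongside the unmodified time vector.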
