I am working on an inventory forecasting model and need specific data to train and test it. Currently, I am trying to use one year's worth of data to build a basic linear regression model that predicts the following year.
What I am having trouble with is removing outliers from my dataframe. It contains two different types of outliers ("quantity" and "dates"), and I only want to remove the outliers based on "quantity".
CodePudding user response:
You can remove the outliers by comparing each value to the mean or the median (I suggest using the median). For each value, divide its distance from the median by the distance between the maximum value and the median. If that ratio reaches or exceeds a threshold (e.g. 0.98; it depends on your data, and only you can choose it), delete that row. For example, if you set the threshold to 1, only the farthest value will be deleted.
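A minimal pandas sketch of this idea, assuming the dataframe has a numeric column named "quantity". The function name `drop_quantity_outliers`, the "date" column, and the sample values are made up for illustration; the threshold of 0.98 is only the example value mentioned above and should be tuned to your data.

```python
import pandas as pd


def drop_quantity_outliers(df, column="quantity", threshold=0.98):
    """Keep rows whose scaled distance from the median is below the threshold."""
    median = df[column].median()
    # Distance between the maximum value and the median, used as the scale.
    span = df[column].max() - median
    if span == 0:
        # No value lies above the median, so the ratio is undefined; drop nothing.
        return df.copy()
    # Distance of each value from the median, relative to that scale.
    ratio = (df[column] - median).abs() / span
    return df[ratio < threshold]


# Hypothetical sample data: one obviously extreme quantity.
df = pd.DataFrame({
    "date": pd.date_range("2022-01-01", periods=6, freq="D"),
    "quantity": [10, 12, 11, 13, 12, 500],
})

cleaned = drop_quantity_outliers(df, threshold=0.98)
print(cleaned)
```

Only the "quantity" column is inspected, so the dates play no part in deciding which rows are dropped. Note that the scale is set by the maximum, so a value far below the median can produce a ratio above 1 and be removed as well.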