Is standardizing training data for a model intended to make predictions on streaming data actually r

Time:07-04

I am trying to predict activities (e.g. running, sitting, walking) based on non-streaming data with a convolutional neural network. Standardizing or normalizing the training and test data improves the predictive performance of the network significantly, compared to not standardizing or normalizing it. So far, so good.

However, I would also like to make my approach work for streaming data. In theory, though, I cannot imagine that standardizing or normalizing streaming data works. Standardization requires the mean and standard deviation, which could change with every new incoming data point. Normalization requires the minimum and maximum of the data, which could also change with every new incoming data point. If that is the case, it would not even make sense to standardize or normalize the training data, because that would mean creating a different distribution for training and unseen data.

I am not sure whether I have missed something here. Is there a workaround for streaming data, so that standardization or normalization can still be applied?

CodePudding user response:

What is missing is that pretty much every model you train assumes that your data distribution stays roughly constant. If your data keeps changing, you have to keep updating/retraining your model anyway, so the fact that you also have to refit your normalizers really does not matter. Estimating a mean/std has a very low sample complexity compared to fitting literally any ML model, so if your model does not diverge, neither will your normalizer.
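As a rough illustration of how cheap it is to keep a normalizer up to date, here is a minimal sketch that tracks mean and standard deviation incrementally using Welford's algorithm. The class name `RunningStandardizer` and the sample values are made up for this example, and it handles a single scalar feature only; it is one possible way to do it, not necessarily what you would use in production.

```python
import math


class RunningStandardizer:
    """Hypothetical helper: incremental mean/std for one scalar feature."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's update: numerically stable, O(1) per sample.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 until two samples were seen.
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0

    def transform(self, x, eps=1e-8):
        # eps avoids division by zero for a constant feature.
        return (x - self.mean) / (self.std + eps)


# Fit on the training data (made-up values for illustration).
scaler = RunningStandardizer()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    scaler.update(value)

# At prediction time, reuse the training statistics for each incoming point.
standardized = scaler.transform(9.0)
```

Between retraining runs you would only call `transform` on incoming points, so the model sees the same scaling it was trained with. When you retrain because the data has drifted, you either keep calling `update` or refit the statistics from scratch on the new training set.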
