I am working on a prediction problem. My training set has around 8,700 samples and around 1,000 features. I have tried different models, but they all show high bias (underfitting), so I decided to add new features. I added some lagged versions of the features and then used sklearn's PolynomialFeatures to generate polynomial features (degree=2).
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(2)
X_poly = poly.fit_transform(X)
X = pd.DataFrame(X_poly, columns=poly.get_feature_names_out(), index=X.index)
Now, I have around 490,000 features. Next, when I want to do the feature scaling,
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X)
I get a "Dead kernel" error in Jupyter Notebook and cannot go any further.
What should I do? Any suggestion?
CodePudding user response:
The "dead kernel" almost certainly means you are running out of memory. You need to fit the scaler in batches with partial_fit, and then transform the data, which also needs a loop:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

n = X.shape[0]          # number of rows
batch_size = 1000
i = 0

while i < n:
    partial_size = min(batch_size, n - i)    # last batch may be smaller
    partial_x = X.iloc[i:i + partial_size]   # current batch of rows
    scaler.partial_fit(partial_x)            # update running mean/variance
    i += partial_size
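The transform step mentioned above would then be a second loop over the same batches. Here is a minimal sketch, reusing n, batch_size, and the fitted scaler from the loop above; the list scaled_batches and the use of .iloc are my own additions, not part of the original answer. Note that stacking every scaled batch back into one dense array can hit the same memory limit that killed the kernel, so you may prefer to write each batch to disk or feed it directly to a model that supports incremental learning.

import numpy as np

scaled_batches = []   # hypothetical container for the scaled chunks
i = 0
while i < n:
    partial_size = min(batch_size, n - i)
    partial_x = X.iloc[i:i + partial_size]          # positional row slice of the DataFrame
    scaled_batches.append(scaler.transform(partial_x))  # scale this batch only
    i += partial_size

# Reassembling everything at once recreates the original memory pressure;
# consider saving each batch (e.g. with np.save) or streaming it into a model instead.
X_scaled = np.vstack(scaled_batches)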