My sample data looks like this:
customer_id revenue_m10 revenue_m9 revenue_m8 target
1 1234 1231 1256 1239
2 5678 3425 3255 2345
I am trying to split my dataset into train and test sets using scikit-learn's train_test_split function.
So I tried the code below:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    sample_set_df[all_features],
    sample_set_df[target_var],
    test_size=0.3
)
But when I view y_test, it shows NaNs. I am not sure what the issue is. Is the index number missing, or is it something else?
If the index is the issue, how can I solve it?
CodePudding user response:
y_test is a pandas Series, so printing it displays both its index and its data. It seems that sample_set_df has NaNs in its index.
Having NaNs in the index does not affect how train_test_split splits the data. You might have an issue with the actual data, though: the target is 0 in the rows where the index is NaN.
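To check whether the NaNs sit in the index or in the values, and to get a clean index before splitting, something like the sketch below may help. The DataFrame here is a hypothetical stand-in for sample_set_df: the first two rows come from the question, while the other rows, the NaN index label, and random_state are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for sample_set_df; one index label is deliberately NaN.
sample_set_df = pd.DataFrame(
    {
        "revenue_m10": [1234, 5678, 4321, 8765],
        "revenue_m9": [1231, 3425, 4300, 8700],
        "revenue_m8": [1256, 3255, 4250, 8600],
        "target": [1239, 2345, 0, 8650],
    },
    index=[1.0, 2.0, np.nan, 4.0],
)
all_features = ["revenue_m10", "revenue_m9", "revenue_m8"]
target_var = "target"

# Count NaN labels in the index separately from NaN values in the target.
nan_index_count = int(sample_set_df.index.isna().sum())
nan_target_count = int(sample_set_df[target_var].isna().sum())
print("NaN index labels:", nan_index_count)
print("NaN target values:", nan_target_count)

# Replace the index with a plain 0..n-1 range; the row data is unchanged.
clean_df = sample_set_df.reset_index(drop=True)

X_train, X_test, y_train, y_test = train_test_split(
    clean_df[all_features],
    clean_df[target_var],
    test_size=0.25,
    random_state=0,  # assumed here only to make the split reproducible
)
print(y_test)
```

If the NaN labels come from an earlier step (for example a merge or a set_index on a column with missing values), it is probably worth fixing that step instead of only resetting the index afterwards.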