I am working on the Kaggle House Prices competition. I split the features into numeric and categorical ones so I could handle their NaN values separately.
I filled the NaN values in the categorical data with 'None', and now there seem to be duplicates among the rows of the categorical features.
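Here is a minimal sketch of that split-and-fill step. It assumes the features live in a single DataFrame X and uses a tiny hypothetical frame in place of the real data, so the column values are illustrative only. Note that select_dtypes keeps the original index, so both halves inherit whatever row labels X already had.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the full feature frame; the real
# competition data has far more rows and columns.
X = pd.DataFrame({
    "LotArea": [8450, 9600, np.nan],
    "MSZoning": ["RL", np.nan, "RM"],
    "Alley": [np.nan, np.nan, "Grvl"],
})

# Split by dtype: numeric columns vs. everything else (object/string).
X_numeric = X.select_dtypes(include="number")
X_categorical = X.select_dtypes(exclude="number")

# Replace missing categorical values with the literal string 'None'.
X_categorical = X_categorical.fillna("None")

# Both halves still carry X's original index labels.
print(X_numeric.index.equals(X_categorical.index))
print(X_categorical)
```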
If I print the shape of X_categorical, it correctly shows (2919, 43).
However, if I print its tail, the row index ends at 1458. Where are all the missing rows?
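One common way to get a 2919-row frame whose tail stops at label 1458 is to stack the separately loaded train and test frames with pd.concat without ignore_index=True. The competition's train set has 1460 rows (labels 0 to 1459) and its test set has 1459 rows (labels 0 to 1458), so the labels restart at 0 partway through and the last one is 1458. Whether that is what happened here is an assumption, since the question does not show how X_categorical was built. The sketch below reproduces the effect with hypothetical miniature frames.

```python
import pandas as pd

# Hypothetical tiny train/test frames; each has its own 0-based index,
# as frames read separately with pd.read_csv would.
train = pd.DataFrame({"MSZoning": ["RL", "RM", "RL"]})
test = pd.DataFrame({"MSZoning": ["FV", "RH"]})

# Stacking them without ignore_index keeps both sets of labels.
combined = pd.concat([train, test])

print(combined.shape)            # (5, 1): every row is present
print(combined.index.tolist())   # [0, 1, 2, 0, 1]: labels restart at 0
print(combined.index.is_unique)  # False
print(combined.tail(2))          # last labels are 0 and 1, not 3 and 4
```

No rows are lost; they are all there, but their labels repeat.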
I noticed this problem because the error "InvalidIndexError: Reindexing only valid with uniquely valued Index objects" was raised when I tried to concatenate the numeric and categorical features by running:
X = pd.concat([X_numeric, X_categorical], axis=1, ignore_index=True)
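The error can be reproduced on a small scale. This sketch assumes, hypothetically, that X_categorical carries duplicated labels while X_numeric has a clean 0 to n-1 index. With axis=1, pd.concat aligns rows by index label, and ignore_index=True only renumbers the concatenation axis (the columns here), so it does not skip that alignment. The exact exception class and message may differ between pandas versions, so the sketch catches a generic Exception.

```python
import pandas as pd

# Hypothetical frames: X_categorical has labels [0, 1, 2, 0, 1] from a
# train/test concat, while X_numeric has a clean RangeIndex(5).
X_categorical = pd.concat([
    pd.DataFrame({"MSZoning": ["RL", "RM", "RL"]}),
    pd.DataFrame({"MSZoning": ["FV", "RH"]}),
])
X_numeric = pd.DataFrame({"LotArea": [8450, 9600, 11250, 14260, 10084]})

error = None
try:
    # Row alignment by label fails because X_categorical's labels repeat.
    X = pd.concat([X_numeric, X_categorical], axis=1, ignore_index=True)
except Exception as err:  # InvalidIndexError in recent pandas versions
    error = err

print(type(error).__name__, error)
```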
Answer:
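Your index is probably messed up: it likely contains duplicate labels (for example, from concatenating the train and test sets without ignore_index=True), so all 2919 rows are still there but their labels repeat. Reset both indexes before concatenating: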
X_categorical = X_categorical.reset_index(drop=True)
X_numeric = X_numeric.reset_index(drop=True)
X_categorical.tail(3)
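A self-contained sketch of the suggested fix, using hypothetical miniature frames. reset_index(drop=True) replaces the labels with a fresh 0 to n-1 range and discards the old ones instead of inserting them as an 'index' column. Because the frames are then matched by position, this is only correct if both frames list their rows in the same order, for example because both were split from the same combined frame.

```python
import pandas as pd

# Hypothetical frames: X_categorical has duplicated labels [0, 1, 2, 0, 1].
X_categorical = pd.concat([
    pd.DataFrame({"MSZoning": ["RL", "RM", "RL"]}),
    pd.DataFrame({"MSZoning": ["FV", "RH"]}),
])
X_numeric = pd.DataFrame({"LotArea": [8450, 9600, 11250, 14260, 10084]})

# Replace the labels with a fresh 0..n-1 range; drop=True discards the
# old labels rather than adding them as a column.
X_categorical = X_categorical.reset_index(drop=True)
X_numeric = X_numeric.reset_index(drop=True)

# Sanity checks before aligning the two frames on their index.
assert X_categorical.index.is_unique
assert X_numeric.index.equals(X_categorical.index)

X = pd.concat([X_numeric, X_categorical], axis=1)
print(X.shape)  # (5, 2)
print(X.tail(3))
```

ignore_index=True is left out here so the original column names survive in X; passing it would renumber the columns 0, 1, ... as in the question's call.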