pandas DataFrame rows don't match its shape

Time:03-17

I am working on the Kaggle House Prices competition. I have split the features into numeric and categorical so that I can handle their NaN values separately.

I have filled the NaN values in the categorical data with 'None', and the categorical features contain duplicate rows.

If I print its shape, it correctly shows (2919, 43). However, if I print its tail, the rows end at 1458. Where are all the missing rows?

I noticed this problem because InvalidIndexError: Reindexing only valid with uniquely valued Index objects was raised when I tried to concatenate the numeric and categorical features by running X = pd.concat([X_numeric, X_categorical], axis=1, ignore_index=True).


CodePudding user response:

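Your index probably contains duplicate labels, so no rows are actually missing: shape counts rows, while tail shows the index labels, and those can repeat. Add these lines of code to rebuild the index: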

X_categorical = X_categorical.reset_index(drop=True)
X_numeric = X_numeric.reset_index(drop=True)

X_categorical.tail(3)
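
To illustrate why the shape and the tail can disagree, here is a minimal sketch. It assumes the combined frame was built by stacking the train and test sets without resetting the index, which the question does not confirm. The frames train and test and the columns LotArea and Street are small hypothetical stand-ins, not the real competition data. Note also that with axis=1, ignore_index=True renumbers the column labels, not the row labels, so it does not repair a duplicated row index.

```python
import pandas as pd

# Hypothetical stand-ins for the train and test feature frames (assumption).
train = pd.DataFrame({"LotArea": [8450, 9600, 11250],
                      "Street": ["Pave", None, "Pave"]})
test = pd.DataFrame({"LotArea": [11622, 14267],
                     "Street": ["Pave", "Grvl"]})

# Stacking without ignore_index keeps each frame's own row labels.
features = pd.concat([train, test])
print(features.shape)            # (5, 2)
print(features.index.tolist())   # [0, 1, 2, 0, 1]
print(features.index.is_unique)  # False

# Split into numeric and categorical parts, as in the question.
X_numeric = features[["LotArea"]]
X_categorical = features[["Street"]].fillna("None")

# Rebuild a unique 0..n-1 row index on both parts before joining them.
X_numeric = X_numeric.reset_index(drop=True)
X_categorical = X_categorical.reset_index(drop=True)

# Column-wise concat now aligns on a unique index.
X = pd.concat([X_numeric, X_categorical], axis=1)
print(X.tail(3))
```

If the frames were stacked this way, passing ignore_index=True to that first row-wise pd.concat would avoid the duplicate labels from the start.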