Dataframe of different size but no difference in columns

Time:05-06

I am building an XGBoost model. I did my train-test split on a dataframe with 91 columns. I want to use my model on a new dataframe whose columns differ from those of my training set. I have removed the extra columns and added the ones that were present in the training dataset but not in the new one.

However, I cannot use the model because the new set does not have the same number of columns, yet when I compute the list of column differences, the list is empty.

Do you have any idea how I could fix this problem?

Thanks in advance for your time!

CodePudding user response:

You can check the difference in both directions, like this:

import pandas as pd

# X_PAU has one more column than X
X_PAU = pd.DataFrame({'test1': ['A', 'A'], 'test2': [0, 0]})
print(len(X_PAU.columns))
X = pd.DataFrame({'test1': ['A', 'A']})
print(len(X.columns))

# Your implementation: only finds columns of X that are missing from X_PAU
print(set(X.columns) - set(X_PAU.columns))  # empty set, even though the column counts differ

# Check the other direction as well
print(X_PAU.columns.difference(X.columns).tolist())  # names of the columns in X_PAU that are not in X
print(len(X_PAU.columns.difference(X.columns)))  # number of such columns

Output:

2
1
set()
['test2']
1
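
If both directions come back empty and the counts still differ, one possible cause is a repeated column name, since a set collapses duplicates. Below is a minimal sketch of checking for that and then aligning the new dataframe to the training columns. The data is made up for illustration, and filling missing columns with 0 is an assumption that may not suit every feature.

```python
import pandas as pd

# Hypothetical stand-ins for the training frame (X) and the new frame (X_PAU).
X = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
X_PAU = pd.DataFrame([[7, 8, 9, 10]], columns=["b", "a", "a", "d"])

# 1. Compare in both directions: a one-way difference hides extra columns.
missing_in_new = X.columns.difference(X_PAU.columns).tolist()
extra_in_new = X_PAU.columns.difference(X.columns).tolist()

# 2. Sets collapse duplicates, so look for repeated column names explicitly.
duplicated_in_new = X_PAU.columns[X_PAU.columns.duplicated()].tolist()

# 3. Align: keep the first occurrence of each name, then reindex to the
#    training columns. This drops extras, adds missing columns, and matches
#    the training column order. fill_value=0 is an assumption.
deduped = X_PAU.loc[:, ~X_PAU.columns.duplicated()]
X_PAU_aligned = deduped.reindex(columns=X.columns, fill_value=0)

print(missing_in_new, extra_in_new, duplicated_in_new)
print(X_PAU_aligned.columns.tolist())
```

After this step, X_PAU_aligned has exactly the columns of X, in the same order, which is what the model expects at prediction time.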