How to join 2 columns of word embeddings in Pandas

I have extracted word embeddings for two different text fields (title and description) and want to train an XGBoost model on both embeddings. Each embedding is 200-dimensional.

I was able to train the model on a single embedding column, and it worked perfectly like this:

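import numpy as np
from sklearn.model_selection import cross_validate
from xgboost import XGBClassifier

x = df['FastText']  # training features: one embedding vector per row
y = df['Category']  # target variable

# Defining the model
model = XGBClassifier(objective='multi:softprob')

# Evaluation metrics
score = ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro']

# Model training with 5-fold cross-validation;
# np.vstack stacks the per-row vectors into an (n_rows, 200) matrix
scores = cross_validate(model, np.vstack(x), y, cv=5, scoring=score)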

Now I want to use both features for training, but I get an error if I pass two columns of df like this:

x=df[['FastText_Title','FastText']]

One solution I tried is adding the two embeddings together, like x1 + x2, but that decreases accuracy significantly. How do I use both features in the cross_validate function? Kindly help.

CodePudding user response：

In the past for multiple inputs, I've done this:

features = ['FastText_Title', 'FastText']
x = df[features]
y = df['Category']

This creates a DataFrame containing both columns. I usually also need to scale the data with MinMaxScaler once the new array has been made.
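
If each cell of the two columns holds a whole embedding vector (as the np.vstack call in the question suggests), one possible way to build a single feature matrix is to stack each column into its own matrix and then join the matrices side by side. The following is only a minimal sketch under that assumption; the helper name stack_embedding_columns and the toy frame demo are hypothetical and not from the question, and the toy vectors are 3-dimensional instead of 200 to keep the example small.

```python
import numpy as np
import pandas as pd


def stack_embedding_columns(df, columns):
    """Return a 2-D array with the embeddings of `columns` placed side by side.

    Assumes every cell in the given columns is a 1-D numeric vector and that
    all vectors within one column have the same length.
    """
    # np.vstack turns one column of per-row vectors into an (n_rows, dim) matrix.
    blocks = [np.vstack(df[col].tolist()) for col in columns]
    # np.hstack joins the per-column matrices along the feature axis.
    return np.hstack(blocks)


# Toy data (hypothetical): 4 rows, one 3-dimensional vector per cell.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "FastText_Title": [rng.normal(size=3) for _ in range(4)],
    "FastText": [rng.normal(size=3) for _ in range(4)],
})

X = stack_embedding_columns(demo, ["FastText_Title", "FastText"])
print(X.shape)  # (4, 6): 3 title features followed by 3 description features
```

With 200-dimensional embeddings the result would have 400 columns, and it could be passed to cross_validate in the place where np.vstack(x) is passed in the question. Unlike summing the vectors, concatenation keeps the title and description features separate.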

CodePudding user response：

Judging by the error you are getting, something seems to be wrong with the types. Try this; it converts your features to numeric, and it should work:

df['FastText'] = pd.to_numeric(df['FastText'])
df['FastText_Title'] = pd.to_numeric(df['FastText_Title'])
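
One caveat: pd.to_numeric converts scalar values, so it may not apply if each cell stores a full vector rather than a single number. A quick way to check what the column actually holds is sketched below; the two-row frame demo is hypothetical and only stands in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each cell holds a whole vector, as the question implies.
demo = pd.DataFrame({
    "FastText": [np.array([0.1, 0.2]), np.array([0.3, 0.4])],
})

# A column of arrays is stored as generic Python objects, not as numbers.
print(demo["FastText"].dtype)  # object

# Stacking the cells yields a regular numeric matrix that a model can consume.
matrix = np.vstack(demo["FastText"].tolist()).astype(np.float64)
print(matrix.dtype, matrix.shape)  # float64 (2, 2)
```

If the dtype printed for the real column is object and the cells are arrays, stacking them into a matrix is likely the conversion that is needed.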