How to join 2 columns of word embeddings in Pandas

Time:09-24

I have extracted word embeddings for two different texts (title and description) and want to train an XGBoost model on both of them. Each embedding has 200 dimensions.

I was able to train the model on a single embedding column, and it worked perfectly:

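import numpy as np
from sklearn.model_selection import cross_validate
from xgboost import XGBClassifier

x = df['FastText']   # training features: one embedding vector per row
y = df['Category']   # target variable

# Define the model
model = XGBClassifier(objective='multi:softprob')

# Evaluation metrics
score = ['accuracy', 'precision_macro', 'recall_macro', 'f1_macro']

# Train with 5-fold cross-validation; np.vstack turns the column of vectors into a 2-D array
scores = cross_validate(model, np.vstack(x), y, cv=5, scoring=score)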


Now I want to use both features for training, but I get an error if I pass two columns of df like this:

x=df[['FastText_Title','FastText']]

One workaround I tried was adding the two embeddings together (x1 + x2), but that decreases accuracy significantly. How do I use both features in the cross_validate function? Kindly help.

CodePudding user response:

In the past, for multiple inputs, I've done this:

features = ['FastText_Title', 'FastText']
x = df[features]
y = df['Category']

This selects both columns so that the two sets of features travel together. I usually also need to scale the data with MinMaxScaler once the combined array has been built.
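When each cell holds a whole vector rather than a single number, one way to get a combined array is to stack each column into a 2-D block and put the blocks side by side. This is a minimal sketch, not the answerer's exact code. The toy DataFrame, the 4-dimensional vectors, and the helper name stack_embeddings are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: each cell holds one embedding vector.
# The dimension (4 instead of 200) and the values are assumptions for illustration.
rng = np.random.default_rng(0)
n_rows, dim = 6, 4
df = pd.DataFrame({
    "FastText_Title": [rng.normal(size=dim) for _ in range(n_rows)],
    "FastText": [rng.normal(size=dim) for _ in range(n_rows)],
    "Category": [0, 1, 2, 0, 1, 2],
})

def stack_embeddings(frame, columns):
    """Hypothetical helper: turn columns of per-row vectors into one 2-D matrix."""
    # np.vstack turns one column of vectors into shape (n_rows, dim);
    # np.hstack then places the blocks side by side,
    # giving shape (n_rows, dim * len(columns)).
    blocks = [np.vstack(frame[col].tolist()) for col in columns]
    return np.hstack(blocks)

X = stack_embeddings(df, ["FastText_Title", "FastText"])
y = df["Category"].to_numpy()

print(X.shape)  # (6, 8)
```

With the real data, X would have 400 columns (200 per embedding) and could be passed to cross_validate in place of np.vstack(x). Unlike adding the vectors, concatenation keeps the title and description dimensions separate. Tree-based models such as XGBoost are generally insensitive to feature scaling, so the MinMaxScaler step is probably optional in this particular case.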

CodePudding user response:

Judging by the error you are getting, the problem seems to be with the column types. Try this; it converts your features to numeric, and it should work:

df['FastText'] = pd.to_numeric(df['FastText'])
df['FastText_Title'] = pd.to_numeric(df['FastText_Title'])