I have this table in which I am comparing a list of articles (Article_body) with 5 baseline articles using cosine similarity:
Article_body | articleScores1 | articleScores2 | articleScores3 | articleScores4 | articleScores5 |
---|---|---|---|---|---|
a***** | 0.6 | 0.2 | 0.7 | 0.9 | 0.2 |
a***** | 0.3 | 0.8 | 0.1 | 0.5 | 0.1 |
I want to add a column that gives the column with the largest cosine similarity of the 5, on the condition that this value is at least 0.5. If none of CosineSim(i) is at least 0.5, the column should contain False, as in this table:
Article_body | articleScores1 | articleScores2 | articleScores3 | articleScores4 | Most_similar_to |
---|---|---|---|---|---|
a***** | 0.6 | 0.2 | 0.7 | 0.9 | CosineSim4 |
a***** | 0.3 | 0.8 | 0.1 | 0.5 | CosineSim2 |
a****** | 0.1 | 0.2 | 0.3 | 0.4 | False |
I am using this code to achieve this:
cos_cols = [f"articleScores{i}" for i in range(1, 6)]
def n_lar(text):
    if (df[cos_cols].idxmax(axis=1)) < 0.5:
        return False
    else:
        df['Max'] = (df[cos_cols].idxmax(axis=1))
df['Most_similar_to'] = df.apply(n_lar)
However, I am getting this error:
TypeError: '<' not supported between instances of 'str' and 'float'
How can I resolve this?
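The error arises because `idxmax` returns the column *labels* of each row's maximum (strings such as "articleScores4"), and a string cannot be compared with 0.5. A minimal sketch that keeps the shape of the question's `n_lar` function, assuming the score columns are all named `articleScores…`: compare the row's maximum *value* with 0.5, return the label only when it passes, and call `apply` with `axis=1` so each row (not each column) is passed in. The data is copied from the second table above; `filter(like="articleScores")` is an assumption used so the code works whether there are 4 or 5 score columns.

```python
import pandas as pd

# Scores copied from the expected-output table in the question.
df = pd.DataFrame({
    "Article_body": ["a*****", "a*****", "a******"],
    "articleScores1": [0.6, 0.3, 0.1],
    "articleScores2": [0.2, 0.8, 0.2],
    "articleScores3": [0.7, 0.1, 0.3],
    "articleScores4": [0.9, 0.5, 0.4],
})

# Select every score column, however many there are.
score_cols = df.filter(like="articleScores").columns

def n_lar(row):
    # row holds one article's scores; test the maximum value, not the
    # label returned by idxmax (which is a string).
    if row.max() < 0.5:
        return False
    # idxmax gives the label of the maximum, e.g. "articleScores4".
    return row.idxmax().replace("articleScores", "CosineSim")

# axis=1 passes each row to n_lar instead of each column.
df["Most_similar_to"] = df[score_cols].apply(n_lar, axis=1)
print(df["Most_similar_to"].tolist())
```

Renaming to `CosineSim…` only mirrors the naming used in the desired output; drop the `replace` call to keep the original column names.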
edit:
I have this table that in which I am comparing list of articles (Article_body) with 4 baseline articles using cosine similarity:
I want to add a column that gives which column has the largest cosine similarity out of 5, condition it should be at least 0.5. If none of CosineSim(i) is atleast 0.5 then return False as in the table 2
Answer:
(df.filter(like='articleScores')  # only the score columns
 .astype('float')
 .apply(
     # label of the row maximum if it reaches 0.5, else False
     lambda x: ('CosineSim' + x.idxmax()[-1]) if x.max() >= 0.5 else False
     , axis=1)
)
output:
0 CosineSim4
1 CosineSim2
2 False
Then assign the result to the Most_similar_to column.
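The `apply` above calls a Python lambda once per row. For larger frames, a vectorized sketch of the same rule computes each row's best label and best score in one pass and masks rows below the threshold with `Series.where`. The helper name `most_similar` and its `threshold` parameter are hypothetical, and the sample data is copied from the second table in the question.

```python
import pandas as pd

def most_similar(df, threshold=0.5):
    """Hypothetical helper: best-scoring column label per row, or False."""
    scores = df.filter(like="articleScores").astype(float)
    best_col = scores.idxmax(axis=1)  # label of each row's maximum
    best_val = scores.max(axis=1)     # the maximum score itself
    labels = best_col.str.replace("articleScores", "CosineSim", regex=False)
    # Keep the label where the best score reaches the threshold, else False.
    return labels.where(best_val >= threshold, False)

# Scores copied from the expected-output table in the question.
df = pd.DataFrame({
    "Article_body": ["a*****", "a*****", "a******"],
    "articleScores1": [0.6, 0.3, 0.1],
    "articleScores2": [0.2, 0.8, 0.2],
    "articleScores3": [0.7, 0.1, 0.3],
    "articleScores4": [0.9, 0.5, 0.4],
})
df["Most_similar_to"] = most_similar(df)
print(df)
```

Unlike `x.idxmax()[-1]`, the `str.replace` call does not assume a single-digit column suffix, so it would still work with ten or more baseline articles.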