I have the following df and vector:
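import pandas as pd
from scipy.spatial.distance import cosine  # assumption: cosine below is SciPy's cosine distance
df = pd.DataFrame(dict(id=[0, 1, 2], a=[4, 5, 6], b=[22, 11.1, -5]))
vec = pd.DataFrame(dict(id=[90], a=[9], b=[6.4]))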
I want to calculate the similarity of vec to every row of df, ignoring the id column. I know I can set the id columns aside, calculate the similarity, and glue the id column back on:
df_id = df['id']
vec_id = vec['id']
df_wo_id = df.drop(['id'], axis=1)
vec_wo_id = vec.drop(['id'], axis=1)
# pass vec's single row as a 1-D Series, since cosine expects 1-D inputs
df_wo_id['similarity'] = df_wo_id.apply(lambda row: 1 - cosine(row, vec_wo_id.iloc[0]), axis=1)
df = pd.concat([df_id, df_wo_id], axis=1)
Is there a simpler way to apply a function like the one above (maybe using masks)?
CodePudding user response:
You can define a boolean column mask beforehand and use it inside the apply call:
mask = ~df.columns.isin(['id'])
df['similarity'] = df.apply(lambda row: 1 - cosine(row[mask], vec.loc[0, mask]), axis=1)
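If the row-wise apply becomes slow on a larger frame, the same idea (exclude id, compare every row to vec) can be sketched without apply. This is only an illustration, not part of the answer above: it swaps SciPy's cosine for the cosine similarity formula written out directly with NumPy, and the names feature_cols, X and v are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(id=[0, 1, 2], a=[4, 5, 6], b=[22, 11.1, -5]))
vec = pd.DataFrame(dict(id=[90], a=[9], b=[6.4]))

# Every column except 'id' (hypothetical helper name)
feature_cols = df.columns.difference(['id'])

X = df[feature_cols].to_numpy(dtype=float)       # one row per vector in df
v = vec[feature_cols].to_numpy(dtype=float)[0]   # the single row of vec as a 1-D array

# cosine similarity = dot(x, v) / (||x|| * ||v||), computed for all rows at once
df['similarity'] = (X @ v) / (np.linalg.norm(X, axis=1) * np.linalg.norm(v))
```

The id column never leaves df here, so nothing has to be concatenated back afterwards. Note that this assumes no row (and not vec) is all zeros, since that would divide by zero.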