Apply function on all but K columns in a dataframe (similarity of a vector to each member of the df,

Time:03-24

I have the following df and vector:

import pandas as pd
from scipy.spatial.distance import cosine

df = pd.DataFrame(dict(id=[0, 1, 2], a=[4, 5, 6], b=[22, 11.1, -5]))
vec = pd.DataFrame(dict(id=[90], a=[9], b=[6.4]))

I want to compute the cosine similarity of vec to every row of df, ignoring the id column. I know I can set the id column aside, compute the similarity, and glue it back:

df_id = df['id']
vec_id = vec['id']

df_wo_id = df.drop(['id'], axis=1)
vec_wo_id = vec.drop(['id'], axis=1)
# cosine() expects 1-D inputs, so pass the single row of vec_wo_id as a Series
df_wo_id['similarity'] = df_wo_id.apply(lambda row: 1 - cosine(row, vec_wo_id.iloc[0]), axis=1)

df = pd.concat([df_id, df_wo_id], axis=1)

Is there a simpler way to apply a function like the one above to all but a few columns (maybe using a mask)?

CodePudding user response:

You can define a boolean column mask beforehand and use it inside apply:

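mask = ~df.columns.isin(['id'])  # True for every column except 'id'
# cosine() expects 1-D inputs, so take the single row of vec as a Series
df['similarity'] = df.apply(lambda row: 1 - cosine(row[mask], vec.loc[:, mask].iloc[0]), axis=1)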
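
The same column mask can also be used without a row-wise apply. The following is only a sketch of a vectorized alternative, not part of the original answer: it swaps scipy's cosine for plain NumPy arithmetic and assumes that vec has exactly one row, that its columns are in the same order as those of df, and that no row is all zeros. The names X and v are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(id=[0, 1, 2], a=[4, 5, 6], b=[22, 11.1, -5]))
vec = pd.DataFrame(dict(id=[90], a=[9], b=[6.4]))

# Boolean mask over the columns: True for every column except the excluded ones.
mask = ~df.columns.isin(['id'])

# Assumption: vec has the same columns in the same order as df, and one row.
X = df.loc[:, mask].to_numpy(dtype=float)      # shape (n_rows, n_features)
v = vec.loc[:, mask].to_numpy(dtype=float)[0]  # shape (n_features,)

# Cosine similarity of every row of X with v: dot product over product of norms.
df['similarity'] = (X @ v) / (np.linalg.norm(X, axis=1) * np.linalg.norm(v))

print(df)
```

To hold out more columns, extend the list passed to isin, for example ['id', 'label'] for a hypothetical extra column named label. The id column stays in df throughout, so nothing has to be concatenated back afterwards.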