Calculating the cosine distance between two DataFrames and appending the result to a new DataFrame

I have the following example dataframes.

import pandas as pd

group_a = {'0':[2.0,9.4,10.8,0.6,9.4,0.1],
          '1':[4.2,7.1,3,6.3,7.8,0.01],
          '3':[8.1,9.5,6.1,5.6,2,2.2]
          }

A = pd.DataFrame(group_a, index=['aa','ab','ac','ad','ae','af'])

group_b = {'0':[7.0,5.8,11.0,5.8,2.1,2.4],
          '1':[5.3,9.9,0.7,4.4,1.6,0.6],
          '3':[6.7,0.2,3.9,7.5,7.9,5.6]
          }

B = pd.DataFrame(group_a, index=['ba','bb','bc','bd','be','bf'])

They look like this. A:

    0    1      3
aa  2.0  4.20   8.1
ab  9.4  7.10   9.5
ac  10.8 3.00   6.1
ad  0.6  6.30   5.6
ae  9.4  7.80   2.0
af  0.1  0.01   2.2

B (built from group_a in the snippet above, so its values are identical to A):

    0     1     3
ba  2.0   4.20  8.1
bb  9.4   7.10  9.5
bc  10.8  3.00  6.1
bd  0.6   6.30  5.6
be  9.4   7.80  2.0
bf  0.1   0.01  2.2

I would like to calculate the cosine distance between every row of A and every row of B. Each index value represents a point, and every point is a 1 × 3 vector of values.

The output should look like this:

    ba              bb  bc  bd  be  bf
aa  dist(aa & ba)   x   x           
ab  x               x               
ac  x                   
ad                      
ae                      
af

CodePudding user response:

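You can use SciPy's cdist, which computes the distance between every row of A and every row of B:

from scipy.spatial.distance import cdist

# Rows of the result are labelled by A's index, columns by B's index
pd.DataFrame(cdist(A, B, metric='cosine'),
             index=A.index, columns=B.index)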

Output (the diagonal is zero up to floating-point error because B holds the same values as A):

              ba            bb        bc        bd        be        bf
aa  1.110223e-16  1.116861e-01  0.298574  0.074919  0.413914  0.121973
ab  1.116861e-01  1.110223e-16  0.063957  0.190125  0.131183  0.342569
ac  2.985744e-01  6.395706e-02  0.000000  0.447878  0.131884  0.482993
ad  7.491936e-02  1.901254e-01  0.447878  0.000000  0.369183  0.331394
ae  4.139141e-01  1.311832e-01  0.131884  0.369183  0.000000  0.801238
af  1.219732e-01  3.425690e-01  0.482993  0.331394  0.801238  0.000000
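
For reference, the cosine distance between two vectors u and v is 1 - (u · v) / (‖u‖ ‖v‖), so the same kind of table can be built with NumPy alone. The following is only a minimal sketch, not the method used in the answer: the helper name cosine_distance_frame is hypothetical, and it assumes B was meant to be built from group_b rather than group_a as in the question's snippet, so its numbers will differ from the output shown above.

```python
import numpy as np
import pandas as pd


def cosine_distance_frame(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Pairwise cosine distance between the rows of two DataFrames (hypothetical helper)."""
    x = left.to_numpy(dtype=float)
    y = right.to_numpy(dtype=float)
    # Scale every row to unit length so a dot product equals the cosine similarity.
    x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_unit = y / np.linalg.norm(y, axis=1, keepdims=True)
    # distance = 1 - similarity; rows come from `left`, columns from `right`.
    return pd.DataFrame(1.0 - x_unit @ y_unit.T,
                        index=left.index, columns=right.index)


group_a = {'0': [2.0, 9.4, 10.8, 0.6, 9.4, 0.1],
           '1': [4.2, 7.1, 3, 6.3, 7.8, 0.01],
           '3': [8.1, 9.5, 6.1, 5.6, 2, 2.2]}
group_b = {'0': [7.0, 5.8, 11.0, 5.8, 2.1, 2.4],
           '1': [5.3, 9.9, 0.7, 4.4, 1.6, 0.6],
           '3': [6.7, 0.2, 3.9, 7.5, 7.9, 5.6]}

A = pd.DataFrame(group_a, index=['aa', 'ab', 'ac', 'ad', 'ae', 'af'])
# Assumption: B uses group_b here, unlike the question's snippet, which passes group_a.
B = pd.DataFrame(group_b, index=['ba', 'bb', 'bc', 'bd', 'be', 'bf'])

result = cosine_distance_frame(A, B)
print(result.round(4))
```

This sketch does not handle all-zero rows: a row with zero norm divides by zero and produces NaN for that row or column.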