I have the following example dataframes.
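import pandas as pd

group_a = {'0': [2.0, 9.4, 10.8, 0.6, 9.4, 0.1],
           '1': [4.2, 7.1, 3, 6.3, 7.8, 0.01],
           '3': [8.1, 9.5, 6.1, 5.6, 2, 2.2]}
A = pd.DataFrame(group_a, index=['aa', 'ab', 'ac', 'ad', 'ae', 'af'])
group_b = {'0': [7.0, 5.8, 11.0, 5.8, 2.1, 2.4],
           '1': [5.3, 9.9, 0.7, 4.4, 1.6, 0.6],
           '3': [6.7, 0.2, 3.9, 7.5, 7.9, 5.6]}
# Note: B is built from group_a (not group_b), so it holds the same values as A;
# the printed B and the distances below reflect this.
B = pd.DataFrame(group_a, index=['ba', 'bb', 'bc', 'bd', 'be', 'bf'])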
They look like this. A:
       0     1    3
aa   2.0  4.20  8.1
ab   9.4  7.10  9.5
ac  10.8  3.00  6.1
ad   0.6  6.30  5.6
ae   9.4  7.80  2.0
af   0.1  0.01  2.2
B:
       0     1    3
ba   2.0  4.20  8.1
bb   9.4  7.10  9.5
bc  10.8  3.00  6.1
bd   0.6  6.30  5.6
be   9.4  7.80  2.0
bf   0.1  0.01  2.2
I would like to calculate the cosine distance between every row of A and every row of B. Each index label represents a point, and each point is a 1 x 3 vector of values (columns 0, 1 and 3).
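For reference, the cosine distance between two points u and v is one minus their cosine similarity; this is the definition SciPy uses for its cosine metric. The worked line plugs in aa = (2.0, 4.2, 8.1) and bb = (9.4, 7.1, 9.5) from the tables above:

```latex
d_{\cos}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}

d_{\cos}(aa, bb) = 1 - \frac{125.57}{\sqrt{87.25}\,\sqrt{229.02}} \approx 1 - \frac{125.57}{141.36} \approx 0.1117
```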
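The output should be a table with A's points as rows and B's points as columns, where each cell holds the distance between the two points:
    ba            bb            ...  bf
aa  dist(aa, ba)  dist(aa, bb)  ...  dist(aa, bf)
ab  dist(ab, ba)  dist(ab, bb)  ...  dist(ab, bf)
...
af  dist(af, ba)  dist(af, bb)  ...  dist(af, bf)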
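A straightforward (if slow) way to build exactly that layout is a nested comprehension over both indexes, calling scipy.spatial.distance.cosine on each pair of rows. This is only a sketch that reuses the question's data (B built from group_a, as in the question); for larger tables a vectorized call such as cdist is preferable.

```python
import pandas as pd
from scipy.spatial.distance import cosine

# Same example data as in the question (B is built from group_a there too)
group_a = {'0': [2.0, 9.4, 10.8, 0.6, 9.4, 0.1],
           '1': [4.2, 7.1, 3, 6.3, 7.8, 0.01],
           '3': [8.1, 9.5, 6.1, 5.6, 2, 2.2]}
A = pd.DataFrame(group_a, index=['aa', 'ab', 'ac', 'ad', 'ae', 'af'])
B = pd.DataFrame(group_a, index=['ba', 'bb', 'bc', 'bd', 'be', 'bf'])

# One row per point of A, one column per point of B;
# each cell is the cosine distance between the two row vectors.
dist = pd.DataFrame(
    [[cosine(A.loc[a], B.loc[b]) for b in B.index] for a in A.index],
    index=A.index, columns=B.index,
)
print(dist.round(4))
```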
Answer:
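You can use SciPy's cdist with metric='cosine' and wrap the result in a DataFrame labeled with both indexes:
from scipy.spatial.distance import cdist
# rows: points of A, columns: points of B, cells: cosine distance
pd.DataFrame(cdist(A, B, metric='cosine'),
             index=A.index, columns=B.index)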
Output:
              ba            bb        bc        bd        be        bf
aa  1.110223e-16  1.116861e-01  0.298574  0.074919  0.413914  0.121973
ab  1.116861e-01  1.110223e-16  0.063957  0.190125  0.131183  0.342569
ac  2.985744e-01  6.395706e-02  0.000000  0.447878  0.131884  0.482993
ad  7.491936e-02  1.901254e-01  0.447878  0.000000  0.369183  0.331394
ae  4.139141e-01  1.311832e-01  0.131884  0.369183  0.000000  0.801238
af  1.219732e-01  3.425690e-01  0.482993  0.331394  0.801238  0.000000
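Because B repeats A's rows in this example, the diagonal is mathematically 0; entries like 1.110223e-16 are floating-point rounding noise. The same matrix can be sketched in plain NumPy by scaling every row to unit length and taking a matrix product, which gives all pairwise cosine similarities at once. The helper name cosine_distance_matrix is hypothetical, and the sketch assumes no row is all zeros (a zero row would produce NaN).

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

group_a = {'0': [2.0, 9.4, 10.8, 0.6, 9.4, 0.1],
           '1': [4.2, 7.1, 3, 6.3, 7.8, 0.01],
           '3': [8.1, 9.5, 6.1, 5.6, 2, 2.2]}
A = pd.DataFrame(group_a, index=['aa', 'ab', 'ac', 'ad', 'ae', 'af'])
B = pd.DataFrame(group_a, index=['ba', 'bb', 'bc', 'bd', 'be', 'bf'])

def cosine_distance_matrix(X, Y):
    """Hypothetical helper: pairwise cosine distances between rows of X and rows of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Scale each row to unit length; the dot product of two unit vectors is their cosine similarity
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Distance = 1 - similarity; clip tiny negative rounding errors up to 0
    return np.clip(1.0 - Xn @ Yn.T, 0.0, None)

dist = pd.DataFrame(cosine_distance_matrix(A, B), index=A.index, columns=B.index)
# Cross-check against SciPy's cdist (agrees up to floating-point rounding)
print(np.allclose(dist.values, cdist(A, B, metric='cosine')))  # prints True
```

The matrix-product form computes all 36 similarities in one operation, which is essentially what cdist does internally for this metric, just spelled out.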