How to find the molecule most similar to the actual drug, given this data?

I have been trying to select the most similar molecule in the data below using Python. Since I'm new to Python programming, I couldn't do more than plotting. How can I take all factors into account, such as surface area, volume, and ovality, when choosing the best molecule? The most similar molecule should replicate the drug V0L in all aspects. V0L is the actual drug (the last row); the rest are the candidate molecules.

    Mol  Su      Vol     Su/Vol       PSA      Ov     D  A   Mw       Vina
    1    357.18  333.9   1.069721473  143.239  1.53   5  10  369.35   -8.3
    2    510.31  496.15  1.028539756  137.388  1.68   6  12  562.522  -8.8
    3    507.07  449.84  1.127223013  161.116  1.68   6  12  516.527  -9.0
    4    536.54  524.75  1.022467842  172.004  1.71   7  13  555.564  -9.8
    5    513.67  499.05  1.029295662  180.428  1.69   7  13  532.526  -8.9
    6    391.19  371.71  1.052406446  152.437  1.56   6  11  408.387  -8.9
    7    540.01  528.8   1.021198941  149.769  1.71   7  13  565.559  -9.4
    8    534.81  525.99  1.01676838   174.741  1.7    7  13  555.564  -9.3
    9    533.42  520.67  1.024487679  181.606  1.7    7  14  566.547  -9.7
    10   532.52  529.47  1.005760477  179.053  1.68   8  14  571.563  -9.4
    11   366.72  345.89  1.060221458  159.973  1.54   6  11  385.349  -8.2
    12   520.75  504.36  1.032496629  168.866  1.7    6  13  542.521  -8.7
    13   512.69  499     1.02743487   179.477  1.69   7  13  532.526  -8.6
    14   542.78  531.52  1.021184527  189.293  1.71   7  14  571.563  -9.6
    15   519.04  505.7   1.026379276  196.982  1.69   8  14  548.525  -8.8
    16   328.95  314.03  1.047511384  125.069  1.47   4  9   339.324  -6.9
    17   451.68  444.63  1.01585588   118.025  1.6    5  10  466.47   -9.4
    18   469.67  466.11  1.007637682  130.99   1.62   5  11  486.501  -8.3
    19   500.79  498.09  1.005420707  146.805  1.65   6  12  525.538  -9.8
    20   476.59  473.03  1.00752595   149.821  1.62   6  12  502.5    -8.4
    21   357.84  347.14  1.030823299  138.147  1.5    5  10  378.361  -8.6
    22   484.15  477.28  1.014394066  129.93   1.64   6  11  505.507  -10.2
    23   502.15  498.71  1.006897796  142.918  1.65   6  12  525.538  -9.3
    24   526.73  530.31  0.993249232  154.106  1.66   7  13  564.575  -9.9
    25   509.34  505.64  1.007317459  161.844  1.66   7  13  541.537  -9.2
    26   337.53  320.98  1.051560845  144.797  1.49   5  10  355.323  -7.1
    27   460.25  451.58  1.019199256  137.732  1.62   5  11  482.469  -9.6
    28   478.4   473.25  1.010882198  155.442  1.63   6  12  502.5    -8.9
    29   507.62  505.68  1.003836418  161.884  1.65   6  13  541.537  -9.2
    30   482.27  479.07  1.006679608  171.298  1.63   7  13  518.499  -9.1
    V0L  355.19  333.42  1.065293024  59.105   1.530  0  9   345.37   -10.4

Su = Surface area in square angstroms
Vol = Volume in cubic angstroms
PSA = Polar surface area in square angstroms
Ov = Ovality
D = Number of hydrogen bond donating groups
A = Number of hydrogen bond accepting groups
Vina = Binding affinity (lower is better)
Mw = Molecular Weight
Mol = The number of the candidate molecule

CodePudding user response：

I have done that and made some basic plots.

But I want the program to consider all factors, give me a plot or some other form of output, and pick the most similar molecule.

CodePudding user response：

To find the most similar molecule, we can compute the Euclidean distance between each candidate row and the last row (the drug), and pick the row with the smallest distance:

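# Make the last row (the drug, V0L) a new dataframe named `df1`
# (this assumes `df` holds only numeric columns)

df1 = df[30:31]

# And the 30 candidate rows above it another dataframe, `df2`
# (the last row is excluded; otherwise it matches itself at distance 0):

df2 = df[0:30]

And use the scipy.spatial package:

import scipy.spatial

# Distance from every candidate row to the drug row, shape (30, 1)
ary = scipy.spatial.distance.cdist(df2, df1, metric='euclidean')

# Select the candidate row with the smallest distance
df2.iloc[[ary.argmin()]]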

Output

This output was produced with the earlier dataframe, before the question was edited:

    Molecule    SurfaceAr   Volume  PSA Ovality HBD HBA Mw  Vina BA Su/Vol
15  RiboseGly   1.047511    314.03  125.069 1.47    4   9   339.324 -6.9    0.003336
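
One caveat with a plain Euclidean distance is that columns with large magnitudes (Su, Vol, Mw) can dominate columns with small ones (Ov, Su/Vol). A possible variation, not part of the answer above, is to standardize every column first and then rank all candidates by distance. The following is a minimal sketch under that assumption; it uses only five rows copied from the question's table, and the names `scaled`, `candidates`, and `ranked` are illustrative choices.

```python
import numpy as np
import pandas as pd

# A few rows copied from the question's table:
# candidates 1, 16, 21, 26 and the drug V0L (illustrative subset only).
columns = ["Su", "Vol", "Su/Vol", "PSA", "Ov", "D", "A", "Mw", "Vina"]
rows = {
    "1":   [357.18, 333.90, 1.069721473, 143.239, 1.53, 5, 10, 369.350, -8.3],
    "16":  [328.95, 314.03, 1.047511384, 125.069, 1.47, 4, 9, 339.324, -6.9],
    "21":  [357.84, 347.14, 1.030823299, 138.147, 1.50, 5, 10, 378.361, -8.6],
    "26":  [337.53, 320.98, 1.051560845, 144.797, 1.49, 5, 10, 355.323, -7.1],
    "V0L": [355.19, 333.42, 1.065293024, 59.105, 1.53, 0, 9, 345.370, -10.4],
}
df = pd.DataFrame.from_dict(rows, orient="index", columns=columns)

# Standardize each column (z-score) so that every descriptor
# contributes on a comparable scale. This is an assumption of the sketch.
scaled = (df - df.mean()) / df.std()

# Separate the reference drug from the candidate molecules.
drug = scaled.loc["V0L"]
candidates = scaled.drop(index="V0L")

# Euclidean distance from each candidate to the drug, smallest first.
distances = np.sqrt(((candidates - drug) ** 2).sum(axis=1))
ranked = distances.sort_values()
print(ranked)
```

The first entry of `ranked` is the closest candidate under this scaling, and the full ranking shows how far behind the others are. Whether every column should carry equal weight (for example, whether Vina belongs in a similarity measure at all) is a judgment call that depends on the problem.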