Is there a way to create and visualize a model for this data?

I am working with a small database of molecules (~70 candidates). I want to find the molecule that best matches the actual drug. The molecules have different attributes, such as the type of amino acid, surface area, volume, binding affinity, and so on.

I want to systematically pick the one that is the best with respect to the actual drug. How can I do that?

I would also like to know which amino acid residue has the biggest impact on the binding affinity.

*     Molecules     aff aa1 aa2   SA      V        SA/V
- **V0L            10.4 non non 357.96  334.58  1.069878654**
- Trp-Trp-Glucose   9.9 Trp Trp 381.74  353.17  1.080895886
- Trp-Phe-Glucose   9.2 Trp Phe 431.57  411.31  1.049257251
- Phe-Trp-Glucose   9.1 Phe Trp 411.36  385.49  1.067109393
- Trp-Arg-Glucose   9.1 Trp Arg 440.12  430.72  1.021823923
- Gln-Trp-Glucose   8.9 Gln Trp 502.22  491.99  1.020793106
- Trp-Ala-Glucose   8.9 Trp Ala 494.11  467.79  1.056264563
- Tyr-Trp-Glucose   8.9 Tyr Trp 405.17  382.69  1.058742063
- Trp-Asn-Glucose   8.8 Trp Asn 464.75  440.79  1.05435695
- Tyr-Phe-Glucose   8.8 Tyr Phe 440.93  415     1.062481928
- Glu-Trp-Glucose   8.7 Glu Trp 395.82  377.62  1.0481966
- Ile-Trp-Glucose   8.6 Ile Trp 449.31  436     1.030527523
- Trp-Gly-Glucose   8.6 Trp Gly 427.09  403.61  1.058174971
- Asn-Trp-Glucose   8.5 Asn Trp 398.61  370.53  1.075783337
- Tyr-Val-Glucose   8.5 Tyr Val 444.07  427.72  1.038225942
- Phe-Asn-Glucose   8.4 Phe Asn 431.91  403.36  1.070780444
- Trp-Leu-Glucose   8.4 Trp Leu 429.28  400.87  1.070870856
- Tyr-Arg-Glucose   8.4 Tyr Arg 482.72  459     1.05167756
- Val-Trp-Glucose   8.4 Val Trp 443.64  431.18  1.028897444
- Asn-Phe-Glucose   8.3 Asn Phe 416.65  395.56  1.053316817
- Leu-Trp-Glucose   8.3 Leu Trp 471.93  454     1.039493392
- Phe-Ala-Glucose   8.3 Phe Ala 440.88  426.5   1.033716295
- Trp-Lys-Glucose   8.3 Trp Lys 363.36  334.96  1.084786243
- Gln-Phe-Glucose   8.2 Gln Phe 414.28  393.3   1.053343504
- His-Phe-Glucose   8.2 His Phe 391.99  367.35  1.067074997
- Lys-Trp-Glucose   8.2 Lys Trp 381.2   353.02  1.079825506
- Phe-Arg-Glucose   8.2 Phe Arg 445.42  431.92  1.031255788
- Phe-Val-Glucose   8.2 Phe Val 401.67  380.86  1.0546395
- Ser-Trp-Glucose   8.2 Ser Trp 391.51  376.59  1.039618683
- Tyr-Ala-Glucose   8.2 Tyr Ala 397.06  370     1.073135135
- Tyr-Asn-Glucose   8.2 Tyr Asn 425.75  400.87  1.062065009
- Arg-Phe-Glucose   8.1 Arg Phe 400.62  393.13  1.019052222
- Glu-Phe-Glucose   8.1 Glu Phe 361.66  334.59  1.080904988
- Gly-Trp-Glucose   8.1 Gly Trp 431.2   411.58  1.047669955
- Ile-Phe-Glucose   8.1 Ile Phe 420.93  405.13  1.038999827
- Met-Trp-Glucose   8.1 Met Trp 437.37  407.54  1.073195269
- Phe-Leu-Glucose   8.1 Phe Leu 431.02  416.09  1.03588166
- Thr-Trp-Glucose   8.1 Thr Trp 425.13  406.44  1.045984647
- Val-Phe-Glucose   8.1 Val Phe 461.46  437.11  1.055706801
- Ala-Phe-Glucose   8.0 Ala Phe 384.52  374.87  1.025742257
- Asp-Trp-Glucose   8.0 Asp Trp 387.03  360.2   1.074486396
- Phe-Lys-Glucose   8.0 Phe Lys 397.19  376.55  1.054813438
- Ser-Phe-Glucose   8.0 Ser Phe 486.33  469.53  1.035780461
- Tyr-Gly-Glucose   8.0 Tyr Gly 520.67  502.11  1.036964012
- Leu-Phe-Glucose   7.9 Leu Phe 405.28  388.97  1.041931254
- Tyr-Lys-Glucose   7.9 Tyr Lys 438.43  408.17  1.074135777
- Phe-Gly-Glucose   7.8 Phe Gly 491.96  466.26  1.055119461
- Phe-Ile-Glucose   7.8 Phe Ile 467.85  440.47  1.062160874
- Pro-Trp-Glucose   7.8 Pro Trp 438.87  408.04  1.075556318
- Cys-Phe-Glucose   7.7 Cys Phe 489.5   465.15  1.052348705
- Gly-Phe-Glucose   7.7 Gly Phe 465.85  440.22  1.05822089
- His-Asn-Glucose   7.7 His Asn 439.2   408.19  1.075969524
- His-Leu-Glucose   7.6 His Leu 487.44  474.28  1.027747322
- His-Ala-Glucose   7.5 His Ala 470.46  449.32  1.047048874
- His-Asp-Glucose   7.5 His Asp 437.14  409.32  1.067966383
- His-Ile-Glucose   7.3 His Ile 498.06  467.71  1.064890637
- Met-Phe-Glucose   7.1 Met Phe 471.35  441.58  1.067417003
- His-Gly-Glucose   7.0 His Gly 402.66  378.63  1.063465652
- His-Lys-Glucose   7.0 His Lys 484.1   461.01  1.050085681
- His-Val-Glucose   7.0 His Val 430.62  408.78  1.053427271
- His-Arg-Glucose   6.9 His Arg 423.45  402.47  1.052128109
- Pro-Phe-Glucose   6.9 Pro Phe 422.36  396.37  1.065570048

I have tried plotting them in Excel.

CodePudding user response：

Mean affinity for each first amino acid

I was able to plot this part, thank you. If I want to plot the surface area to volume ratio (SA/V), decide which molecule is closest to the drug, and show that graphically on the plot, what should I do?

CodePudding user response：

Assuming you have the .xlsx file containing the raw data, you can import it with pandas, which is a good way of working with tabular data in Python. You can then use matplotlib to plot and visualize the data. (My examples may seem odd because I do not know much about biochemistry, but I hope they show how the Python side works. The column names below are placeholders, so replace them with the headers in your own file.)

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_excel("data.xlsx")
plt.scatter(data['Surface Area'],data['Volume'])
plt.show()

The above code draws a scatter plot with surface area on the x-axis and volume on the y-axis.
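For the follow-up question about showing the SA/V ratio graphically, here is a minimal sketch of one way to do it. It uses a handful of rows copied from the table in the question, with the short column names `SA` and `V` from that table, and treats V0L as the reference drug. Defining "closest" as the smallest absolute difference in SA/V is an assumption of this sketch, and the output file name `sa_v_ratio.png` is just a placeholder.

```python
import pandas as pd
import matplotlib.pyplot as plt

# A few rows copied from the question's table: (molecule, surface area, volume)
rows = [
    ("V0L", 357.96, 334.58),
    ("Trp-Trp-Glucose", 381.74, 353.17),
    ("Phe-Trp-Glucose", 411.36, 385.49),
    ("Phe-Asn-Glucose", 431.91, 403.36),
    ("Trp-Lys-Glucose", 363.36, 334.96),
    ("His-Arg-Glucose", 423.45, 402.47),
]
data = pd.DataFrame(rows, columns=["Molecule", "SA", "V"])
data["SA/V"] = data["SA"] / data["V"]

# Split the reference drug from the candidates
is_ref = data["Molecule"] == "V0L"
ref_ratio = data.loc[is_ref, "SA/V"].iloc[0]
candidates = data[~is_ref].copy()

# Assumption: "closest" means the smallest absolute gap in SA/V
candidates["gap"] = (candidates["SA/V"] - ref_ratio).abs()
closest = candidates.loc[candidates["gap"].idxmin()]

fig, ax = plt.subplots()
ax.scatter(candidates["Molecule"], candidates["SA/V"], label="candidates")
# Dashed horizontal line marks the reference drug's ratio
ax.axhline(ref_ratio, color="red", linestyle="--", label="V0L (reference)")
# Larger green marker highlights the closest candidate
ax.scatter([closest["Molecule"]], [closest["SA/V"]],
           color="green", s=120, label="closest to V0L")
ax.set_ylabel("SA/V")
ax.tick_params(axis="x", labelrotation=45)
ax.legend()
fig.tight_layout()
fig.savefig("sa_v_ratio.png")  # or plt.show() in an interactive session
```

With the full spreadsheet you would replace the hard-coded `rows` with `pd.read_excel(...)` and keep the rest unchanged.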

To pick the best candidate, you would have to define an objective function that combines the parameters you care about into a single score. For example, if you wanted the molecule with the highest ratio of binding affinity to surface area, you would do:

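def objective(row):
    # Ratio of binding affinity to surface area for one row
    return row['Binding Affinity'] / row['Surface Area']

# axis=1 passes each row (not each column) to objective
data['Objective'] = data.apply(objective, axis=1)
# idxmax() returns an index label, so look the row up with .loc
print(data.loc[data['Objective'].idxmax()])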

This prints the row with the largest value of the objective function.
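Since the question is about finding the candidate that is most similar to the actual drug, another possible objective is a distance to the reference row. The sketch below is only one way to set that up: it uses a few rows copied from the question's table, assumes V0L is the reference, and assumes that surface area and volume are the features that matter. Each feature is divided by its standard deviation among the candidates so that neither column dominates the distance.

```python
import pandas as pd

# A few rows copied from the question's table: (molecule, affinity, SA, V)
rows = [
    ("V0L", 10.4, 357.96, 334.58),
    ("Trp-Trp-Glucose", 9.9, 381.74, 353.17),
    ("Trp-Phe-Glucose", 9.2, 431.57, 411.31),
    ("Phe-Trp-Glucose", 9.1, 411.36, 385.49),
    ("Trp-Lys-Glucose", 8.3, 363.36, 334.96),
    ("Glu-Phe-Glucose", 8.1, 361.66, 334.59),
    ("His-Arg-Glucose", 6.9, 423.45, 402.47),
]
data = pd.DataFrame(rows, columns=["Molecule", "aff", "SA", "V"]).set_index("Molecule")

# Assumption: these are the features that define "similar to the drug"
features = ["SA", "V"]
reference = data.loc["V0L", features]
candidates = data.drop(index="V0L")

# Scale each feature by its spread among the candidates
scale = candidates[features].std()
diff = (candidates[features] - reference) / scale

# Euclidean distance to the reference in the scaled feature space
candidates = candidates.assign(distance=(diff ** 2).sum(axis=1) ** 0.5)

# Smallest distance first
ranking = candidates.sort_values("distance")
print(ranking)
```

Which features go into `features`, and whether they should be weighted equally, is a modeling decision that depends on the chemistry, so treat the ranking as a starting point for discussion.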

You could estimate the impact of the different amino acid residues on the binding affinity by calculating the mean and standard deviation of the affinity for each residue. There may be a better way to do this, but this would be my first attempt.

# To get all unique residues present in the sample data
aa_residues = list(set(data['Amino Acid Residue 1']))
for r in aa_residues:
    mean_ba = data[data['Amino Acid Residue 1'] == r]['Binding Affinity'].mean()
    std_ba = data[data['Amino Acid Residue 1'] == r]['Binding Affinity'].std()
    print(r,mean_ba,std_ba)
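The same per-residue summary can also be written with `groupby`, which makes it easy to cover both residue positions and to see how many molecules each mean is based on. This is a sketch on a few rows copied from the question's table, using its short column names `aff`, `aa1` and `aa2`. With so few molecules per residue, the means are only a rough indication.

```python
import pandas as pd

# A few rows copied from the question's table: (molecule, affinity, aa1, aa2)
rows = [
    ("Trp-Trp-Glucose", 9.9, "Trp", "Trp"),
    ("Trp-Phe-Glucose", 9.2, "Trp", "Phe"),
    ("Phe-Trp-Glucose", 9.1, "Phe", "Trp"),
    ("Trp-Arg-Glucose", 9.1, "Trp", "Arg"),
    ("His-Phe-Glucose", 8.2, "His", "Phe"),
    ("Phe-Arg-Glucose", 8.2, "Phe", "Arg"),
    ("His-Arg-Glucose", 6.9, "His", "Arg"),
]
data = pd.DataFrame(rows, columns=["Molecule", "aff", "aa1", "aa2"])

# One summary table per residue position, highest mean affinity first
summary = {
    position: (
        data.groupby(position)["aff"]
        .agg(["mean", "std", "count"])
        .sort_values("mean", ascending=False)
    )
    for position in ("aa1", "aa2")
}

for position, table in summary.items():
    print(position)
    print(table.round(2))
```

The `count` column is worth keeping next to the mean, because a residue that appears in only one or two molecules gives a much less reliable average.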