math operations on 3 lists at the same time

Time:11-12

I have six files (from the Protein Data Bank) that contain the x, y, z coordinates of two residue types, CYS and LYS. The final goal is to calculate the distance between every CYS and every LYS in each file.

I have extracted the coordinates and put them in six separate lists. Now I need to calculate the distance from the xyz coordinates as:

dist = math.sqrt((xc - xl)**2 + (yc - yl)**2 + (zc - zl)**2)

But I don't know how to loop over the six lists to calculate the distances between CYS and LYS in each file.

Here is what the contents of a file look like (I copied just a part that contains LYS as an example):

ATOM     43  CA  LYS A   7     106.336  41.686 -11.244  1.00 21.93           C
ATOM     44  C   LYS A   7     106.561  41.901 -12.727  1.00 21.10           C
ATOM     45  O   LYS A   7     106.327  43.032 -13.214  1.00 24.85           O
ATOM     46  CB  LYS A   7     107.553  41.913 -10.402  1.00 24.26           C
ATOM     47  CG  LYS A   7     107.550  41.181  -9.058  1.00 33.89           C
ATOM     48  CD  LYS A   7     108.522  41.766  -8.051  1.00 35.19           C
ATOM     49  CE  LYS A   7     109.455  40.737  -7.453  1.00 58.09           C
ATOM     50  NZ  LYS A   7     110.799  40.722  -8.120  1.00 55.93           N
ATOM     51  N   THR A   8     106.979  40.859 -13.401  1.00 19.73           N
ATOM     52  CA  THR A   8     107.196  40.777 -14.860  1.00 21.18           C
ATOM     53  C   THR A   8     105.925  41.136 -15.620  1.00 21.07           C
ATOM     54  O   THR A   8     105.925  42.020 -16.497  1.00 14.72           O

Here is my code:

import math
import os
from glob import glob

import numpy as np

BaseDir = os.getcwd()

all_files = np.sort(glob('*[0-600]*.ent'))

for filename in all_files:

    Xc = [] # X coordinate of CYS
    Yc = []
    Zc = []
    Xl = []  # X coordinate of LYS
    Yl = []
    Zl = []

    f = open(filename)
    Lines = f.readlines()
    for i in range(1, len(Lines)):
        if 'CA  CYS' in Lines[i]:
           linec = Lines[i].split()
           if 'CA  CYS' in Lines[i] and linec[0]=='ATOM':
              xc, yc, zc = linec[6] , linec[7], linec[8]
              Xc.append(xc)
              Yc.append(yc)
              Zc.append(zc)
        if 'CA  LYS' in Lines[i]:
            linel = Lines[i].split()
            if 'CA  LYS' in Lines[i] and linel[0]=='ATOM':
              xl, yl, zl = linel[6] , linel[7], linel[8]
              Xl.append(xl)
              Yl.append(yl)
              Zl.append(zl)
    dist = math.sqrt((xc - xl)**2 + (yc - yl)**2 + (zc - zl)**2)

When I print(Xc, filename) it returns:

['87.372', '73.504', '86.059', '82.490', '74.176', '80.312'] 1.ent
['22.872', '13.708'] 2.ent
[] 3.ent
['62.740', '33.741', '18.064', '46.480', '36.255', '63.534', '49.543', '22.826'] 4.ent
['23.404', '-2.617', '50.714', '11.544', '38.216', '-17.818', '-7.237', '21.019', '-19.612', '37.235', '8.371', '51.634'] 5.ent
['66.407', '63.032', '60.134', '14.158', '17.494', '20.312'] 6.ent

And when I print (Xl, filename):

['106.336', '105.826', '101.645', '81.196', '90.656', '96.290', '97.616', '93.983'] 1.ent
['4.430', '5.438', '19.787', '14.569', '23.059', '22.801', '16.723', '15.916'] 2.ent
['22.609', '32.122', '43.387', '41.576', '41.878', '38.004', '33.163', '38.948', '30.836', '23.899'] 3.ent
['21.847', '11.694', '10.507', '11.545', '11.775', '19.945', '27.931', '37.720', '46.445', '32.629', '30.896', '20.769', '16.377', '9.590', '15.170', '14.925', '47.464', '41.800', '24.277', '51.964', '36.706', '30.401', '25.410', '30.474', '50.309', '49.434', '40.009', '44.067', '43.220', '47.551', '52.487', '48.386', '40.121', '37.329', '21.309', '29.918', '35.721', '16.986', '14.680', '11.808', '11.466', '12.679', '17.290', '27.441', '27.388', '16.853', '52.991', '63.359', '67.769', '73.203', '68.424', '71.665', '34.917', '43.296', '60.160', '34.711', '50.052', '56.439', '60.780', '55.977', '37.295', '37.875', '47.683', '44.875', '42.006', '37.175', '32.072', '39.541', '48.253', '49.848', '65.227', '57.237', '48.009', '67.401', '70.352', '73.582', '74.629', '73.458', '70.474', '61.632', '60.699', '68.440'] 4.ent
['-0.840', '32.630', '27.111', '5.772', '0.552', '5.795', '27.208', '25.416', '24.445', '15.503', '33.113', '19.430', '17.972', '22.147', '27.065', '16.759', '12.083', '-3.498', '10.533', '-10.681', '-8.709', '2.418', '-7.800', '-22.468', '-19.818', '-22.713', '-19.877', '-10.223', '-12.596', '-21.356', '1.043', '-4.927', '-21.858', '-21.388', '-15.276', '3.474', '1.652', '-0.966', '-8.278', '23.326', '-1.463', '9.358', '13.785', '18.642', '7.074', '1.475', '-6.532', '-3.374', '-14.994', '2.388', '18.468', '-1.254', '55.980'] 5.ent
['67.045', '49.407', '52.772', '52.214', '55.680', '55.832', '78.610', '67.134', '79.549', '80.258', '80.339', '74.666', '73.443', '65.523', '67.405', '70.133', '66.798', '61.540', '49.690', '49.952', '50.093', '43.900', '49.549', '45.703', '39.861', '54.826', '59.250', '66.840', '43.908', '37.976'] 6.ent

CodePudding user response:

Here's a start:

import numpy as np
from scipy.spatial.distance import cdist


cys_coords = np.loadtxt("cys_data.txt", usecols=(6, 7, 8))
lys_coords = np.loadtxt("lys_data.txt", usecols=(6, 7, 8))  # assuming the same format
distances = cdist(cys_coords, lys_coords)
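
For reference, cdist with its default metric returns the Euclidean distance for every pair made of one row from the first array and one row from the second, so the result has one row per CYS atom and one column per LYS atom. A minimal NumPy-only sketch of the same computation follows; the coordinates are made up for illustration and are not taken from the files.

```python
import numpy as np

# Illustrative coordinates only (hypothetical values, not from the .ent files).
cys_coords = np.array([[0.0, 0.0, 0.0],
                       [1.0, 2.0, 2.0]])
lys_coords = np.array([[3.0, 4.0, 0.0],
                       [1.0, 2.0, 2.0],
                       [0.0, 0.0, 1.0]])

# Broadcasting: (n_cys, 1, 3) - (1, n_lys, 3) -> (n_cys, n_lys, 3) differences.
diff = cys_coords[:, None, :] - lys_coords[None, :, :]

# Square, sum over x/y/z, take the root: one distance per (CYS, LYS) pair.
distances = np.sqrt((diff ** 2).sum(axis=-1))
```

Here distances[i, j] is the sqrt formula from the question applied to CYS atom i and LYS atom j.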

You can adapt the loadtxt and cdist calls to loop over a list of file paths and read in each file's data. If you know in advance how many data points you have, you can pre-allocate NumPy arrays for your CYS and LYS coordinates.
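
As a rough sketch of that per-file loop, here is one way it might look in plain Python, staying close to the code in the question. The helper names (ca_coords, pairwise_distances, distances_per_file) and the "*.ent" pattern are assumptions made for this example. The parsing uses split() as the question does, which assumes the columns are separated by spaces as in the sample shown; math.dist needs Python 3.8 or newer. Note that the coordinates are converted to float, since the lists printed in the question hold strings and strings cannot be subtracted.

```python
import math
from glob import glob


def ca_coords(lines, resname):
    """Return (x, y, z) float tuples for the CA atoms of one residue type."""
    coords = []
    for line in lines:
        fields = line.split()
        # Whitespace splitting mirrors the question's code; it assumes the
        # columns are separated by spaces, as in the sample shown there.
        if (len(fields) >= 9 and fields[0] == "ATOM"
                and fields[2] == "CA" and fields[3] == resname):
            # The file stores the numbers as text, so convert them to float.
            coords.append(tuple(float(v) for v in fields[6:9]))
    return coords


def pairwise_distances(cys, lys):
    """distances[i][j] is the distance from CYS atom i to LYS atom j."""
    return [[math.dist(c, l) for l in lys] for c in cys]


def distances_per_file(pattern="*.ent"):
    """Map each matching filename to its CYS-by-LYS distance table."""
    results = {}
    for filename in sorted(glob(pattern)):
        with open(filename) as f:
            lines = f.readlines()
        results[filename] = pairwise_distances(
            ca_coords(lines, "CYS"), ca_coords(lines, "LYS"))
    return results
```

Calling distances_per_file() in the directory that holds the files would give one table per file. A file with no CYS atoms, such as 3.ent in the output above, simply gets an empty table.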
