I have six files (from the Protein Data Bank) that contain the x, y, z coordinates of two residue types, CYS and LYS. The final goal is to calculate the distance between every CYS and every LYS in each file.
I have extracted the coordinates and put them in six separate lists. Now I need to calculate the distance from the xyz coordinates as:
dist = math.sqrt((xc - xl)**2 + (yc - yl)**2 + (zc - zl)**2)
But I don't know how to loop over the six lists to calculate the distances between CYS and LYS in each file.
Here is what the contents of a file look like (I copied just a part that contains LYS as an example):
ATOM 43 CA LYS A 7 106.336 41.686 -11.244 1.00 21.93 C
ATOM 44 C LYS A 7 106.561 41.901 -12.727 1.00 21.10 C
ATOM 45 O LYS A 7 106.327 43.032 -13.214 1.00 24.85 O
ATOM 46 CB LYS A 7 107.553 41.913 -10.402 1.00 24.26 C
ATOM 47 CG LYS A 7 107.550 41.181 -9.058 1.00 33.89 C
ATOM 48 CD LYS A 7 108.522 41.766 -8.051 1.00 35.19 C
ATOM 49 CE LYS A 7 109.455 40.737 -7.453 1.00 58.09 C
ATOM 50 NZ LYS A 7 110.799 40.722 -8.120 1.00 55.93 N
ATOM 51 N THR A 8 106.979 40.859 -13.401 1.00 19.73 N
ATOM 52 CA THR A 8 107.196 40.777 -14.860 1.00 21.18 C
ATOM 53 C THR A 8 105.925 41.136 -15.620 1.00 21.07 C
ATOM 54 O THR A 8 105.925 42.020 -16.497 1.00 14.72 O
Here is my code:
import math
import os
from glob import glob

import numpy as np

BaseDir = os.getcwd()
all_files = np.sort(glob('*[0-600]*.ent'))
for filename in all_files:
    Xc = []  # X coordinate of CYS
    Yc = []
    Zc = []
    Xl = []  # X coordinate of LYS
    Yl = []
    Zl = []
    f = open(filename)
    Lines = f.readlines()
    for i in range(1, len(Lines)):
        if 'CA CYS' in Lines[i]:
            linec = Lines[i].split()
            if 'CA CYS' in Lines[i] and linec[0]=='ATOM':
                xc, yc, zc = linec[6] , linec[7], linec[8]
                Xc.append(xc)
                Yc.append(yc)
                Zc.append(zc)
        if 'CA LYS' in Lines[i]:
            linel = Lines[i].split()
            if 'CA LYS' in Lines[i] and linel[0]=='ATOM':
                xl, yl, zl = linel[6] , linel[7], linel[8]
                Xl.append(xl)
                Yl.append(yl)
                Zl.append(zl)
    # This is the step I am stuck on: the distance for every CYS/LYS pair
    dist = math.sqrt((xc - xl)**2 + (yc - yl)**2 + (zc - zl)**2)
When I print(Xc, filename), it prints:
['87.372', '73.504', '86.059', '82.490', '74.176', '80.312'] 1.ent
['22.872', '13.708'] 2.ent
[] 3.ent
['62.740', '33.741', '18.064', '46.480', '36.255', '63.534', '49.543', '22.826'] 4.ent
['23.404', '-2.617', '50.714', '11.544', '38.216', '-17.818', '-7.237', '21.019', '-19.612', '37.235', '8.371', '51.634'] 5.ent
['66.407', '63.032', '60.134', '14.158', '17.494', '20.312'] 6.ent
And when I print(Xl, filename):
['106.336', '105.826', '101.645', '81.196', '90.656', '96.290', '97.616', '93.983'] 1.ent
['4.430', '5.438', '19.787', '14.569', '23.059', '22.801', '16.723', '15.916'] 2.ent
['22.609', '32.122', '43.387', '41.576', '41.878', '38.004', '33.163', '38.948', '30.836', '23.899'] 3.ent
['21.847', '11.694', '10.507', '11.545', '11.775', '19.945', '27.931', '37.720', '46.445', '32.629', '30.896', '20.769', '16.377', '9.590', '15.170', '14.925', '47.464', '41.800', '24.277', '51.964', '36.706', '30.401', '25.410', '30.474', '50.309', '49.434', '40.009', '44.067', '43.220', '47.551', '52.487', '48.386', '40.121', '37.329', '21.309', '29.918', '35.721', '16.986', '14.680', '11.808', '11.466', '12.679', '17.290', '27.441', '27.388', '16.853', '52.991', '63.359', '67.769', '73.203', '68.424', '71.665', '34.917', '43.296', '60.160', '34.711', '50.052', '56.439', '60.780', '55.977', '37.295', '37.875', '47.683', '44.875', '42.006', '37.175', '32.072', '39.541', '48.253', '49.848', '65.227', '57.237', '48.009', '67.401', '70.352', '73.582', '74.629', '73.458', '70.474', '61.632', '60.699', '68.440'] 4.ent
['-0.840', '32.630', '27.111', '5.772', '0.552', '5.795', '27.208', '25.416', '24.445', '15.503', '33.113', '19.430', '17.972', '22.147', '27.065', '16.759', '12.083', '-3.498', '10.533', '-10.681', '-8.709', '2.418', '-7.800', '-22.468', '-19.818', '-22.713', '-19.877', '-10.223', '-12.596', '-21.356', '1.043', '-4.927', '-21.858', '-21.388', '-15.276', '3.474', '1.652', '-0.966', '-8.278', '23.326', '-1.463', '9.358', '13.785', '18.642', '7.074', '1.475', '-6.532', '-3.374', '-14.994', '2.388', '18.468', '-1.254', '55.980'] 5.ent
['67.045', '49.407', '52.772', '52.214', '55.680', '55.832', '78.610', '67.134', '79.549', '80.258', '80.339', '74.666', '73.443', '65.523', '67.405', '70.133', '66.798', '61.540', '49.690', '49.952', '50.093', '43.900', '49.549', '45.703', '39.861', '54.826', '59.250', '66.840', '43.908', '37.976'] 6.ent
CodePudding user response:
Here's a start:
import numpy as np
from scipy.spatial.distance import cdist
cys_coords = np.loadtxt("cys_data.txt", usecols=(6, 7, 8))
lys_coords = np.loadtxt("lys_data.txt", usecols=(6, 7, 8)) # assuming the same format
distances = cdist(cys_coords, lys_coords)
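You can adapt this by looping over a list of file paths and reading each file's data in turn. Note that cdist returns a matrix with one row per CYS and one column per LYS, so distances[i, j] is the distance between CYS i and LYS j. If you know in advance how many data points you have, you can pre-allocate NumPy arrays for your CYS and LYS data.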
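To make the looping concrete, here is a minimal sketch that uses only the standard library (math.dist, Python 3.8+) rather than cdist, so it is a swapped-in technique and not the code above. The helper names ca_coords, pairwise_distances and distances_per_file are made up for illustration, and the "*.ent" pattern is an assumption about how your files are named. It also assumes that splitting each line on whitespace puts x, y, z in fields 6 to 8, as in the sample you posted; real PDB files are fixed-width (x, y, z sit in columns 31-54), so slicing by column is more robust when fields run together.

```python
import glob
import math


def ca_coords(lines, resname):
    """Collect (x, y, z) floats for the CA atom of every `resname` residue."""
    coords = []
    for line in lines:
        fields = line.split()
        # Assumption: whitespace-separated fields as in the posted sample,
        # i.e. record, serial, atom, residue, chain, resnum, x, y, z, ...
        if (len(fields) >= 9 and fields[0] == "ATOM"
                and fields[2] == "CA" and fields[3] == resname):
            # Convert to float here; the strings cannot be subtracted.
            coords.append(tuple(float(v) for v in fields[6:9]))
    return coords


def pairwise_distances(cys, lys):
    """Return a table where result[i][j] is the distance from CYS i to LYS j."""
    return [[math.dist(c, l) for l in lys] for c in cys]


def distances_per_file(pattern="*.ent"):
    """Map each matching filename to its CYS-by-LYS distance table."""
    results = {}
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as handle:
            lines = handle.readlines()
        results[filename] = pairwise_distances(
            ca_coords(lines, "CYS"), ca_coords(lines, "LYS"))
    return results
```

Calling distances_per_file() in the directory that holds the files gives one table per file. A file with no CYS, like your 3.ent, simply yields an empty table. If SciPy is available, passing the two coordinate lists to cdist should give the same numbers as pairwise_distances when both lists are non-empty.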