I have a list of peptide sequences and I want to map them to the correct protein names (i.e., the proteins the peptides belong to) using an open database such as UniProt. Can someone guide me on how to find the protein names and map them? Thanks in advance.
I'd say your best bet is to use the requests module and hook into the peptide search API that UniProt links to from its website. The API for peptide sequence searching is here, and its docs are linked from the same page. With it, you can build a dict containing your search parameters and send a request to the API, which returns the results you are looking for. If an endpoint returns JSON, requests can decode it for you (`response.json()`), giving you ordinary Python lists/dicts to use however you wish.
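Since you're starting from a list of peptides, here is a minimal sketch of building that parameter dict. It assumes the parameter names used in the example request in this answer (`peps` as one comma-separated string, plus `lEQi` and `spOnly` as 'on'/'off' switches); `build_search_params` is a hypothetical helper, not part of requests or the API.

```python
def build_search_params(peptides, leucine_equals_isoleucine=False, swissprot_only=False):
    """Build form parameters for a peptide search (hypothetical helper).

    Assumes the service accepts several peptides as a single comma-separated
    string in 'peps', and on/off switches in 'lEQi' and 'spOnly', as in the
    example request in this answer.
    """
    # Normalise input: strip whitespace, uppercase, and drop empty entries
    cleaned = [p.strip().upper() for p in peptides if p.strip()]
    if not cleaned:
        raise ValueError("need at least one peptide sequence")
    return {
        "peps": ",".join(cleaned),
        # Assumed meaning: treat leucine (L) and isoleucine (I) as equivalent
        "lEQi": "on" if leucine_equals_isoleucine else "off",
        # Assumed meaning: restrict matches to Swiss-Prot (reviewed) entries
        "spOnly": "on" if swissprot_only else "off",
    }


params = build_search_params(["MKTLLLTLVVVTIVCLDLGYT", " aaslvk "])
print(params)
```

You could then pass `params` in place of the hard-coded `data` dict in the request.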
Edit: I have code!
Just for fun, I tried the first part: looking up the proteins using the peptides. This works! You can see how easy the requests module makes this sort of thing :)
There is another API for retrieving the database entries once you have the list of "accessions" from this first step. All of the API endpoints and docs can be accessed here. I think you want this one.
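import requests
from time import sleep

# Peptide Match asynchronous REST endpoint
url = 'https://research.bioinformatics.udel.edu/peptidematchws/asyncrest'
# peps can be a comma-separated list for multiple peptide sequences;
# lEQi: treat leucine/isoleucine as equivalent; spOnly: Swiss-Prot entries only
data = {'peps': 'MKTLLLTLVVVTIVCLDLGYT', 'lEQi': 'off', 'spOnly': 'off'}
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
response = requests.post(url, params=data, headers=headers)

if response.status_code == 202:
    # 202 Accepted: the job is queued and its results URL is in the Location header
    job_url = response.headers['Location']
    print(f"Search accepted. Results at {job_url}")
    # allow_redirects=False so the 303 returned while the job is still running
    # reaches this loop instead of being followed automatically by requests
    search_job = requests.get(job_url, allow_redirects=False)
    while search_job.status_code == 303:
        sleep(30)  # poll every 30 seconds until the job finishes
        search_job = requests.get(job_url, allow_redirects=False)
    if search_job.status_code == 200:
        results = search_job.text.strip().split(',')
        print('Results found:')
        print(results)
    else:
        print('No matches found')
else:
    print('Error: search not accepted')
    print(response.status_code, response.reason)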
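For the second step (turning accessions into protein names), here is a sketch that uses the current UniProt REST API (`https://rest.uniprot.org/uniprotkb/<accession>.json`) rather than the API linked earlier; that substitution is my own, so check it against the UniProt docs. It sticks to the standard library (urllib, json) so it runs without installing requests; `requests.get(url).json()` would do the same job. The JSON field names (`proteinDescription`, `recommendedName`, `submissionNames`) reflect UniProtKB's JSON format as I understand it, and both function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: UniProt REST API, one JSON entry per accession
UNIPROT_ENTRY_URL = "https://rest.uniprot.org/uniprotkb/{}.json"


def protein_name_from_entry(entry):
    """Extract a human-readable protein name from a UniProtKB JSON entry.

    Assumption: reviewed (Swiss-Prot) entries usually carry a
    'recommendedName'; unreviewed (TrEMBL) entries often only have
    'submissionNames'. Returns None if neither is present.
    """
    desc = entry.get("proteinDescription", {})
    recommended = desc.get("recommendedName")
    if recommended:
        return recommended["fullName"]["value"]
    submitted = desc.get("submissionNames") or []
    if submitted:
        return submitted[0]["fullName"]["value"]
    return None


def fetch_protein_names(accessions, timeout=30):
    """Map each accession to its protein name, one HTTP request per accession."""
    names = {}
    for acc in accessions:
        acc = acc.strip()
        if not acc:
            continue  # skip blanks, e.g. left over from splitting on ','
        url = UNIPROT_ENTRY_URL.format(acc)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            entry = json.load(resp)
        names[acc] = protein_name_from_entry(entry)
    return names
```

Assuming `results` from the search above really is a list of accessions, `print(fetch_protein_names(results))` would give you an accession-to-name dict. One request per accession keeps the sketch simple; for long lists, UniProt's batch search endpoints would be kinder to the server.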