Why I couldn't get the search keywords in the AlphaFold Protein Structure Database using Beauti

Time:11-12

I've been trying to scrape the search results of the AlphaFold Protein Structure Database, but the information I want is missing from the scraped output. For example, if I type the keyword "Alpha-elapitoxin-Oh2b" into the search bar and click the search button, the site opens a new page at the URL https://alphafold.ebi.ac.uk/search/text/Alpha-elapitoxin-Oh2b. In Google Chrome, I used "Inspect" to look at the code of this page and found the result I want, i.e. the ID of this protein: P82662. However, when I scraped the same page with requests and bs4, the returned HTML contained neither "P82662" nor even the search term "Alpha-elapitoxin-Oh2b".

import requests
from bs4 import BeautifulSoup
response = requests.get('https://alphafold.ebi.ac.uk/search/text/Alpha-elapitoxin-Oh2b')
html = response.text
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())

I searched Stack Overflow for why BS4 and requests might fail to find content that is visible in the browser, and someone suggested that it happens when the search results are rendered by JavaScript. Is that the case here? How can I solve this problem?

Thanks!

CodePudding user response:

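The search results are loaded dynamically: the page fetches them as JSON from an external API with a GET request. The HTML that requests downloads does not contain them, so bs4 returns an empty ResultSet. You can call the API directly instead: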

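import requests

# Call the JSON search API that the results page queries in the background.
# The raw string (r'...') keeps the backslashes in the query exactly as written.
res = requests.get(r'https://alphafold.ebi.ac.uk/api/search?q=(text:*Alpha\-elapitoxin\-Oh2b OR text:Alpha\-elapitoxin\-Oh2b*)&type=main&start=0&rows=20')

# Each entry under 'docs' is one matching protein; print its UniProt accession.
for item in res.json()['docs']:
    id_num = item['uniprotAccession']
    print(id_num)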

Output:

P82662
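
To reuse the same idea for other keywords, the query string can be built from the keyword instead of being typed by hand. The following is only a sketch: the helper names build_search_url and extract_accessions are made up for illustration, and it assumes that hyphens are the only characters that need a backslash, since that is all the example query above shows.

```python
from urllib.parse import urlencode

# Endpoint taken from the request above.
API_URL = "https://alphafold.ebi.ac.uk/api/search"


def build_search_url(keyword, rows=20):
    """Build the search URL for a keyword (hypothetical helper)."""
    # Assumption: only hyphens need escaping, as in the example query above.
    escaped = keyword.replace("-", r"\-")
    query = f"(text:*{escaped} OR text:{escaped}*)"
    # urlencode percent-encodes the parentheses, backslashes and spaces.
    params = {"q": query, "type": "main", "start": 0, "rows": rows}
    return API_URL + "?" + urlencode(params)


def extract_accessions(payload):
    """Return the 'uniprotAccession' of every entry under 'docs'."""
    return [doc["uniprotAccession"] for doc in payload.get("docs", [])]
```

With requests installed, something like extract_accessions(requests.get(build_search_url("Alpha-elapitoxin-Oh2b")).json()) should return the same accession as the output above, provided the API still responds the way it did when this answer was written.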
