How to get something from a webpage with Python

Time:12-31

The page is at the URL https://www.example.com and looks like this:

<html>
<body>
<button id="button1" onclick="func1()"></button>
<button id="button2" onclick="func2()"></button>
</body>
<script>
function func1(){
  open("/doubt?s=AAAB_BCCCDD");
}

function func2(){
  open("/doubt?s=AABB_CCDDEE");
}
//something like that, it is working ....
</script>
</html>

AAAB_BCCCDD and AABB_CCDDEE are the tokens.

I want to get the first token on the page with Python.
My Python code:

import requests

r = requests.get("https://www.example.com")
s = r.text

if "/doubt?s=" in s:
    # After this point I can't figure out what to do.
    # I want to get the first token here as a variable.
    pass

Please help me.

CodePudding user response:

Usually, after fetching the website's raw text content, you would first parse the HTML using a library like BeautifulSoup. It creates a document object model (DOM) tree, which you can then query for the elements you need.
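To illustrate what querying parsed HTML looks like, here is a minimal sketch. It deliberately swaps BeautifulSoup for the standard library's html.parser so that it runs without extra installs. SAMPLE_HTML is a copy of the page from the question (with the buttons closed), and ButtonCollector is a hypothetical helper name, not part of any library.

```python
from html.parser import HTMLParser

# Assumption: a local copy of the page from the question, so no network is needed.
SAMPLE_HTML = """
<html>
<body>
<button id="button1" onclick="func1()"></button>
<button id="button2" onclick="func2()"></button>
</body>
<script>
function func1(){ open("/doubt?s=AAAB_BCCCDD"); }
function func2(){ open("/doubt?s=AABB_CCDDEE"); }
</script>
</html>
"""


class ButtonCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every <button> tag."""

    def __init__(self):
        super().__init__()
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "button":
            self.buttons.append(dict(attrs))


collector = ButtonCollector()
collector.feed(SAMPLE_HTML)
for button in collector.buttons:
    # Prints the id and the onclick handler name of each button.
    print(button.get("id"), button.get("onclick"))
```

The parser exposes the buttons and their onclick attributes, but the tokens themselves are not among those attributes; they sit inside the script text.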

However, an HTML parser such as BeautifulSoup neither executes nor interprets JavaScript code, and your tokens appear only inside the <script> block. For your problem, you can instead use a regular expression to extract the tokens from the raw text.

Example:

import re
import requests

r = requests.get("https://www.example.com")
s = r.text

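# Capture the run of word characters (letters, digits, underscore) after "/doubt?s=".
pattern = re.compile(r'/doubt\?s=(?P<token>\w+)')
matches = pattern.findall(s)
if matches:
    # findall returns the tokens in page order, so index 0 is the first one.
    print(matches[0])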
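Since only the first token is needed, re.search may be a simpler fit than findall, because it stops at the first match. The sketch below tries this offline against a copy of the script from the question; SAMPLE_SCRIPT and first_token are hypothetical names chosen for illustration, and the pattern assumes tokens consist only of letters, digits, and underscores.

```python
import re

# Assumption: a local copy of the <script> content from the question.
SAMPLE_SCRIPT = """
function func1(){
  open("/doubt?s=AAAB_BCCCDD");
}

function func2(){
  open("/doubt?s=AABB_CCDDEE");
}
"""

# \? matches a literal question mark; (\w+) captures the token itself.
TOKEN_PATTERN = re.compile(r"/doubt\?s=(\w+)")


def first_token(text):
    """Return the first token following '/doubt?s=', or None if there is none."""
    match = TOKEN_PATTERN.search(text)
    return match.group(1) if match else None


token = first_token(SAMPLE_SCRIPT)
print(token)  # AAAB_BCCCDD
```

With a live page, the same function could be called on r.text from the requests example. Returning None when nothing matches lets the caller decide how to handle a page without tokens.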