Extracting data from an HTML response in Python

Time:10-04

As the response to a request, I'm getting a full HTML document of about 1600 lines.

What I'm trying to do is find a way to extract a value from one specific line:

    <input type="hidden" id="form__token" name="form[_token]" data-parsley-errors-container="#form__token_error" value="tHV9QvBk9HEvZSP8S8bCkpC1vsSE4B4HthgXgk4V7FM" /></form>

This input is at line 1594 of my document, and I'm trying to get the content of its value attribute. My first idea was to extract the attribute name value together with its content and then delete everything else, but value also appears elsewhere in my HTML file, so that would not work.

Any ideas on how I could make this work? Thank you for your help and time!

CodePudding user response:

You will need 'requests' and 'BeautifulSoup' to get the data you want from that URL.

Try:

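    from bs4 import BeautifulSoup
    import requests

    url = 'link to url'  # placeholder: the page that returns the form

    page = requests.get(url, timeout=5)  # timeout is optional
    soup = BeautifulSoup(page.text, 'html.parser')

    # find() returns the whole <input> tag, or None if the id is absent
    tag = soup.find(id='form__token')

    # index the tag by attribute name to get the token itself
    print(tag['value'])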
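
To try the lookup without making a request, the same parsing step can be run on an inline string. This is only a sketch: the sample markup is a shortened, made-up stand-in for the real page (the extra email input is an assumption, added to show that other value attributes do not interfere), and `.get("value")` is used so that a missing attribute gives None instead of raising an error.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real response body (assumption, not the actual page).
sample_html = """
<form>
  <input type="text" name="form[email]" value="someone@example.com" />
  <input type="hidden" id="form__token" name="form[_token]"
         value="tHV9QvBk9HEvZSP8S8bCkpC1vsSE4B4HthgXgk4V7FM" />
</form>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Look the element up by tag name and id; find() returns None when nothing matches.
tag = soup.find("input", id="form__token")

# Read the value attribute only if the element was found.
token = tag.get("value") if tag is not None else None

print(token)
```

On the real page, `sample_html` would be replaced by the text of the response.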

CodePudding user response:

An id should be unique within the whole HTML document if the page has been written correctly. This means that, in theory, searching for the id rather than for the value attribute will give you exactly the input tag you are looking for.
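
If you want to check that assumption on a given page, one option is to count how many elements carry the id before trusting the result. The sketch below uses only the standard library's `html.parser` rather than BeautifulSoup, so it runs without extra dependencies; the class name `IdCollector` and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the attributes of every element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.matches = []  # one attribute dict per matching element

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <input ... /> also end up here via the default handle_startendtag.
        attributes = dict(attrs)
        if attributes.get("id") == self.target_id:
            self.matches.append(attributes)


# Made-up stand-in for the real response body (assumption, not the actual page).
sample_html = """
<form>
  <input type="text" name="form[email]" value="someone@example.com" />
  <input type="hidden" id="form__token" name="form[_token]"
         value="tHV9QvBk9HEvZSP8S8bCkpC1vsSE4B4HthgXgk4V7FM" />
</form>
"""

collector = IdCollector("form__token")
collector.feed(sample_html)
collector.close()

if len(collector.matches) == 1:
    # The id is unique, so this is the element we want.
    token = collector.matches[0].get("value")
    print(token)
else:
    token = None
    print(f"expected exactly one element, found {len(collector.matches)}")
```

If the count is anything other than one, the page breaks the uniqueness rule and the lookup needs a more specific condition, for example the name attribute as well.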

Have a nice day,
