How can I get all the content on a website?

Time:10-12

I would like to do some web scraping, so I make a simple request:

import urllib.request

# Fetch the raw HTML of the page and decode it to text
with urllib.request.urlopen("https://www.iadfrance.fr/trouver-un-conseiller") as fp:
    mystr = fp.read().decode("utf8")

# Append the HTML to a local file
with open("demofile2.txt", "a", encoding="utf8") as faa:
    faa.write(mystr)

But I can't find any names in my file.

Why? And is there a way to get all the advisers shown on the map?

Thanks for your answers!

CodePudding user response:

The fundamental concept here has a name: "HATEOAS", Hypermedia as the Engine of Application State.

The first response you get lists the next set of resources you need to request. Those, in turn, may reference quite a few more. Some of those resources may be JavaScript which, when executed, requests even more data. That is inconvenient and a violation of the theoretical HATEOAS model, but it is very much the practice for interactive websites. It is also why the names are missing from your file: urllib only downloads the initial HTML and never runs the scripts that load them.
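To make this concrete, here is a minimal sketch that lists the follow-up resources referenced by an HTML document. The sample markup and the class name ResourceCollector are hypothetical and only illustrate the idea; the real page will reference different files.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs that a browser would fetch or follow after the initial HTML."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            # External scripts are downloaded and executed by a browser
            self.scripts.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            # Hyperlinks point to further resources
            self.links.append(attrs["href"])


# Hypothetical page: the markup itself contains no names,
# only an empty map container and a script that would load them.
SAMPLE_HTML = """
<html><body>
  <div id="map"></div>
  <a href="/contact">Contact</a>
  <script src="/static/map.js"></script>
</body></html>
"""

collector = ResourceCollector()
collector.feed(SAMPLE_HTML)
print("scripts:", collector.scripts)
print("links:", collector.links)
```

Running the same collector on the text saved in demofile2.txt should show which scripts the page depends on. The data those scripts fetch at runtime is what is missing from the file.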

CodePudding user response:

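The map data appears to come from a separate JSON endpoint that takes the corners of the visible map area as query parameters, so you can request it directly. Here is how you get the data: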

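import requests

# Endpoint queried with the south-west and north-east corners of the map area
r = requests.get('https://www.iadfrance.fr/agent-search-location?southwestlat=48.8251752&southwestlng=2.2935677&northeastlat=48.8816507&northeastlng=2.4039459')
if r.status_code == 200:
    # The response body is JSON, so parse it instead of treating it as HTML
    print(r.json())
else:
    print(f'Oops. Status code is {r.status_code}')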
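If you want to query other areas of the map, one possible approach is to build the query string from the bounding-box corners instead of hard-coding it. This is only a sketch: the parameter names are copied from the URL above, the helper name build_search_url is made up for illustration, and it is an assumption that the endpoint accepts arbitrary coordinates in the same way.

```python
from urllib.parse import urlencode

# Endpoint taken from the request above
BASE_URL = "https://www.iadfrance.fr/agent-search-location"


def build_search_url(south, west, north, east):
    """Build the search URL for a bounding box (hypothetical helper).

    south/west are the south-west corner, north/east the north-east corner.
    """
    params = {
        "southwestlat": south,
        "southwestlng": west,
        "northeastlat": north,
        "northeastlng": east,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Same bounding box as in the request above
url = build_search_url(48.8251752, 2.2935677, 48.8816507, 2.4039459)
print(url)
```

The resulting string can be passed to requests.get(url) exactly as before. Covering a larger region would then mean calling the helper for several adjacent boxes and merging the results.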