Download content of Webpage

Time:05-09

I need to download the content of a web page using Python.
What I need is the TLE (two-line element set) of a specific satellite from the Space-Track.org website.
An example of the URL I need to scrape is the following:

https://www.space-track.org/basicspacedata/query/class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show

Below is the unsuccessful code I wrote/copied:

import requests

url = 'https://www.space-track.org/basicspacedata/query/class/gp/NORAD_CAT_ID/44235/format/tle/emptyresult/show'
res = requests.post(url)
html_page = res.content

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_page, 'html.parser')
text = soup.find_all(text=True)
print(text)

requests.post(url) returns <Response [204]> (HTTP 204 No Content), and I can't access the content of the webpage.
Could this happen because of the required login?
I must admit that I am not experienced with Python and I don't have the knowledge to do this myself.
What I can do is manipulate text files: from the DevTools page I can get the HTML file and extract the text, but how can I do this programmatically?
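Because the URL ends in format/tle, a successful response should be plain-text TLE lines rather than an HTML page, so BeautifulSoup is not strictly needed to get the text out. A minimal sketch, assuming the two-line layout (a line starting with "1 " followed by a line starting with "2 ", with no satellite-name line); parse_tle and SAMPLE are hypothetical names, and SAMPLE is an illustrative element set in TLE layout, not the data for NORAD 44235:

```python
# Illustrative input in two-line TLE layout (not real data for NORAD 44235).
SAMPLE = (
    "1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927\n"
    "2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537\n"
)


def parse_tle(text):
    """Split plain-text TLE output into (line1, line2) tuples.

    Assumes the two-line format: each element set is a line starting
    with '1 ' followed by a line starting with '2 ' (no name line).
    """
    # Drop blank lines; strip only trailing whitespace so columns stay intact.
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2:
        raise ValueError("odd number of TLE lines: %d" % len(lines))
    pairs = []
    for first, second in zip(lines[0::2], lines[1::2]):
        if not (first.startswith("1 ") and second.startswith("2 ")):
            raise ValueError("unexpected TLE layout: %r / %r" % (first, second))
        pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    # In practice the text would be the body of a successful (logged-in) response.
    for line1, line2 in parse_tle(SAMPLE):
        print(line1)
        print(line2)
```

If you request format/3le instead, each set gains a name line and this pairing logic would need adjusting.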

Answer:

To access the URL you mentioned, you need to log in with a username and password.

To do this (customize to your needs):

import http.cookiejar  # this module was called cookielib in Python 2

import mechanize

# Cookie jar keeps the session cookie set by the login response
cj = http.cookiejar.CookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
# Example login page; replace with the login page of the site you need
br.open("https://id.arduino.cc/auth/login/")

br.select_form(nr=0)  # select the first form on the page
br.form['username'] = 'username'
br.form['password'] = 'password'
br.submit()

print(br.response().read())

Answer:

I don't have access to this API, so take my advice with a grain of salt, but you should also try using requests.get instead of requests.post.

Why? Because requests.post POSTs data to the server, while requests.get GETs data from the server. GET and POST are known as HTTP methods; to learn more about them, see https://www.tutorialspoint.com/http/http_methods.htm. Since a web browser uses GET when you open a URL, you should give that a try.
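The method choice can be inspected without sending anything over the network. As a small illustration using the standard library's urllib.request (swapped in for requests only so nothing is actually sent), a Request with no body is treated as a GET, attaching any body makes it a POST, and an explicit method argument overrides both; with requests the equivalent choice is simply requests.get(url) versus requests.post(url):

```python
from urllib.request import Request

URL = ("https://www.space-track.org/basicspacedata/query/class/gp/"
       "NORAD_CAT_ID/44235/format/tle/emptyresult/show")

# No body: treated as GET, like a browser opening the URL.
get_req = Request(URL)
# Any body, even empty bytes, turns the request into a POST.
post_req = Request(URL, data=b"")
# An explicit method overrides the body-based default.
forced_get = Request(URL, data=b"x", method="GET")

print(get_req.get_method())     # GET
print(post_req.get_method())    # POST
print(forced_get.get_method())  # GET
```

Switching to GET alone will likely not be enough here, since the query still requires a logged-in session.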

Answer:

My bad for not seeing it before, but Space-Track already has a solution on their website:

https://www.space-track.org/documentation#howto-api_python
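The idea behind that approach is: log in once, keep the session cookie, then request the query URL with the same session. A hedged sketch using only the standard library (urllib.request plus http.cookiejar, the Python 3 name of the cookielib module from the mechanize answer). The login endpoint /ajaxauth/login and the form fields identity and password follow Space-Track's API documentation as I understand it; treat them as assumptions and check the page above. The helper names make_session_opener, login_request and fetch_tle are hypothetical:

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE = "https://www.space-track.org"
LOGIN_PATH = "/ajaxauth/login"  # assumed login endpoint, see the docs link
QUERY_PATH = ("/basicspacedata/query/class/gp/NORAD_CAT_ID/44235"
              "/format/tle/emptyresult/show")


def make_session_opener():
    """Opener whose cookie jar keeps the session cookie between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_request(identity, password):
    """POST request carrying the credentials as URL-encoded form data."""
    data = urllib.parse.urlencode({"identity": identity, "password": password})
    return urllib.request.Request(BASE + LOGIN_PATH, data=data.encode("ascii"))


def fetch_tle(opener, identity, password):
    """Log in, then GET the TLE query using the same opener (same cookies)."""
    with opener.open(login_request(identity, password)) as resp:
        resp.read()  # the login response body is not needed here
    with opener.open(BASE + QUERY_PATH) as resp:
        return resp.read().decode("utf-8")
```

Usage would be print(fetch_tle(make_session_opener(), "your_username", "your_password")). Note that the login is a POST while the data query is a GET, which matches the difference discussed in the previous answer.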
