import requests
import re
from bs4 import BeautifulSoup

respond = requests.get("http://www.kulugyminiszterium.hu/dtwebe/Irodak.aspx")
print(respond)
soup = BeautifulSoup(respond.text, 'html.parser')

for link in soup.find_all('a'):
    links = link.get('href')
    linki_bloc = ('http://www.kulugyminiszterium.hu/dtwebe/' + links).replace(' ', ' ')
    print(linki_bloc)

# After the loop, linki_bloc holds only the last link
value = linki_bloc
print(value.split())
I am trying to use the results of find_all('a') as a list, but the only thing I end up with is the last link.
It seems to me that the problem is that the results come out as links separated by \n rather than as a list. I tried many ways to get rid of the newline character but failed. Saving to a file (e.g. .txt) also fails: only the last link is saved.
CodePudding user response:
You are close to your goal, but you overwrite the result with each iteration. Simply collect your manipulated links in a list, either directly with a list comprehension:
['http://www.kulugyminiszterium.hu/dtwebe/' + link.get('href').replace(' ', ' ') for link in soup.find_all('a')]
or as in your example:
links = []
for link in soup.find_all('a'):
    links.append('http://www.kulugyminiszterium.hu/dtwebe/' + link.get('href').replace(' ', ' '))
Example
import requests
from bs4 import BeautifulSoup

respond = requests.get("http://www.kulugyminiszterium.hu/dtwebe/Irodak.aspx")
soup = BeautifulSoup(respond.text, 'html.parser')

# Collect every link instead of overwriting a single variable
links = []
for link in soup.find_all('a'):
    links.append('http://www.kulugyminiszterium.hu/dtwebe/' + link.get('href').replace(' ', ' '))

print(links)
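The question also mentions that saving to a file keeps only the last link. That usually has the same cause: the write happens on a single value, or the file is reopened in 'w' mode on every iteration. A minimal sketch of one way around it, assuming the hrefs have already been extracted from the soup; build_links, save_links and the file name links.txt are hypothetical names used only for illustration:

```python
from pathlib import Path
from urllib.parse import urljoin

BASE = "http://www.kulugyminiszterium.hu/dtwebe/"


def build_links(hrefs, base=BASE):
    """Return an absolute URL for every non-empty href, keeping the order."""
    # Anchors without an href yield None from link.get('href'); skip those.
    return [urljoin(base, href) for href in hrefs if href]


def save_links(links, path):
    """Write all links in one go, one per line."""
    # The file is written once, after the list is complete, so earlier
    # links are not overwritten by later ones.
    Path(path).write_text("\n".join(links) + "\n", encoding="utf-8")
```

With the soup from the example above, something like build_links(a.get('href') for a in soup.find_all('a')) would give the full list, and save_links(links, 'links.txt') would then write every link to the file rather than just the last one.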
CodePudding user response:
Assuming you're just trying to get a list of hrefs, then:
import requests
from bs4 import BeautifulSoup as BS
from urllib.parse import urljoin

BASE = 'http://www.kulugyminiszterium.hu/dtwebe/'

# Fetch the page and fail early on an HTTP error
(r := requests.get(urljoin(BASE, 'Irodak.aspx'))).raise_for_status()
soup = BS(r.text, 'lxml')

hrefs = []
# href=True skips anchors that have no href attribute
for a in soup.find_all('a', href=True):
    hrefs.append(urljoin(BASE, a['href']).replace(' ', ' '))

print(*hrefs, sep='\n')
(partial) Output:
http://www.kulugyminiszterium.hu/dtwebe/reszletes.aspx?Orszag=Barbados
http://www.kulugyminiszterium.hu/dtwebe/reszletes.aspx?Orszag=Bolivarian Republic of Venezuela
http://www.kulugyminiszterium.hu/dtwebe/reszletes.aspx?Orszag=Bosnia and Herzegovina
http://www.kulugyminiszterium.hu/dtwebe/reszletes.aspx?Orszag=Canada
http://www.kulugyminiszterium.hu/dtwebe/reszletes.aspx?Orszag=Commonwealth of Australia
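Note that several of the URLs in the output still contain literal spaces in the query string. If you need them percent-encoded before requesting them (an assumption on my part, the question does not say so), a rough sketch using urllib.parse.quote could look like this; encode_link is a hypothetical helper name:

```python
from urllib.parse import quote, urljoin

BASE = "http://www.kulugyminiszterium.hu/dtwebe/"


def encode_link(href, base=BASE):
    """Join href onto base and percent-encode unsafe characters such as spaces."""
    absolute = urljoin(base, href)
    # Keep the URL delimiters intact; '%' stays safe so that sequences which
    # are already encoded are not encoded a second time.
    return quote(absolute, safe=":/?=&%")
```

For example, encode_link('reszletes.aspx?Orszag=Bosnia and Herzegovina') turns each space into %20 and leaves the rest of the URL unchanged.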