I have a program that gets links from an HTML source file, and I want it to save all the links in a variable and then print only the 4th link. How would I do that?
The code:
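from bs4 import BeautifulSoup
soup = BeautifulSoup(z["body"], features="lxml")
for tag in soup.find_all("a"):
    # links is overwritten on every iteration, so only the last href survives
    links = tag.get("href")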
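The question was edited heavily after I first answered, so here is an updated answer. If I understand correctly, collect every href into a list and index it (lists are 0-based, so the 4th link is at index 3):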
links = [tag.get('href') for tag in soup.find_all("a")]
print(links[3])
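A minimal sketch of the same list approach that also copes with two common snags, assuming a small hypothetical HTML snippet in place of the question's z["body"]: <a> tags without an href (where tag.get("href") returns None and shifts the positions) and pages with fewer than four links (where links[3] would raise IndexError). Passing href=True to find_all keeps only tags that actually have the attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for z["body"] from the question.
html = """
<a href="/one">one</a>
<a name="anchor-without-href">no link</a>
<a href="/two">two</a>
<a href="/three">three</a>
<a href="/four">four</a>
"""

# "html.parser" is Python's built-in parser, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that carry an href attribute,
# so the list contains no None entries.
links = [tag["href"] for tag in soup.find_all("a", href=True)]

# Lists are 0-indexed: the 4th link is at index 3. Guard against short pages.
fourth = links[3] if len(links) >= 4 else None
print(fourth)  # /four
```

Without href=True the list would be ['/one', None, '/two', '/three', '/four'], and index 3 would return '/three' instead.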
For the earlier version of the question (getting the 4th line of a multi-line string), use str.splitlines():
Given:
text = """user1:hwid
user2:hwid
user3:hwid
user4:hwid
user5:hwid"""
Doing:
print(text.splitlines()[3])
Output:
user4:hwid
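A short sketch, using a hypothetical variant of the sample text with a Windows-style \r\n line ending and a trailing newline, showing why splitlines() is a safer choice here than split("\n"):

```python
# Hypothetical variant of the sample: one \r\n ending and a trailing newline.
text = "user1:hwid\r\nuser2:hwid\nuser3:hwid\nuser4:hwid\nuser5:hwid\n"

# splitlines() treats \n, \r\n and other line boundaries as separators
# and does not produce an empty string for the trailing newline.
lines = text.splitlines()
print(lines[3])  # user4:hwid

# split("\n") leaves the "\r" attached to the first line and adds a trailing "".
raw = text.split("\n")
print(repr(raw[0]), len(raw))
```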
An alternative, in a specific case, is to select the element directly with a CSS selector, more specifically the pseudo-class :nth-of-type(), which matches elements of a given type based on their position among a group of siblings. Because it counts among siblings rather than across the whole document, this only returns the 4th link on the page when all the links share the same parent:
link = soup.select_one('a:nth-of-type(4)').get("href")
Example
from bs4 import BeautifulSoup
html = '''
<h2><span>Title1</span></h2>
<p>some text</p>
<a href="1">link 1</a>
<h2><span>Title2</span></h2>
<a href="2">link 2</a>
<p>some text</p>
<a href="3">link 3</a>
<a href="4">link 4</a>
'''
soup = BeautifulSoup(html, "html.parser")  # name a parser explicitly to avoid bs4's "no parser specified" warning
print(soup.select_one('a:nth-of-type(4)').get("href"))  # 4
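To illustrate the sibling caveat, here is a sketch with hypothetical markup in which the four links sit in two separate <p> elements. In that layout a:nth-of-type(4) finds nothing, while indexing the full select() result counts document-wide, like links[3] in the list approach.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the four links are split across two parent <p> elements.
html = """
<p><a href="1">link 1</a> <a href="2">link 2</a></p>
<p><a href="3">link 3</a> <a href="4">link 4</a></p>
"""
soup = BeautifulSoup(html, "html.parser")

# :nth-of-type counts only among siblings, so no <a> is the 4th of its type here.
match = soup.select_one("a:nth-of-type(4)")
print(match)  # None

# a:nth-of-type(2) matches the 2nd <a> inside *each* <p>.
second_in_each = [a["href"] for a in soup.select("a:nth-of-type(2)")]
print(second_in_each)  # ['2', '4']

# Indexing the full result list counts across the whole document instead.
fourth = soup.select("a")[3]["href"]
print(fourth)  # 4
```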