How do I get href value instead of the text from a result set?

Time:10-15

I'm using

print(d.contents)

My print loop prints the following ResultSets:

[<a href="Property.aspx?pi=cd1a0b90-07aa-ec11-aa4c-246e960cbc4d" title="3 Bedroom House For Sale In Anthoupoli, Nicosia">3 Bedroom House For Sale In Anthoupoli, Nicosia</a>]
[<a href="Property.aspx?pi=c42dc379-5f67-e811-bb4e-a4badb3ceace" title="2 Storey Modern 3 Bedroom House For Sale In Dali Area">2 Storey Modern 3 Bedroom House For Sale In Dali Area</a>]
[<a href="Property.aspx?pi=6370763d-61ab-e811-b319-a4badb3ceacd" title="Very Nice And Spacious 3 Bedroom Detached House For Sale In Lakatamia">Very Nice And Spacious 3 Bedroom Detached House For Sale In Lakatamia</a>]
[<a href="Property.aspx?pi=7da50193-266b-e811-bb4e-a4badb3ceace" title="3 Bedroom Under Construction Detached 4 Houses For Sale In Tseri Area">3 Bedroom Under Construction Detached 4 Houses For Sale In Tseri Area</a>]
[<a href="Property.aspx?pi=96d0fb0a-89bd-ec11-aa4e-246e960cbc4d" title="3 Bedroom House For Sale In Agios Dometios, Nicosia">3 Bedroom House For Sale In Agios Dometios, Nicosia</a>]
[<a href="Property.aspx?pi=1881b78e-52d7-ec11-aa4e-246e960cbc4d" title="4 Bedroom House For Sale In Archangelos, Nicosia">4 Bedroom House For Sale In Archangelos, Nicosia</a>]
[<a href="Property.aspx?pi=eaa2b630-e685-ec11-aa4c-246e960cbc4d" title="In Excellent Location 3 Bedrooms House In Archangelos Nicosia">In Excellent Location 3 Bedrooms House In Archangelos Nicosia</a>]
[<a href="Property.aspx?pi=2ad40da2-a190-ec11-aa4c-246e960cbc4d" title="Incomplete residential development in Politiko, Nicosia">Incomplete residential development in Politiko, Nicosia</a>]
[<a href="Property.aspx?pi=a19ad42e-cd59-e911-8b16-a4badb3ceacd" title="3 Bedroom House For Sale In Spilia With Great View">3 Bedroom House For Sale In Spilia With Great View</a>]

How can I print only the value of each href attribute?

I noticed that using print(d.text) gives me only the titles, but I want the URLs instead.

Answer:

d.contents always returns a plain list of the element's children (not a ResultSet), which is why each printed line is wrapped in brackets. Instead, select the single <a> inside your element directly and read its href:

d.a.get('href')

or

d.find('a').get('href')
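
Both forms assume that every element contains an <a>. If some do not, d.a is None and calling .get on it raises an AttributeError. A minimal sketch of a more defensive loop follows; the markup and the assumption that d is an <h3> are hypothetical stand-ins, since the original page and loop are not shown:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the asker's page (not from the question)
html = '''
<h3><a href="Property.aspx?pi=1" title="House A">House A</a></h3>
<h3>No link here</h3>
<h3><a title="Missing href">Missing href</a></h3>
'''
soup = BeautifulSoup(html, 'html.parser')

hrefs = []
for d in soup.find_all('h3'):
    link = d.find('a')        # None when the element has no <a>
    if link is None:
        continue
    href = link.get('href')   # None when the attribute is absent
    if href is not None:
        hrefs.append(href)

print(hrefs)
```

Using .get('href') rather than link['href'] matters here: indexing raises a KeyError when the attribute is missing, while .get returns None.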

Alternatively, you could select the links more specifically with a CSS selector:

for e in soup.select('#properties h3 a'):
    print(e.get('href'))

Example

For your next question, try to include a minimal reproducible example like this one. It makes it easier for others to understand the situation and your issue.

from bs4 import BeautifulSoup

html = '''
<div id="properties">

<div>
    <div>
        <h3><a href="Property.aspx?pi=92700a36-fd11-ec11-aa4a-246e960cbc4d" title="3 Bedroom House For Sale In Akaki">3 Bedroom House For Sale In Akaki</a></h3>
    </div>
</div>

<div>
    <div>
        <h3><a href="Property.aspx?pi=dc3f03fe-6140-ea11-a1df-a4badb3ceacd" title="Shop For Sale in Nicosia Center">Shop For Sale in Nicosia Center</a></h3>
    </div>
</div>

<div>
    <div>
        <h3><a href="Property.aspx?pi=7f72737d-72e2-ec11-aa4e-246e960cbc4d" title="1 Bedroom Apartment For Sale In Strovolos, Nicosia">1 Bedroom Apartment For Sale In Strovolos, Nicosia</a></h3>
    </div>
</div>

</div>
'''
soup = BeautifulSoup(html, 'html.parser')

for e in soup.select('#properties h3 a'):
    print(e.get('href'))

Output

Property.aspx?pi=92700a36-fd11-ec11-aa4a-246e960cbc4d
Property.aspx?pi=dc3f03fe-6140-ea11-a1df-a4badb3ceacd
Property.aspx?pi=7f72737d-72e2-ec11-aa4e-246e960cbc4d
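
The hrefs on this page are relative, so they cannot be requested as they are. If you need full URLs, one option is to join each href with the address of the page it came from. The sketch below assumes a placeholder base URL (https://example.com/listings/ is hypothetical, substitute the real page address) and uses the a[href] selector to skip anchors without an href:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: hypothetical placeholder for the address of the listing page
BASE_URL = 'https://example.com/listings/'

html = '''
<div id="properties">
    <h3><a href="Property.aspx?pi=92700a36-fd11-ec11-aa4a-246e960cbc4d">3 Bedroom House For Sale In Akaki</a></h3>
    <h3><a>Entry without a link target</a></h3>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# a[href] matches only anchors that actually carry an href attribute,
# so indexing with a['href'] is safe inside the comprehension
absolute_urls = [
    urljoin(BASE_URL, a['href'])
    for a in soup.select('#properties h3 a[href]')
]

for url in absolute_urls:
    print(url)
```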