In Python with BeautifulSoup, how do I get the url link but not the href from a soup?

Time:07-04

Using the following code:

prop_img = prop_lst.find_all('a',{'class':'mpi_img_link'})

I get the following list as output:

[<a  href="/homedetail/5324-palm-royale-blvd-sugar-land-tx-77479/2504873" style="background-image:url(https://photos.harstatic.com/205383563/lr/img-1.jpeg?ts=2022-03-02T17:03:25.300);"></a>,
 <a  href="/homedetail/27-riverstone-island-dr-sugar-land-tx-77479/13157541" style="background-image:url(https://photos.harstatic.com/184277385/lr/img-1.jpeg?ts=2020-04-07T14:50:50.960);"></a>,
 <a  href="/homedetail/23-beacon-hl-sugar-land-tx-77479/2507526" style="background-image:url(https://photos.harstatic.com/205977706/lr/img-1.jpeg?ts=2022-03-20T23:10:23.777);"></a>,
 <a  href="/homedetail/0-hagerson-rd-sugar-land-tx-77479/2375356" style="background-image:url(https://photos.harstatic.com/205015725/lr/img-1.jpeg?ts=2022-02-16T10:20:54.847);"></a>,
 <a  href="/homedetail/5-cypress-valley-ct-sugar-land-tx-77479/2505565" style="background-image:url(https://photos.harstatic.com/208809599/lr/img-1.jpeg?ts=2022-06-27T14:34:55.917);"></a>,
 <a  href="/homedetail/21-grand-mnr-sugar-land-tx-77479/2506201" style="background-image:url(https://photos.harstatic.com/201552628/lr/img-1.jpeg?ts=2021-10-21T10:55:15.270);"></a>,
 <a  href="/homedetail/427-w-alkire-lake-dr-sugar-land-tx-77478/10240223" style="background-image:url(https://photos.harstatic.com/203290759/lr/img-1.jpeg?ts=2022-01-02T15:48:57.463);"></a>,
 <a  href="/homedetail/1309-n-horseshoe-dr-sugar-land-tx-77478/2390056" style="background-image:url(https://photos.harstatic.com/209561396/lr/img-1.jpeg?ts=2022-06-27T21:04:42.547);"></a>,
 <a  href="/homedetail/1217-n-horseshoe-dr-sugar-land-tx-77478/10101841" style="background-image:url(https://photos.harstatic.com/207957668/lr/img-1.jpeg?ts=2022-06-12T19:32:34.500);"></a>,
 <a  href="/homedetail/1990-hagerson-rd-sugar-land-tx-77479/15860752" style="background-image:url(https://photos.harstatic.com/208557478/lr/img-1.jpeg?ts=2022-06-02T16:14:04.770);"></a>,
 <a  href="/homedetail/15202-old-richmond-rd-sugar-land-tx-77498/2387668" style="background-image:url(https://photos.harstatic.com/194038254/lr/img-1.jpeg?ts=2021-03-03T18:51:20.263);"></a>,
 <a  href="/homedetail/323-w-alkire-lake-dr-sugar-land-tx-77478/2390859" style="background-image:url(https://photos.harstatic.com/206236194/lr/img-1.jpeg?ts=2022-03-29T10:03:14.540);"></a>]

This is great! Now I want to extract the URL inside each style="background-image:url(...)" attribute, giving the following output. I do NOT want the href links.

https://photos.harstatic.com/205383563/lr/img-1.jpeg?ts=2022-03-02T17:03:25.300
https://photos.harstatic.com/184277385/lr/img-1.jpeg?ts=2020-04-07T14:50:50.960
:
:
:
https://photos.harstatic.com/206236194/lr/img-1.jpeg?ts=2022-03-29T10:03:14.540

I think you are supposed to use a CSS selector like the one below, but I still can't get the result I want. Can anyone help with this please? Thank you!

for img in prop_img:
    print(prop_lst.select('style'))

CodePudding user response:

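Almost there. select('style') looks for <style> elements rather than the style attribute, and the loop never uses img. Read the attribute from each tag instead: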

for img in prop_img:
    print(img['style'])

To pull out the url part:

import re

for img in prop_img:
    # Capture whatever sits between the parentheses of url(...)
    url = re.search(r'\((.*)\)', img['style']).group(1)
    print(url)
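
If some tags might lack a style attribute, or the URL might be wrapped in quotes, a slightly more defensive version could look like the sketch below. The helper name background_image_url and the sample style strings are made up for illustration (the first string is copied from the output in the question); it works on plain strings, so it runs without BeautifulSoup.

```python
import re

# Matches url(...) in a CSS declaration, allowing optional quotes around the URL.
_URL_RE = re.compile(r"""url\(\s*['"]?(.*?)['"]?\s*\)""")


def background_image_url(style):
    """Return the first url(...) target in a style string, or None if absent."""
    match = _URL_RE.search(style or "")
    return match.group(1) if match else None


# Sample style strings; the quoted and URL-less variants are hypothetical.
styles = [
    "background-image:url(https://photos.harstatic.com/205383563/lr/img-1.jpeg?ts=2022-03-02T17:03:25.300);",
    "background-image:url('https://photos.harstatic.com/184277385/lr/img-1.jpeg?ts=2020-04-07T14:50:50.960');",
    "color:red;",
]

urls = [background_image_url(s) for s in styles]
for url in urls:
    print(url)
```

With the tags from the question, this would be called as background_image_url(img.get('style')) for each img in prop_img, since img.get returns None rather than raising when the attribute is missing.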