How to find the script src link? (Beautiful Soup)

Time:10-19

tags = [{tag.name: tag.text.strip()} for tag in soup.find_all('h2')]

This returns as:

[{'h2':'My'},{'h2':'hey'}] # Returns all the h2 elements with their content.

Now I want all the links found in the src attribute of <script src=''> tags, in the same format as above.

Suppose the HTML code is:

<script src="https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.hvE_rrhCzPE.O/m=gapi_iframes,googleapis_client/rt=j/sv=1/d=1/ed=1/rs=AHpOoo-98F2Gk-siNaIBZOtcWfXQWKdTpQ/cb=gapi.loaded_0" nonce="" async=""></script>

The result should be

# Either form is acceptable

[{'script':'https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.hvE_rrhCzPE.O/m=gapi_iframes,googleapis_client/rt=j/sv=1/d=1/ed=1/rs=AHpOoo-98F2Gk-siNaIBZOtcWfXQWKdTpQ/cb=gapi.loaded_0'}]

OR

[{'script src':'https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.hvE_rrhCzPE.O/m=gapi_iframes,googleapis_client/rt=j/sv=1/d=1/ed=1/rs=AHpOoo-98F2Gk-siNaIBZOtcWfXQWKdTpQ/cb=gapi.loaded_0'}]

CodePudding user response:

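You just need to change the tag you are searching for from h2 to script. Then, instead of getting the text of the element with tag.text, read the attribute value with the syntax tag['attribute name']. Passing src=True to find_all restricts the match to script tags that actually have a src attribute, so inline scripts without one do not raise a KeyError. So something like:

tags = [{tag.name: tag['src']} for tag in soup.find_all('script', src=True)]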
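
For reference, here is a minimal self-contained sketch of that approach. The HTML snippet and its URLs are made up for illustration (they are not from the question), and the built-in html.parser backend is assumed. It shows two ways of handling script tags that have no src attribute: skipping them with src=True, or keeping them and reading the attribute with tag.get('src'), which gives None when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: two external scripts and one inline script.
html = """
<html><head>
<script src="https://example.com/static/app.js" async=""></script>
<script>console.log("inline script, no src attribute");</script>
<script src="/js/local.js"></script>
</head><body><h2>My</h2><h2>hey</h2></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# src=True matches only <script> tags that carry a src attribute,
# so tag["src"] cannot raise KeyError here.
external = [{tag.name: tag["src"]} for tag in soup.find_all("script", src=True)]

# Alternative: keep every <script> tag; tag.get("src") returns None
# for inline scripts instead of raising KeyError.
all_scripts = [{tag.name: tag.get("src")} for tag in soup.find_all("script")]

print(external)
print(all_scripts)
```

If the 'script src' key from the question is preferred, the dictionary key can be written as the literal string 'script src' instead of tag.name.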