tags = [{tag.name: tag.text.strip()} for tag in soup.find_all('h2')]
This returns:
[{'h2':'My'},{'h2':'hey'}] # Returns all the h2 elements with their content.
Now I want, in the same format, all the links stored in the src attribute of <script src=''> tags.
For example, given this HTML:
<script src="https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.hvE_rrhCzPE.O/m=gapi_iframes,googleapis_client/rt=j/sv=1/d=1/ed=1/rs=AHpOoo-98F2Gk-siNaIBZOtcWfXQWKdTpQ/cb=gapi.loaded_0" nonce="" async=""></script>
The result should be either of the following (both are acceptable):
[{'script':'https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.hvE_rrhCzPE.O/m=gapi_iframes,googleapis_client/rt=j/sv=1/d=1/ed=1/rs=AHpOoo-98F2Gk-siNaIBZOtcWfXQWKdTpQ/cb=gapi.loaded_0'}]
OR
[{'script src':'https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.hvE_rrhCzPE.O/m=gapi_iframes,googleapis_client/rt=j/sv=1/d=1/ed=1/rs=AHpOoo-98F2Gk-siNaIBZOtcWfXQWKdTpQ/cb=gapi.loaded_0'}]
You just need to change the tag you are searching for from h2 to script. Then, instead of getting the element's text with tag.text, you want the attribute value, which uses the syntax tag['attribute name']. So something like:
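# src=True keeps only <script> tags that have a src attribute; inline scripts
# without one would otherwise make tag['src'] raise KeyError.
tags = [{tag.name: tag['src']} for tag in soup.find_all('script', src=True)]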
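For a self-contained version you can run, here is a minimal sketch. It assumes the beautifulsoup4 package is installed and uses the built-in "html.parser" backend. The sample HTML is hypothetical: it reuses the h2 headings and script URL from the question and adds an inline <script> with no src, to show why filtering with src=True matters. It also builds the alternative 'script src' key format from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: the question's h2 headings and script tag,
# plus an inline script that has no src attribute.
html = """
<h2>My</h2>
<h2>hey</h2>
<script src="https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.hvE_rrhCzPE.O/m=gapi_iframes,googleapis_client/rt=j/sv=1/d=1/ed=1/rs=AHpOoo-98F2Gk-siNaIBZOtcWfXQWKdTpQ/cb=gapi.loaded_0" nonce="" async=""></script>
<script>console.log("inline script, no src");</script>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of each <h2>, keyed by tag name (the question's starting point).
h2_tags = [{tag.name: tag.text.strip()} for tag in soup.find_all("h2")]

# src attribute of each <script> that has one, keyed as 'script'.
script_srcs = [{tag.name: tag["src"]} for tag in soup.find_all("script", src=True)]

# Same data with the alternative 'script src' key.
script_srcs_alt = [{f"{tag.name} src": tag["src"]} for tag in soup.find_all("script", src=True)]

print(h2_tags)
print(script_srcs)
```

If you would rather keep every <script> and get None for the ones without a src, tag.get('src') is an alternative to the src=True filter.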