Home > Software engineering >  Extracting link from soup python
Extracting link from soup python

Time:11-30

I'm trying make an app gets the source links on bandcamp but im kinda stuck. Is there a way to get the source link with beautifulsoup.

The link im trying to get

Bandcamp

CodePudding user response:

The data is within the <script> tags in json format. So use BeautifulSoup to get the 'script'. The data you are after is in the data-tralbum attribute.

Onece you get thet, have json read it in, then just iterate through the json structure:

from bs4 import BeautifulSoup
import requests
import json

url = 'https://vine.bandcamp.com/album/another-light'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
script = str(soup.find_all('script')[4]['data-tralbum'])

jsonData = json.loads(script)
trackinfo = jsonData['trackinfo']

links = []
for each in trackinfo:
    links.append(each['file']['mp3-128'])

Output:

print(links)
['https://t4.bcbits.com/stream/efbba461835eff472bd04a2f9e9910a9/mp3-128/1761020287?p=0&ts=1638288735&t=8ae6343808036ab513cd5436ea009e5d0de784e4&token=1638288735_9139d56ec86f2d44b83a11f3eed8caf7075d6039', 'https://t4.bcbits.com/stream/3e5ef92e6d83e853958ed01955c95f5f/mp3-128/1256475880?p=0&ts=1638288735&t=745a6c701cf1c5772489da5467f6cae5d3622818&token=1638288735_7e86a32c635ba92e0b8320ef56a457d988286cff', 'https://t4.bcbits.com/stream/bbb49d4a72cb80feaf759ec7890abbb6/mp3-128/3439518541?p=0&ts=1638288735&t=dcc7ef7d1d7823e227339fb3243385089478ebe7&token=1638288735_5db36a29c58ea038828d7b34b67e13bd80597dd8', 'https://t4.bcbits.com/stream/8c8a69959337f6f4809f6491c2822b45/mp3-128/1330130896?p=0&ts=1638288735&t=d108dac84dfaac901a546c5fcf5064240cca376b&token=1638288735_8d9151aa82e7a00042025f924660dd3a093c2f74', 'https://t4.bcbits.com/stream/4d4253633405f204d7b1c101379a73be/mp3-128/2478242466?p=0&ts=1638288735&t=a8cd539d0ce8ff417f9b69740070870ed9a182a5&token=1638288735_ad8b5e93c8ffef6623615ce82a6754678fa67b67', 'https://t4.bcbits.com/stream/6c4feee38e289aea76080e9ddc997fa5/mp3-128/2243532902?p=0&ts=1638288735&t=83417c3aba0cef0f969f93bac5165e582f24a588&token=1638288735_c1d9d43b4e10cc6d02c822de90eda3a52c382df2', 'https://t4.bcbits.com/stream/a24dc5dad7b619d47b006e26084ff38f/mp3-128/3054008347?p=0&ts=1638288735&t=4563c326a272c9f5b8462fef1d082e46fac7f605&token=1638288735_55978e7edbe0410ff745913224b8740becad59d5', 'https://t4.bcbits.com/stream/6221790d7f55d3b1f006bd5fac5458fe/mp3-128/1500140939?p=0&ts=1638288735&t=9ecc210c53af05f4034ee00cd1a96a043312a4a7&token=1638288735_0f2faba41da8952f841669513d04bdaaae35a629', 'https://t4.bcbits.com/stream/030506909569626a0d2d7d182b61c691/mp3-128/1707615013?p=0&ts=1638288735&t=c8dcbb2c491789928f5cb6ef8b755df999cb58b8&token=1638288735_b278ba825129ae1b5588b47d5cda345ef2db4e58', 'https://t4.bcbits.com/stream/d1ae0cbc281fc81ddd91f3a3e3d80973/mp3-128/2808772965?p=0&ts=1638288735&t=1080ff51fc40bb5b7afb3a2460f3209cbda549e3&token=1638288735_c93249c847acba5cf23521fa745e05b426a5ba05', 'https://t4.bcbits.com/stream/1b9d50f8210bdc3cf4d2e33986f319ae/mp3-128/2751220220?p=0&ts=1638288735&t=9f24f06dfc5c8a06f24f28664438a6f1a75a038c&token=1638288735_f3a98a20b3c344dc5a37a602a41572d5fe8539c1', 'https://t4.bcbits.com/stream/203cd15629ba03e3249f850d5e1ac42e/mp3-128/4188265472?p=0&ts=1638288735&t=4b4bc2f2194c63a1d3b957e3dd6046bd764c272a&token=1638288735_53a70e7d83ce8c2800baeaf92a5c19db4e146e3f', 'https://t4.bcbits.com/stream/c63b5c9ca090b233e675974c7e7ee4b2/mp3-128/258670123?p=0&ts=1638288735&t=a81ae9dc33dea2b2660d13dbbec93dbcb06e6b63&token=1638288735_446d0ae442cbbadbceb342fe4f7b69d0fbab2928', 'https://t4.bcbits.com/stream/2e824d3c643658c8e9e24b548bc8cb0b/mp3-128/2332945345?p=0&ts=1638288735&t=5bdf0264b9ffe4616d920c55f5081744bf0822d4&token=1638288735_872191bb67a3438ef0fd1ce7e8a9e5ca09e6c37e']
  • Related