create list of tuples with download url "href"

Time:04-06

I'm trying to build a list of tuples in which the first element is the download URL and the second is the file name taken from the URL string, using the code below:

import urllib.request
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
full_download_url = [tuple(download_url,i["href"]) for i in table_data.find_all('a')]

But I keep getting TypeError: must be str, not list, and I'm not sure how to fix it. Please help. Thanks!
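For context, here is a minimal, offline reproduction of the tuple(download_url, i["href"]) call from the comprehension above. The base URL comes from the question; the href value is hypothetical and stands in for one <a> tag from the table. The exact error text differs between Python versions, so the sketch prints it instead of assuming a wording.

```python
# Hypothetical base URL and href, standing in for one <a> tag from the table.
base = "https://www.ers.usda.gov"
href = "/example/file.xlsx"

error_message = None
try:
    tuple(base, href)  # tuple() accepts at most one iterable argument
except TypeError as exc:
    error_message = str(exc)  # wording varies between Python versions
print("tuple(base, href) raised:", error_message)

pair = (base, href)              # a tuple literal builds the 2-tuple directly
same_pair = tuple([base, href])  # or pass ONE iterable to tuple()
print(pair == same_pair)         # True
```

Either a tuple literal or a single list passed to tuple() produces the intended pair; the literal is the more common style.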

Answer:

This is what I needed:

import urllib.request
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
# Concatenate every item of a tuple of strings into one string.
def convertTuple(tup):
    result = ''  # named "result" to avoid shadowing the built-in str
    for item in tup:
        result = result + item
    return result
# tuple() of a string is a tuple of its characters, so convertTuple rebuilds
# the same string: each entry is the full download URL (a str, not a tuple).
full_download_url = [convertTuple(tuple(download_url + i["href"])) for i in table_data.find_all('a')]

Thanks to GeeksforGeeks and everyone who tried to help :)
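Note that convertTuple(tuple(download_url + i["href"])) is a round trip: tuple() on a string splits it into single characters, and joining them back yields the original string. The result is therefore a list of full URL strings rather than (URL, file name) tuples. A small sketch of that round trip, using a hypothetical href and str.join, which does the same concatenation as convertTuple in one call:

```python
# Hypothetical example URL; the href part is illustrative, not scraped.
full_url = "https://www.ers.usda.gov" + "/example/file.xlsx"

chars = tuple(full_url)     # tuple() on a str: one element per character
rebuilt = "".join(chars)    # joining the characters restores the original str

print(chars[:5])            # ('h', 't', 't', 'p', 's')
print(rebuilt == full_url)  # True
```

So the comprehension in this answer is equivalent to [download_url + i["href"] for i in table_data.find_all('a')].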

Answer:

You are building each tuple incorrectly.

tuple() accepts at most one argument (an iterable), so tuple(download_url, i["href"]) raises a TypeError. Also, i is already the <a> tag returned by find_all('a'), so you read its link directly with i["href"]; no indexing into download_url is involved (download_url is a plain string).

Write the tuple as a literal instead, joining the base URL with the href for the first element and taking the last path segment of the href as the file name:

full_download_url = [(download_url + i["href"], i["href"].split("/")[-1]) for i in table_data.find_all('a')]
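A slightly more defensive sketch of building the (download URL, file name) pairs, assuming the hrefs may be either relative or absolute and may carry a query string. It swaps plain string concatenation for urllib.parse.urljoin and takes the file name from the URL path. The hrefs below are hypothetical placeholders, not values taken from the USDA page, and to_url_and_name is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlparse
import posixpath

BASE_URL = "https://www.ers.usda.gov"

# Hypothetical hrefs standing in for the <a> tags found in the data table.
hrefs = [
    "/webdocs/example/meat-supply.xlsx",
    "https://www.ers.usda.gov/webdocs/example/poultry.zip?v=2",
]

def to_url_and_name(href, base=BASE_URL):
    """Return (absolute download URL, file name) for a single href."""
    url = urljoin(base, href)                      # handles relative and absolute hrefs
    name = posixpath.basename(urlparse(url).path)  # last path segment, query dropped
    return (url, name)

download_list = [to_url_and_name(h) for h in hrefs]
for pair in download_list:
    print(pair)
```

With BeautifulSoup, the hrefs would come from the table as in the question, for example [a["href"] for a in table_data.find_all("a", href=True)], where href=True skips <a> tags that have no href attribute.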