I'm trying to make a list of tuples, where the first element is the download URL and the second is the file name taken from the URL string, using the code below:
import urllib
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
full_download_url = [tuple(download_url,i["href"]) for i in table_data.find_all('a')]
But I keep getting TypeError: must be str, not list and I'm not sure how to fix it. Please help! Thanks!
CodePudding user response:
This is what I needed:
import urllib
import requests
from bs4 import BeautifulSoup
import pandas as pd
import io
url = r"https://www.ers.usda.gov/data-products/livestock-meat-domestic-data"
my_bytes = urllib.request.urlopen(url)
my_bytes = my_bytes.read().decode("utf8")
parsed_html = BeautifulSoup(my_bytes, features = "lxml")
table_data = parsed_html.body.find('table', attrs = {'id':'data_table'})
download_url = "https://www.ers.usda.gov"
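def convertTuple(tup):
    # Concatenate every item of the tuple into one string
    result = ''
    for item in tup:
        result = result + item
    return result
# Prepend the site root to each relative href to get the full download URL
full_download_url = [convertTuple(tuple(download_url + i["href"])) for i in table_data.find_all('a')]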
Thanks to GeeksforGeeks and everyone trying to help :)
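Since the helper only glues the characters of the tuple back into one string, the same list can probably be built with plain concatenation or with urllib.parse.urljoin from the standard library. A minimal sketch, assuming every href is a site-relative path; the sample paths below are made up for illustration and stand in for i["href"] from table_data.find_all('a'):

```python
from urllib.parse import urljoin

download_url = "https://www.ers.usda.gov"

# Hypothetical hrefs, standing in for the values scraped from the table
hrefs = ["/media/example/meat-stats.xlsx", "/media/example/red-meat.zip"]

# urljoin resolves each relative path against the site root
full_download_url = [urljoin(download_url, href) for href in hrefs]

print(full_download_url)
```

urljoin also copes with hrefs that are already absolute URLs, which plain concatenation would not.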
CodePudding user response:
You are incorrectly accessing the download_url array index. Python is interpreting your code as creating an array with one element [0] when i is 0, for example, and then trying to access the element ["href"], which is a string, not a valid index. If you specify download_url before accessing the indices it will work as expected:
full_download_url = [(download_url, download_url[i]["href"]) for i in table_data.find_all('a')]
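For the (download URL, file name) pairs asked for in the question, it may help to know that tuple() accepts at most one iterable argument, whereas a parenthesised literal such as (a, b) builds the pair directly. A minimal sketch under those terms, assuming each href is a site-relative path; url_and_name is a hypothetical helper and the sample hrefs are invented for illustration:

```python
import posixpath
from urllib.parse import urljoin, urlsplit

download_url = "https://www.ers.usda.gov"

# Hypothetical hrefs, standing in for i["href"] from table_data.find_all('a')
hrefs = ["/media/example/meat-stats.xlsx?v=1", "/media/example/red-meat.zip"]

def url_and_name(base, href):
    """Return (full URL, file name) for one link."""
    full = urljoin(base, href)
    # Take the last path segment, ignoring any query string
    name = posixpath.basename(urlsplit(full).path)
    return (full, name)

full_download_url = [url_and_name(download_url, href) for href in hrefs]

for full, name in full_download_url:
    print(name, full)
```

With the real page, the list comprehension would iterate over table_data.find_all('a') and pass i["href"] instead of the sample strings.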