Here I have a list of URLs and I am trying to get the "td" elements from all of them, but I am only able to get the last URL's HTML.
import numpy as np
import pandas as pd
from datetime import datetime
import pytz
import requests
import json
from bs4 import BeautifulSoup
url_list = ['https://www.coingecko.com/en/coins/ethereum/historical_data/usd?start_date=2021-08-06&end_date=2021-09-05#panel',
'https://www.coingecko.com/en/coins/cardano/historical_data/usd?start_date=2021-08-06&end_date=2021-09-05#panel',
'https://www.coingecko.com/en/coins/chainlink/historical_data/usd?start_date=2021-08-06&end_date=2021-09-05#panel']
for link in range(len(url_list)):
    response = requests.get(url_list[link])
    src = response.content
    soup = BeautifulSoup(response.text, 'html.parser')
    res1 = soup.find_all("td", class_="text-center")

res1
Could anyone please help me get the data from all of the URLs?
CodePudding user response:
You are overwriting your soup variable on each iteration of the loop, so instead of saving the results from every URL and then looping over those, you only end up with the final result.
- Create a variable before the loop to store the results of each iteration
- Append the soup (or the find_all result) to that variable on each iteration
- Create a new loop afterwards to work with the stored data
You can also access each element of the list directly instead of indexing with range(len(...)):
for url in url_list:
    response = requests.get(url)
    # rest of code
This is easier to read. So putting it all together:
# empty list to store all results
results = []

# your loop here
for url in url_list:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    results.append(soup.find_all("td", class_="text-center"))

# accessing the data from the results
for result in results:
    print(result)
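
If you want the cell text rather than the raw tag objects, you can call .get_text() on each td and keep track of which URL each batch of cells came from. Here is a minimal sketch, assuming the same url_list as above; the dictionary keyed by URL and the strip=True argument are just illustrative choices, not something required by the approach:

import requests
from bs4 import BeautifulSoup

# store results keyed by URL so you know which coin each batch of cells belongs to
results_by_url = {}
for url in url_list:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    cells = soup.find_all("td", class_="text-center")
    # keep only the cleaned-up text of each cell
    results_by_url[url] = [cell.get_text(strip=True) for cell in cells]

# e.g. print how many cells were scraped from each page
for url, texts in results_by_url.items():
    print(url, len(texts), "cells")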