Python 3 web scraping: loop returns only one iteration

Time:12-10

Python 3 web scraping: I'm trying to extract a table from HTML and store it in a new DataFrame. I need all of the 'td' values, but when I iterate, the loop returns only one row instead of all of them. Below are my code and output:

!pip install yfinance
!pip install pandas
!pip install requests
!pip install bs4
!pip install plotly

import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing=.3)
    # Restrict both series to the same time window
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    # pd.to_datetime infers the date format by itself
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()

tsla = yf.Ticker("TSLA")
tsla

tesla_data = tsla.history(period="max")
tesla_data


tesla_data.reset_index(inplace=True)
tesla_data.head()

url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data  = requests.get(url).text


soup = BeautifulSoup(html_data, 'html.parser')

tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'): 
 col = row.find_all("td")
 date = col[0].text
 revenue = col[1].text
tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue



DATE Revenue
0 2008 15$

Answer:

What happens?

The loop itself works, but the line that appends to the DataFrame sits outside the loop. It runs only once, after the loop has finished, so it only sees the date and revenue from the last iteration.
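A minimal pure-Python sketch of the same mistake, with a hypothetical list of (date, revenue) tuples standing in for the scraped table rows. The variables keep whatever value the final iteration assigned, so a statement placed after the loop records only that last row:

```python
# Hypothetical stand-in for the scraped <tr> rows
rows = [("2020", "$31,536"), ("2019", "$24,578"), ("2018", "$21,461")]

# Buggy pattern: the append is outside the loop
outside = []
for r in rows:
    date = r[0]
    revenue = r[1]
outside.append((date, revenue))  # runs once, after the loop ends

# Fixed pattern: the append is inside the loop
inside = []
for r in rows:
    date = r[0]
    revenue = r[1]
    inside.append((date, revenue))  # runs once per row

print(outside)      # [('2018', '$21,461')]
print(len(inside))  # 3
```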

How to fix?

Fix the indentation so that the append happens inside the loop, once per row:

rows = []
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    # Inside the loop: runs once per table row.
    # DataFrame.append no longer exists in pandas 2.0+, so collect plain dicts instead.
    rows.append({"Date": date, "Revenue": revenue})
tesla_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])
tesla_revenue

Example

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data = requests.get(url).text

soup = BeautifulSoup(html_data, 'html.parser')

rows = []
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    # Collect each row inside the loop; DataFrame.append no longer exists in pandas 2.0+
    rows.append({"Date": date, "Revenue": revenue})
tesla_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])
tesla_revenue

Output

Date Revenue
0 2020 $31,536
1 2019 $24,578
2 2018 $21,461
3 2017 $11,759
4 2016 $7,000
5 2015 $4,046
6 2014 $3,198
... ... ...
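The scraped Revenue values are strings such as "$31,536", so the .astype("float") call inside make_graph would fail on them. A minimal sketch of one way to clean the column first, assuming the values follow that dollar-sign-and-comma pattern (the sample rows are copied from the output above; in practice you would apply the same lines to the scraped tesla_revenue):

```python
import pandas as pd

# Sample rows copied from the output above
tesla_revenue = pd.DataFrame(
    {"Date": ["2020", "2019", "2018"], "Revenue": ["$31,536", "$24,578", "$21,461"]}
)

# Remove "$" and "," and convert to numbers; anything still non-numeric
# (for example an empty cell) becomes NaN instead of raising an error
cleaned = tesla_revenue["Revenue"].str.replace(r"[$,]", "", regex=True)
tesla_revenue["Revenue"] = pd.to_numeric(cleaned, errors="coerce")
print(tesla_revenue)
```

Using pd.to_numeric with errors="coerce" rather than .astype(float) is a design choice: a single malformed cell turns into NaN instead of stopping the whole conversion.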

Answer:

First, find the main table by its tag and class:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue")

soup = BeautifulSoup(res.text, "html.parser")
table = soup.find("table", class_="historical_data_table table")
main_data = table.find_all("tr")

Then collect the text of each row's td cells into a list, building a list of lists to use as the row data for the DataFrame. The first row is skipped because it holds the table header:

main_lst=[]
for i in main_data[1:]:
    lst=[data.get_text(strip=True) for data in i.find_all("td")]
    main_lst.append(lst)

Finally, build the DataFrame from that data:

import pandas as pd
df = pd.DataFrame(columns=["Date", "Revenue"], data=main_lst)
df

Output:

    Date    Revenue
0   2020    $31,536
1   2019    $24,578
2   2018    $21,461
3   2017    $11,759
...
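To check the steps above without requesting the live page, here is a self-contained sketch that runs the same find / skip-header / get_text(strip=True) logic on a small hand-written table. The markup is hypothetical: it only imitates a table with the historical_data_table class, one header row, and two data rows whose values are copied from the outputs in this thread.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup imitating the structure of the revenue table
html = """
<table class="historical_data_table table">
  <thead><tr><th colspan="2">Annual Revenue</th></tr></thead>
  <tbody>
    <tr><td>2020</td><td> $31,536 </td></tr>
    <tr><td>2019</td><td>$24,578</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# A class_ string containing a space matches the exact class attribute value
table = soup.find("table", class_="historical_data_table table")
main_data = table.find_all("tr")

# Skip the header row; strip surrounding whitespace from every <td>
main_lst = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in main_data[1:]]

df = pd.DataFrame(main_lst, columns=["Date", "Revenue"])
print(df)
```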

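As a one-liner, pandas.read_html parses every table on the page into a list of DataFrames (it needs lxml, or BeautifulSoup with html5lib, installed):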

df=pd.read_html("https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue")
print(len(df))
print(df[0])

Output

6

    Date    Price
0   2020    $31,536
1   2019    $24,578
2   2018    $21,461
3   2017    $11,759

...
