Webscraping tables

Time:10-07

I keep pulling the annual revenue table when I'm meant to be pulling the quarterly one. Could someone please explain what I am doing wrong? (Code below.)

url='https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue'

html_data=requests.get(url).text

soup=BeautifulSoup(html_data, 'html.parser')

tesla_revenue=pd.DataFrame(columns=['Date', 'Revenue'])
for row in soup.find('tbody').find_all('tr'):
    col=row.find_all('td')
    date=col[0]
    revenue=col[1]
    tesla_revenue=tesla_revenue.append({'Date':date,'Revenue':revenue}, ignore_index=True)

tesla_revenue.head()

CodePudding user response:

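The quarterly data is in the second table on the page (index [1]). soup.find('tbody') returns only the first match, which is the annual table, so select the second table explicitly: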

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"

html_data = requests.get(url).text

soup = BeautifulSoup(html_data, "html.parser")

table = soup.select("table")[1]

all_data = []
for row in table.find("tbody").find_all("tr"):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    all_data.append({"Date": date, "Revenue": revenue})

tesla_revenue = pd.DataFrame(all_data)
print(tesla_revenue.head())

Prints:

         Date  Revenue
0  2021-06-30  $11,958
1  2021-03-31  $10,389
2  2020-12-31  $10,744
3  2020-09-30   $8,771
4  2020-06-30   $6,036
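
The Revenue column above still holds strings such as "$11,958". If numbers are needed, something along these lines may help. This is only a sketch: the sample rows are copied from the printed output, the last row with an empty revenue is a hypothetical example of a blank cell, and the name cleaned is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the printed output above; the final row with an
# empty revenue is hypothetical and stands in for a blank table cell.
tesla_revenue = pd.DataFrame({
    "Date": ["2021-06-30", "2021-03-31", "2020-12-31", "2009-06-30"],
    "Revenue": ["$11,958", "$10,389", "$10,744", ""],
})

cleaned = tesla_revenue.copy()

# Strip the dollar sign and thousands separators, then convert to numbers.
# errors="coerce" turns anything unparseable (such as "") into NaN.
cleaned["Revenue"] = pd.to_numeric(
    cleaned["Revenue"].str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)

# Drop rows that had no usable revenue and renumber the index.
cleaned = cleaned.dropna(subset=["Revenue"]).reset_index(drop=True)

# Parse the ISO-formatted date strings into real datetimes.
cleaned["Date"] = pd.to_datetime(cleaned["Date"])

print(cleaned)
```

Doing the conversion after scraping keeps the scraping loop simple and leaves the raw strings available if the page format changes.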

CodePudding user response:

You should extract the text from each cell. Your code stores the whole <td> tag object:

date=col[0]
revenue=col[1]

Change it to the following, so that only the cell text is kept, with surrounding whitespace stripped:

date=col[0].text.strip()
revenue=col[1].text.strip()
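
Both points in this thread (picking the right table and extracting the text) can be tried offline. The snippet below is a minimal sketch that assumes a made-up two-table page: sample_html is hypothetical markup, not the real Macrotrends page, and its values are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an "annual" table first, a "quarterly" table second.
sample_html = """
<table><tbody>
<tr><td>2020</td><td>$31,536</td></tr>
</tbody></table>
<table><tbody>
<tr><td> 2021-06-30 </td><td> $11,958 </td></tr>
<tr><td> 2021-03-31 </td><td> $10,389 </td></tr>
</tbody></table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns only the first match, so this is the annual table's body.
first_tbody = soup.find("tbody")
first_dates = [
    tr.find_all("td")[0].text.strip() for tr in first_tbody.find_all("tr")
]

# select("table") returns every table in document order; index 1 is the second.
quarterly = soup.select("table")[1]

rows = []
for tr in quarterly.find("tbody").find_all("tr"):
    cells = tr.find_all("td")
    # .text gives the cell's string content; .strip() removes the padding.
    rows.append({
        "Date": cells[0].text.strip(),
        "Revenue": cells[1].text.strip(),
    })

print(first_dates)
print(rows)
```

Without .text, each value would be a bs4 Tag object rather than a string, which is why the DataFrame in the question shows markup instead of plain values.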