Has anyone ever scraped the financial statements available at roic.ai (e.g. into a DataFrame)?
The page's HTML is deeply nested, so extracting the statements is not straightforward:
from gazpacho import get, Soup
ticker = 'aapl'
url = f'https://roic.ai/financials/{ticker}?fs=annual'
print(url)
html = get(url)
soup = Soup(html)
# attrs must be a dict (the original passed a set by mistake)
soup.find('div', {'class': 'flex-col'})
CodePudding user response:
You can load the JSON data from the <script id="__NEXT_DATA__"> element inside the page:
import json
import pandas as pd
import requests
from bs4 import BeautifulSoup
ticker = "aapl"
url = f"https://roic.ai/financials/{ticker}?fs=annual"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").text)
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
# load sample data as pandas DataFrame
df = pd.DataFrame(data["props"]["pageProps"]["data"]["data"]["bsq"])
# to_markdown() requires the optional "tabulate" package
print(df.head().to_markdown(index=False))
Prints:
date | symbol | reportedCurrency | cik | fillingDate | acceptedDate | calendarYear | period | cashAndCashEquivalents | shortTermInvestments | cashAndShortTermInvestments | netReceivables | inventory | otherCurrentAssets | totalCurrentAssets | propertyPlantEquipmentNet | goodwill | intangibleAssets | goodwillAndIntangibleAssets | longTermInvestments | taxAssets | otherNonCurrentAssets | totalNonCurrentAssets | otherAssets | totalAssets | accountPayables | shortTermDebt | taxPayables | deferredRevenue | otherCurrentLiabilities | totalCurrentLiabilities | longTermDebt | deferredRevenueNonCurrent | deferredTaxLiabilitiesNonCurrent | otherNonCurrentLiabilities | totalNonCurrentLiabilities | otherLiabilities | capitalLeaseObligations | totalLiabilities | preferredStock | commonStock | retainedEarnings | accumulatedOtherComprehensiveIncomeLoss | othertotalStockholdersEquity | totalStockholdersEquity | totalLiabilitiesAndStockholdersEquity | minorityInterest | totalEquity | totalLiabilitiesAndTotalEquity | totalInvestments | totalDebt | netDebt | link | finalLink |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
06/25/2022 | AAPL | USD | 0000320193 | 2022-07-29 | 2022-07-28 18:06:56 | 2022 | Q3 | 27502000000 | 20729000000 | 48231000000 | 42242000000 | 5433000000 | 16386000000 | 112292000000 | 40335000000 | 0 | 0 | 0 | 131077000000 | 0 | 52605000000 | 224017000000 | 0 | 336309000000 | 48343000000 | 24991000000 | 0 | 7728000000 | 48811000000 | 129873000000 | 94700000000 | 0 | 0 | 53629000000 | 148329000000 | 0 | 0 | 278202000000 | 0 | 62115000000 | 5289000000 | -9297000000 | 0 | 58107000000 | 336309000000 | 0 | 58107000000 | 336309000000 | 151806000000 | 119691000000 | 92189000000 | https://www.sec.gov/Archives/edgar/data/320193/000032019322000070/0000320193-22-000070-index.htm | https://www.sec.gov/Archives/edgar/data/320193/000032019322000070/aapl-20220625.htm |
03/26/2022 | AAPL | USD | 0000320193 | 2022-04-29 | 2022-04-28 18:03:58 | 2022 | Q2 | 28098000000 | 23413000000 | 51511000000 | 45400000000 | 5460000000 | 15809000000 | 118180000000 | 39304000000 | 0 | 0 | 0 | 141219000000 | 0 | 51959000000 | 232482000000 | 0 | 350662000000 | 52682000000 | 16658000000 | 0 | 7920000000 | 50248000000 | 127508000000 | 103323000000 | 0 | 0 | 52432000000 | 155755000000 | 0 | 0 | 283263000000 | 0 | 61181000000 | 12712000000 | -6494000000 | 0 | 67399000000 | 350662000000 | 0 | 67399000000 | 350662000000 | 164632000000 | 119981000000 | 91883000000 | https://www.sec.gov/Archives/edgar/data/320193/000032019322000059/0000320193-22-000059-index.htm | https://www.sec.gov/Archives/edgar/data/320193/000032019322000059/aapl-20220326.htm |
12/25/2021 | AAPL | USD | 0000320193 | 2022-01-28 | 2022-01-27 18:00:58 | 2022 | Q1 | 37119000000 | 26794000000 | 63913000000 | 65253000000 | 5876000000 | 18112000000 | 153154000000 | 39245000000 | 0 | 0 | 0 | 138683000000 | 0 | 50109000000 | 228037000000 | 0 | 381191000000 | 74362000000 | 16169000000 | 41241000000 | 7876000000 | 49167000000 | 147574000000 | 106629000000 | 0 | 0 | 55056000000 | 161685000000 | 0 | 0 | 309259000000 | 0 | 58424000000 | 14435000000 | -927000000 | 0 | 71932000000 | 381191000000 | 0 | 71932000000 | 381191000000 | 165477000000 | 122798000000 | 85679000000 | https://www.sec.gov/Archives/edgar/data/320193/000032019322000007/0000320193-22-000007-index.htm | https://www.sec.gov/Archives/edgar/data/320193/000032019322000007/aapl-20211225.htm |
09/25/2021 | AAPL | USD | 0000320193 | 2021-10-29 | 2021-10-28 18:04:28 | 2021 | Q4 | 34940000000 | 27699000000 | 62639000000 | 51506000000 | 6580000000 | 14111000000 | 134836000000 | 39440000000 | 0 | 0 | 0 | 127877000000 | 0 | 48849000000 | 216166000000 | 0 | 351002000000 | 54763000000 | 15613000000 | 0 | 7612000000 | 47493000000 | 125481000000 | 109106000000 | 0 | 0 | 53325000000 | 162431000000 | 0 | 0 | 287912000000 | 0 | 57365000000 | 5562000000 | 163000000 | 0 | 63090000000 | 351002000000 | 0 | 63090000000 | 351002000000 | 155576000000 | 124719000000 | 89779000000 | https://www.sec.gov/Archives/edgar/data/320193/000032019321000105/0000320193-21-000105-index.htm | https://www.sec.gov/Archives/edgar/data/320193/000032019321000105/aapl-20210925.htm |
06/26/2021 | AAPL | USD | 0000320193 | 2021-07-28 | 2021-07-27 18:03:42 | 2021 | Q3 | 34050000000 | 27646000000 | 61696000000 | 33908000000 | 5178000000 | 13641000000 | 114423000000 | 38615000000 | 0 | 0 | 0 | 131948000000 | 0 | 44854000000 | 215417000000 | 0 | 329840000000 | 40409000000 | 16039000000 | 0 | 7681000000 | 43625000000 | 107754000000 | 105752000000 | 0 | 0 | 52054000000 | 157806000000 | 0 | 0 | 265560000000 | 0 | 54989000000 | 9233000000 | 58000000 | 0 | 64280000000 | 329840000000 | 0 | 64280000000 | 329840000000 | 159594000000 | 121791000000 | 87741000000 | https://www.sec.gov/Archives/edgar/data/320193/000032019321000065/0000320193-21-000065-index.htm | https://www.sec.gov/Archives/edgar/data/320193/000032019321000065/aapl-20210626.htm |
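To see what else the payload holds besides "bsq" (which, judging by the Q1-Q4 periods in the output above, looks like the quarterly balance sheet), it may help to list the keys found under the same path. Here is a minimal sketch that uses only the standard library so it can run offline. NextDataParser, extract_next_data, statement_keys and the SAMPLE page are names made up for illustration, the "ticker" key in SAMPLE is a made-up sibling, and the real page may be organized differently:

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # script text may arrive in several pieces, so collect them all
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed JSON payload, or raise ValueError if it is absent."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no __NEXT_DATA__ script found")
    return json.loads("".join(parser.chunks))


def statement_keys(data):
    """List the keys that hold row lists (assumed to be the statements)."""
    inner = data["props"]["pageProps"]["data"]["data"]
    return sorted(k for k, v in inner.items() if isinstance(v, list))


# Hypothetical stand-in for the downloaded page; the two values in the
# "bsq" row are taken from the first row of the AAPL output above.
SAMPLE = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"data": {"data": {'
    '"bsq": [{"date": "06/25/2022", "totalAssets": 336309000000}],'
    ' "ticker": "AAPL"}}}}}'
    "</script></body></html>"
)

print(statement_keys(extract_next_data(SAMPLE)))  # ['bsq']
```

On the live site you would pass requests.get(url).text instead of SAMPLE; whichever keys come back can then be fed to pd.DataFrame the same way as "bsq".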
CodePudding user response:
from gazpacho import Soup
import json
import pandas as pd
ticker = 'aapl'
url = f'https://roic.ai/financials/{ticker}?fs=annual'
soup = Soup.get(url)
# the Next.js page state is embedded as JSON in <script id="__NEXT_DATA__">
scraped_data = soup.find('script', {'id': "__NEXT_DATA__"})
data = json.loads(scraped_data.text)
df = pd.DataFrame(data["props"]["pageProps"]["data"]["data"]["bsq"])
print(df.head())
The same approach works with gazpacho, as shown above. Don't forget to import the pandas and json libraries.
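Both answers index the payload with a hard-coded key path, which raises a KeyError if the site ever rearranges its JSON. A minimal sketch of a more defensive lookup follows. The helper name dig and the key "no_such_key" are made up for illustration, and the small data dict is only a stand-in for the real parsed payload:

```python
def dig(obj, *keys):
    """Follow keys through nested dicts; return None if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


# Stand-in for the parsed __NEXT_DATA__ payload; the values come from
# the first row of the AAPL output shown earlier in the thread.
data = {"props": {"pageProps": {"data": {"data": {"bsq": [
    {"date": "06/25/2022", "period": "Q3", "totalAssets": 336309000000},
]}}}}}

rows = dig(data, "props", "pageProps", "data", "data", "bsq")
missing = dig(data, "props", "pageProps", "data", "data", "no_such_key")

if rows is None:
    print("path not found, the page layout may have changed")
else:
    print(f"found {len(rows)} row(s)")
```

When rows is not None it can be passed straight to pd.DataFrame(rows), as in the answers above.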