I am trying to scrape data for a project from this website, specifically the table under the "Matchups" tab.
I'm brand new to web scraping, so I did some digging with Inspect Element, but as far as I can tell the table is loaded dynamically, so none of the data appears in the page source. In the dev tools I found a WebSocket connection to "wss://s-usc1a-nss-2024.firebaseio.com/.ws?v=5&ns=data-reaper", which I'm guessing is where the data comes from.
I read the Firebase REST API documentation and tried to make a request using the path I found in this file:
curl "https://s-usc1a-nss-2024.firebaseio.com/Data/tableData/Standard.json"
but got
{ "error" : "Permission denied" }
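For reference, the same request can be reproduced from Python. This is a minimal sketch using only the standard library. It builds the Realtime Database REST URL (the database path plus ".json") and, if you happen to have a Firebase Auth ID token, appends it as the auth query parameter, which is how the REST API accepts credentials. The host and path are the ones from the question. The helper names are made up for illustration, and whether this particular database would accept any token at all is unknown.

```python
import json
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def rtdb_rest_url(host: str, path: str, id_token: Optional[str] = None) -> str:
    """Build a Realtime Database REST URL: https://<host>/<path>.json[?auth=...]."""
    url = f"https://{host}/{path.strip('/')}.json"
    if id_token:
        # The REST API accepts a Firebase Auth ID token in the "auth" query parameter
        url += "?" + urlencode({"auth": id_token})
    return url


def fetch_body(url: str, timeout: float = 10.0) -> str:
    """GET the URL and return the response body, even for HTTP error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A security-rules rejection comes back as HTTP 401 with a JSON error body
        return err.read().decode("utf-8")


def is_permission_denied(body: str) -> bool:
    """True if the body is the {"error": "Permission denied"} payload."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("error") == "Permission denied"
```

Usage: body = fetch_body(rtdb_rest_url("s-usc1a-nss-2024.firebaseio.com", "Data/tableData/Standard")), then is_permission_denied(body). Without a token this should reproduce the same response the curl command returned.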
There is also an identical (although slightly outdated) version of this table that seems to be hosted on Tableau. I tried using the tableauscraper library by Bertrand Martel:
from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/app/profile/tzachi.zach/viz/DataReaper243-MatchupWinRates/WinratesLeague"

ts = TS()
ts.loads(url)  # this is the call that raises the error below
which yields this error:
Traceback (most recent call last):
File "C:\Users\d4wgr\AppData\Local\Programs\Python\Python310\scrapertest.py", line 6, in <module>
File "C:\Users\d4wgr\AppData\Local\Programs\Python\Python310\lib\site-packages\tableauscraper\TableauScraper.py", line 80, in loads
soup.find("textarea", {"id": "tsConfigContainer"}).text
AttributeError: 'NoneType' object has no attribute 'text'
which, according to the creator of the library in this thread, is caused by authentication errors.
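The traceback comes from tableauscraper looking for a textarea with id tsConfigContainer in the page it downloaded: soup.find returns None when that element is missing, and calling .text on None raises the AttributeError. The sketch below is not the library's code; it performs the same lookup with the standard library's html.parser and fails with a clearer message, which makes it easy to test whether a given page contains the Tableau config at all. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class _TsConfigFinder(HTMLParser):
    """Capture the text inside <textarea id="tsConfigContainer">, if present."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.text: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("id") == "tsConfigContainer":
            self._inside = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data


def extract_ts_config(html: str) -> str:
    finder = _TsConfigFinder()
    finder.feed(html)
    finder.close()
    if finder.text is None:
        # This is the situation that surfaces as "'NoneType' object has no attribute 'text'"
        raise ValueError("the page has no <textarea id='tsConfigContainer'>, "
                         "so it does not contain the Tableau config")
    return finder.text
```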
I'm wondering whether I need some sort of key to make either of these requests, or whether I have the wrong path in the first request. It is also entirely possible that I simply don't have permission to make this kind of request, in which case I suppose the only way to extract the data would be with Selenium or something similar.
Answer:
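It seems the developer of the app you are scraping has disabled unauthenticated access to their Firebase Realtime Database (the firebaseio.com host and the .json REST path you tried belong to the Realtime Database, not Firestore). Security rules are enforced on Firebase's servers, so without valid credentials there is no practical way around them. My suggestion is to invest your effort either in building a JavaScript-capable web scraper (Selenium, as you mentioned, or something similar) or in using an external web scraping service.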
As an engineer at Web Scraping API, I came up with this script for you. It uses our service to return the targeted HTML table from the page after clicking:
- the 'Feedback' button (Greeting traveler popup)
- the 'Matchup' buttons
- the 'Show table' button
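import requests

SCRAPER_URL = 'https://api.webscrapingapi.com/v1'
TARGET_URL = 'https://www.vicioussyndicate.com/data-reaper-live-beta/'
PARAMS = {
    "api_key": "YOUR_API_KEY",  # placeholder for your Web Scraping API key
    "url": TARGET_URL,
    "js_instructions": '[{"action":"click","selector":"button#basicBtn","timeout": 5000, "block": "start"},{"action":"click","selector":"button#table","timeout": 5000, "block": "start"},{"action":"click","selector":"button#number","timeout": 5000, "block": "start"}]',
    # Other parameters are missing from the original snippet; the "table" key
    # read below implies an extraction rule that targets the table.
}
response = requests.get(SCRAPER_URL, params=PARAMS)
data = response.json()  # don't name this json: that would shadow the json module
table = data['table'][0]
print(table)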
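The js_instructions value is a JSON array passed as a string, with one object per click. Writing it by hand is easy to get wrong, so here is a small sketch that builds the same string from a list of CSS selectors with json.dumps. The action, timeout and block fields and the three selectors are copied from the snippet above; the helper name is hypothetical.

```python
import json
from typing import List


def build_click_instructions(selectors: List[str], timeout_ms: int = 5000) -> str:
    """Return a js_instructions string with one "click" step per selector, in order."""
    steps = [
        {"action": "click", "selector": sel, "timeout": timeout_ms, "block": "start"}
        for sel in selectors
    ]
    return json.dumps(steps)


# The three clicks from the snippet, in the same order
JS_INSTRUCTIONS = build_click_instructions(["button#basicBtn", "button#table", "button#number"])
```

You would then pass "js_instructions": JS_INSTRUCTIONS in PARAMS instead of the literal string; adding or reordering clicks only means editing the selector list.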
The resulting table looks like this:
<table id="numberTable">
<tr>
<th>Rank -></th>
</tr>
<tr>
<td style="background-color: rgb(0, 128, 0); color: rgb(255, 255, 255);">DemonHunter</td>
<td style="background-color: rgb(175, 193, 222);">7.1%</td>
<td style="background-color: rgb(178, 195, 223);">6.8%</td>
<td style="background-color: rgb(173, 192, 221);">7.3%</td>
<td style="background-color: rgb(174, 192, 222);">7.2%</td>
<td style="background-color: rgb(162, 183, 217);">8.2%</td>
<td style="background-color: rgb(157, 179, 215);">8.7%</td>
<td style="background-color: rgb(158, 180, 215);">8.6%</td>
<td style="background-color: rgb(157, 179, 214);">8.7%</td>
<td style="background-color: rgb(166, 186, 218);">7.9%</td>
<td style="background-color: rgb(155, 178, 214);">8.9%</td>
<td style="background-color: rgb(152, 175, 212);">9.1%</td>
<td style="background-color: rgb(158, 180, 215);">8.6%</td>
<td style="background-color: rgb(155, 178, 214);">8.9%</td>
<td style="background-color: rgb(143, 168, 209);">10.0%</td>
<td style="background-color: rgb(136, 163, 206);">10.6%</td>
<td style="background-color: rgb(139, 165, 207);">10.3%</td>
<td style="background-color: rgb(136, 163, 206);">10.6%</td>
<td style="background-color: rgb(148, 172, 211);">9.6%</td>
<td style="background-color: rgb(160, 182, 216);">8.4%</td>
<td style="background-color: rgb(126, 155, 202);">11.4%</td>
</tr>
<tr>
<td style="background-color: rgb(121, 85, 72); color: rgb(255, 255, 255);">Druid</td>
<td style="background-color: rgb(106, 140, 194);">13.2%</td>
<td style="background-color: rgb(114, 146, 197);">12.6%</td>
<td style="background-color: rgb(99, 134, 191);">13.9%</td>
<td style="background-color: rgb(88, 126, 186);">14.8%</td>
<td style="background-color: rgb(91, 128, 188);">14.6%</td>
<td style="background-color: rgb(90, 127, 187);">14.7%</td>
<td style="background-color: rgb(88, 126, 186);">14.8%</td>
<td style="background-color: rgb(87, 125, 186);">15.0%</td>
<td style="background-color: rgb(87, 125, 186);">15.2%</td>
<td style="background-color: rgb(87, 125, 186);">15.1%</td>
<td style="background-color: rgb(96, 131, 189);">14.2%</td>
<td style="background-color: rgb(98, 133, 190);">14.0%</td>
<td style="background-color: rgb(100, 135, 191);">13.8%</td>
<td style="background-color: rgb(107, 140, 194);">13.2%</td>
<td style="background-color: rgb(87, 125, 186);">15.1%</td>
<td style="background-color: rgb(87, 125, 186);">15.1%</td>
<td style="background-color: rgb(92, 129, 188);">14.5%</td>
<td style="background-color: rgb(108, 141, 194);">13.1%</td>
<td style="background-color: rgb(128, 157, 203);">11.3%</td>
<td style="background-color: rgb(108, 141, 194);">13.1%</td>
</tr>
<tr>
<td style="background-color: rgb(104, 159, 56); color: rgb(34, 34, 34);">Hunter</td>
<td style="background-color: rgb(120, 151, 199);">12.0%</td>
<td style="background-color: rgb(134, 162, 205);">10.7%</td>
<td style="background-color: rgb(141, 167, 208);">10.1%</td>
<td style="background-color: rgb(135, 162, 205);">10.7%</td>
<td style="background-color: rgb(146, 170, 210);">9.7%</td>
<td style="background-color: rgb(143, 168, 209);">10.0%</td>
<td style="background-color: rgb(154, 177, 213);">8.9%</td>
<td style="background-color: rgb(156, 178, 214);">8.8%</td>
<td style="background-color: rgb(149, 173, 211);">9.4%</td>
<td style="background-color: rgb(158, 180, 215);">8.6%</td>
<td style="background-color: rgb(153, 176, 213);">9.0%</td>
<td style="background-color: rgb(147, 171, 210);">9.6%</td>
<td style="background-color: rgb(147, 172, 210);">9.6%</td>
<td style="background-color: rgb(160, 181, 216);">8.4%</td>
<td style="background-color: rgb(168, 188, 219);">7.7%</td>
<td style="background-color: rgb(166, 186, 218);">7.9%</td>
<td style="background-color: rgb(160, 181, 216);">8.5%</td>
<td style="background-color: rgb(155, 178, 214);">8.9%</td>
<td style="background-color: rgb(154, 176, 213);">9.0%</td>
<td style="background-color: rgb(189, 204, 228);">5.8%</td>
</tr>
<tr>
<td style="background-color: rgb(79, 195, 247); color: rgb(34, 34, 34);">Mage</td>
<td style="background-color: rgb(87, 125, 186);">16.3%</td>
<td style="background-color: rgb(87, 125, 186);">18.4%</td>
<td style="background-color: rgb(87, 125, 186);">17.5%</td>
<td style="background-color: rgb(87, 125, 186);">16.4%</td>
<td style="background-color: rgb(87, 125, 186);">15.3%</td>
<td style="background-color: rgb(87, 125, 186);">15.0%</td>
<td style="background-color: rgb(87, 125, 186);">15.0%</td>
<td style="background-color: rgb(87, 125, 186);">14.9%</td>
<td style="background-color: rgb(99, 134, 190);">13.9%</td>
<td style="background-color: rgb(90, 127, 187);">14.7%</td>
<td style="background-color: rgb(95, 131, 189);">14.3%</td>
<td style="background-color: rgb(88, 125, 186);">14.9%</td>
<td style="background-color: rgb(87, 125, 186);">15.1%</td>
<td style="background-color: rgb(87, 125, 186);">15.2%</td>
<td style="background-color: rgb(110, 143, 195);">12.9%</td>
<td style="background-color: rgb(102, 137, 192);">13.6%</td>
<td style="background-color: rgb(110, 143, 195);">12.9%</td>
<td style="background-color: rgb(115, 146, 197);">12.5%</td>
<td style="background-color: rgb(119, 149, 199);">12.1%</td>
<td style="background-color: rgb(115, 147, 197);">12.4%</td>
</tr>
<tr>
<td style="background-color: rgb(255, 238, 88); color: rgb(34, 34, 34);">Paladin</td>
<td style="background-color: rgb(159, 181, 215);">8.5%</td>
<td style="background-color: rgb(156, 178, 214);">8.8%</td>
<td style="background-color: rgb(156, 178, 214);">8.8%</td>
<td style="background-color: rgb(156, 178, 214);">8.8%</td>
<td style="background-color: rgb(152, 175, 212);">9.2%</td>
<td style="background-color: rgb(153, 176, 213);">9.0%</td>
<td style="background-color: rgb(153, 176, 213);">9.1%</td>
<td style="background-color: rgb(153, 176, 213);">9.1%</td>
<td style="background-color: rgb(152, 175, 212);">9.2%</td>
<td style="background-color: rgb(155, 178, 214);">8.8%</td>
<td style="background-color: rgb(148, 172, 211);">9.5%</td>
<td style="background-color: rgb(145, 170, 210);">9.8%</td>
<td style="background-color: rgb(143, 168, 209);">10.0%</td>
<td style="background-color: rgb(142, 167, 208);">10.1%</td>
<td style="background-color: rgb(153, 176, 213);">9.0%</td>
<td style="background-color: rgb(155, 177, 213);">8.9%</td>
<td style="background-color: rgb(166, 186, 218);">7.9%</td>
<td style="background-color: rgb(180, 197, 224);">6.6%</td>
<td style="background-color: rgb(177, 194, 223);">6.9%</td>
<td style="background-color: rgb(160, 181, 216);">8.4%</td>
</tr>
<tr>
<td style="background-color: rgb(189, 189, 187); color: rgb(34, 34, 34);">Priest</td>
<td style="background-color: rgb(157, 179, 215);">8.7%</td>
<td style="background-color: rgb(147, 171, 210);">9.6%</td>
<td style="background-color: rgb(156, 178, 214);">8.8%</td>
<td style="background-color: rgb(167, 187, 218);">7.8%</td>
<td style="background-color: rgb(173, 192, 221);">7.3%</td>
<td style="background-color: rgb(172, 190, 220);">7.4%</td>
<td style="background-color: rgb(161, 182, 216);">8.4%</td>
<td style="background-color: rgb(173, 192, 221);">7.2%</td>
<td style="background-color: rgb(178, 196, 223);">6.8%</td>
<td style="background-color: rgb(160, 181, 216);">8.4%</td>
<td style="background-color: rgb(165, 186, 218);">8.0%</td>
<td style="background-color: rgb(170, 189, 220);">7.5%</td>
<td style="background-color: rgb(176, 194, 222);">7.0%</td>
<td style="background-color: rgb(168, 188, 219);">7.7%</td>
<td style="background-color: rgb(158, 180, 215);">8.6%</td>
<td style="background-color: rgb(171, 190, 220);">7.4%</td>
<td style="background-color: rgb(180, 197, 224);">6.7%</td>
<td style="background-color: rgb(176, 194, 222);">7.0%</td>
<td style="background-color: rgb(179, 196, 223);">6.8%</td>
<td style="background-color: rgb(147, 172, 211);">9.6%</td>
</tr>
<tr>
<td style="background-color: rgb(66, 66, 66); color: rgb(255, 255, 255);">Rogue</td>
<td style="background-color: rgb(173, 191, 221);">7.3%</td>
<td style="background-color: rgb(160, 182, 216);">8.4%</td>
<td style="background-color: rgb(156, 178, 214);">8.8%</td>
<td style="background-color: rgb(162, 183, 217);">8.2%</td>
<td style="background-color: rgb(159, 180, 215);">8.6%</td>
<td style="background-color: rgb(161, 183, 216);">8.3%</td>
<td style="background-color: rgb(165, 185, 218);">8.0%</td>
<td style="background-color: rgb(167, 187, 219);">7.8%</td>
<td style="background-color: rgb(169, 188, 219);">7.7%</td>
<td style="background-color: rgb(149, 173, 211);">9.5%</td>
<td style="background-color: rgb(155, 178, 214);">8.8%</td>
<td style="background-color: rgb(158, 180, 215);">8.6%</td>
<td style="background-color: rgb(162, 183, 216);">8.3%</td>
<td style="background-color: rgb(169, 188, 219);">7.6%</td>
<td style="background-color: rgb(142, 167, 208);">10.1%</td>
<td style="background-color: rgb(149, 173, 211);">9.5%</td>
<td style="background-color: rgb(153, 176, 213);">9.1%</td>
<td style="background-color: rgb(151, 174, 212);">9.2%</td>
<td style="background-color: rgb(151, 175, 212);">9.2%</td>
<td style="background-color: rgb(87, 125, 186);">15.0%</td>
</tr>
<tr>
<td style="background-color: rgb(92, 107, 192); color: rgb(255, 255, 255);">Shaman</td>
<td style="background-color: rgb(172, 191, 221);">7.3%</td>
<td style="background-color: rgb(184, 200, 225);">6.3%</td>
<td style="background-color: rgb(187, 203, 227);">6.0%</td>
<td style="background-color: rgb(192, 206, 229);">5.6%</td>
<td style="background-color: rgb(189, 204, 228);">5.8%</td>
<td style="background-color: rgb(192, 206, 229);">5.6%</td>
<td style="background-color: rgb(196, 209, 231);">5.2%</td>
<td style="background-color: rgb(198, 211, 231);">5.1%</td>
<td style="background-color: rgb(202, 214, 233);">4.7%</td>
<td style="background-color: rgb(200, 212, 232);">4.9%</td>
<td style="background-color: rgb(201, 213, 232);">4.8%</td>
<td style="background-color: rgb(206, 217, 234);">4.4%</td>
<td style="background-color: rgb(204, 216, 234);">4.5%</td>
<td style="background-color: rgb(202, 214, 233);">4.7%</td>
<td style="background-color: rgb(206, 217, 234);">4.4%</td>
<td style="background-color: rgb(208, 218, 235);">4.2%</td>
<td style="background-color: rgb(213, 222, 237);">3.7%</td>
<td style="background-color: rgb(216, 224, 239);">3.5%</td>
<td style="background-color: rgb(214, 223, 238);">3.6%</td>
<td style="background-color: rgb(220, 228, 241);">3.0%</td>
</tr>
<tr>
<td style="background-color: rgb(156, 39, 176); color: rgb(255, 255, 255);">Warlock</td>
<td style="background-color: rgb(87, 125, 186);">15.1%</td>
<td style="background-color: rgb(95, 131, 189);">14.2%</td>
<td style="background-color: rgb(87, 125, 186);">15.5%</td>
<td style="background-color: rgb(87, 125, 186);">17.2%</td>
<td style="background-color: rgb(87, 125, 186);">18.7%</td>
<td style="background-color: rgb(87, 125, 186);">19.0%</td>
<td style="background-color: rgb(87, 125, 186);">19.5%</td>
<td style="background-color: rgb(87, 125, 186);">21.0%</td>
<td style="background-color: rgb(87, 125, 186);">23.2%</td>
<td style="background-color: rgb(87, 125, 186);">18.5%</td>
<td style="background-color: rgb(87, 125, 186);">19.6%</td>
<td style="background-color: rgb(87, 125, 186);">20.0%</td>
<td style="background-color: rgb(87, 125, 186);">20.6%</td>
<td style="background-color: rgb(87, 125, 186);">20.8%</td>
<td style="background-color: rgb(87, 125, 186);">19.2%</td>
<td style="background-color: rgb(87, 125, 186);">21.2%</td>
<td style="background-color: rgb(87, 125, 186);">24.3%</td>
<td style="background-color: rgb(87, 125, 186);">28.0%</td>
<td style="background-color: rgb(87, 125, 186);">30.7%</td>
<td style="background-color: rgb(87, 125, 186);">19.2%</td>
</tr>
<tr>
<td style="background-color: rgb(244, 67, 54); color: rgb(255, 255, 255);">Warrior</td>
<td style="background-color: rgb(205, 216, 234);">4.4%</td>
<td style="background-color: rgb(209, 219, 236);">4.1%</td>
<td style="background-color: rgb(217, 226, 239);">3.3%</td>
<td style="background-color: rgb(219, 227, 240);">3.2%</td>
<td style="background-color: rgb(224, 231, 242);">2.7%</td>
<td style="background-color: rgb(228, 234, 244);">2.3%</td>
<td style="background-color: rgb(227, 233, 243);">2.5%</td>
<td style="background-color: rgb(229, 235, 244);">2.3%</td>
<td style="background-color: rgb(231, 237, 245);">2.1%</td>
<td style="background-color: rgb(226, 232, 243);">2.6%</td>
<td style="background-color: rgb(225, 232, 242);">2.6%</td>
<td style="background-color: rgb(224, 231, 242);">2.7%</td>
<td style="background-color: rgb(228, 234, 244);">2.4%</td>
<td style="background-color: rgb(229, 235, 244);">2.2%</td>
<td style="background-color: rgb(227, 233, 243);">2.4%</td>
<td style="background-color: rgb(232, 237, 245);">2.0%</td>
<td style="background-color: rgb(234, 239, 246);">1.8%</td>
<td style="background-color: rgb(236, 240, 247);">1.7%</td>
<td style="background-color: rgb(232, 237, 245);">2.0%</td>
<td style="background-color: rgb(233, 238, 246);">1.9%</td>
</tr>
</table>
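If you would rather not add a dependency, the table HTML returned by the API can also be parsed with the standard library's html.parser. This is a sketch, not part of the original answer: it assumes each tr row holds a class name followed by percentage cells, and because the rank column headers are not visible in the output, values are kept in column order rather than keyed by rank. The class and function names are made up.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class MatchupTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text chunks of the <td> being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if self.rows:  # cells outside any <tr> are ignored
                self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_matchups(table_html: str) -> Dict[str, List[float]]:
    """Map each class name to its row of percentages, as floats, in column order."""
    parser = MatchupTableParser()
    parser.feed(table_html)
    parser.close()
    result: Dict[str, List[float]] = {}
    for cells in parser.rows:
        if not cells:  # header rows contain only <th> cells
            continue
        name, *values = cells
        result[name] = [float(v.rstrip("%")) for v in values]
    return result
```

json.dumps(parse_matchups(table), indent=2) then turns the result into a JSON string.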
You can use BeautifulSoup to parse it and manipulate the data. For example, replace print(table) with the following code:
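from bs4 import BeautifulSoup

result = []
soup = BeautifulSoup(table, 'html.parser')
lines = soup.select('tr')
for line in lines:
    cells = line.find_all('td')
    if not cells:  # skip header rows, which contain only <th> cells
        continue
    obj = {}
    obj[cells[0].text] = {
        # ... the rest of this snippet is cut off in the original answer
    }
    result.append(obj)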
You will end up with a JSON object that looks like this: