Unable to scrape a table

Time:12-31

I'm attempting to scrape the data from a table on the following website: https://droughtmonitor.unl.edu/DmData/DataTables.aspx

import requests
from bs4 import BeautifulSoup

url = 'https://droughtmonitor.unl.edu/DmData/DataTables.aspx'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
drought_table = soup.find('table', {'id':'datatabl'}).find('tbody').find_all('tr')

For some reason I am getting no output. I've also tried pandas for the same job:

import pandas as pd
url = 'https://droughtmonitor.unl.edu/DmData/DataTables.aspx'
table = pd.read_html(url)
df = table[0]

But that also gave me an empty DataFrame. What could be causing this?

CodePudding user response:

The browser's network tool shows that the site loads the table through a separate Fetch/XHR request, so the table rows are not in the HTML that requests or pandas downloads.

Image: network monitor
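If you want to confirm this without the browser tools, one option is to count the rows of the target table in the raw HTML. The following is only a sketch using the standard library: `TableRowCounter`, `count_rows`, and the sample HTML string are hypothetical names and data made up for illustration, not the site's actual markup.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Count <tr> elements nested inside the <table> with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0  # > 0 while we are inside the target table
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Enter the target table, or track tables nested inside it
            if self.depth > 0 or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif tag == "tr" and self.depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def count_rows(html, table_id):
    parser = TableRowCounter(table_id)
    parser.feed(html)
    return parser.rows


# Hypothetical stand-in for a page whose table is filled in later by JavaScript
static_html = '<table id="datatabl"><tbody></tbody></table>'
print(count_rows(static_html, "datatabl"))
```

With a real page you would pass `r.text` from the `requests.get` call instead of `static_html`. A count of zero suggests the rows arrive through a later request.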

You can request that endpoint directly to get the table data:

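import requests

# Endpoint the page calls via Fetch/XHR to fill the table
url = 'https://droughtmonitor.unl.edu/DmData/DataTables.aspx/ReturnTabularDMAreaPercent_national'

headers = {
    'Content-Type': 'application/json; charset=utf-8',
}

# Values are sent wrapped in single quotes, as in the browser's request
params = {
    'area': "'conus'",
    'statstype': "'1'",
}

response = requests.get(url, headers=headers, params=params)
response.raise_for_status()  # fail loudly on an HTTP error

table = response.json()  # parsed JSON payload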

# Code generated by https://curlconverter.com/
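
The parsed response still has to be turned into rows before it is useful. Here is a minimal sketch, assuming the endpoint wraps its result under a "d" key, as ASP.NET page methods commonly do. That is an assumption, so inspect the real payload first. `extract_rows` and the sample field names are hypothetical.

```python
import json


def extract_rows(payload):
    """Return the list of row dicts from a parsed JSON response.

    Assumption: the result may be wrapped as {"d": ...}; if there is
    no "d" key, the payload itself is treated as the data.
    """
    data = payload.get("d", payload) if isinstance(payload, dict) else payload
    if isinstance(data, str):
        # Some services return the rows as a JSON string inside the envelope
        data = json.loads(data)
    if not isinstance(data, list):
        raise ValueError("unexpected payload shape: %s" % type(data).__name__)
    return data


# Hypothetical sample payload; the real field names may differ
sample = {"d": [{"week": "2021-12-28", "value": 1.5}]}
rows = extract_rows(sample)
print(rows)
```

If the real payload has this shape, `pd.DataFrame(extract_rows(table))` should give the DataFrame the question was after.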
