Extracting population by using web scraping in Python from dynamic graph

My task is to iterate all over US zipcodes in

CodePudding user response：

I don't want to swamp the server, so the code below only loops over a few zip codes. The site appears to query a back-end database by zip code, and not all zip codes have associated data. If you can determine a suitable range, use that in an iterable such as a list. A simple try/except against every possible zip code would mean a very large number of requests, and you would need to start thinking about batching requests, spreading them over time, adding pauses, and possibly switching to asynchronous requests.
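As a rough illustration of the batching and pausing idea, here is a minimal sketch using only the standard library. The helper names (`chunks`, `paced_batches`), the batch size and the pause length are my own placeholders, not anything the site documents, so tune them to whatever load seems polite.

```python
import time
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def paced_batches(codes, batch_size=50, pause=5.0, sleep=time.sleep):
    """Yield batches of codes, calling sleep(pause) between batches.

    `sleep` is injectable so the pacing can be checked without waiting.
    """
    for n, batch in enumerate(chunks(codes, batch_size)):
        if n:  # no pause before the first batch
            sleep(pause)
        yield batch


# Zero-padded candidate zip codes, handled two at a time.
codes = [f"{n:05d}" for n in range(23022, 23027)]
pauses = []  # records each pause instead of actually sleeping
batches = list(paced_batches(codes, batch_size=2, pause=1.0, sleep=pauses.append))
print(batches)
print(pauses)
```

In real use you would leave `sleep` at its default and issue the requests for each batch inside the loop over `paced_batches(...)`.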

The chart data can be extracted from a JavaScript object within the response text and parsed with the json library. I assume that the years are consistent across responses, since the column names are taken from the first successful response only.
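To show the extraction step in isolation, here is a small sketch run against a made-up snippet of page source. The `sample_html` string and its `"key"` field are assumptions for illustration; only the `var data = [...]` pattern and the `values` / `x` / `y` names come from the script in this answer, and the real page may be laid out differently.

```python
import json
import re

# Made-up stand-in for the response text; the real page may differ.
sample_html = '''
<script>
var data = [{"key": "Population", "values": [{"x": 2005, "y": 100}, {"x": 2006, "y": 120}]}];
</script>
'''

# Capture the array literal assigned to `data`. Without re.DOTALL the
# dot does not cross newlines, so the match stays on a single line.
match = re.search(r'var data = (\[.*\])', sample_html)
series = json.loads(match.group(1))[0]['values']

years = [point['x'] for point in series]
counts = [point['y'] for point in series]
print(years, counts)
```

This only works while the embedded array happens to be valid JSON. If the site emitted single quotes or unquoted keys, `json.loads` would raise and a different parser would be needed.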

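import json
import re

import pandas as pd
import requests

results = []
columns = ['zip']

with requests.Session() as s:
    # update() keeps the session's other default headers
    s.headers.update({'User-Agent': 'Mozilla/5.0'})

    for code in range(23022, 23025):
        # zero-pad so codes below 10000 keep their leading zeros
        zip_code = f'{code:05d}'
        url = f'https://www.unitedstateszipcodes.org/{zip_code}/#stats'
        r = s.get(url)

        # the chart data is embedded in the page as: var data = [...]
        match = re.search(r'var data = (\[.*\])', r.text)
        if match is None:
            continue  # no chart data for this zip code

        try:
            data = json.loads(match.group(1))[0]['values']
            values = [zip_code] + [i['y'] for i in data]
        except (json.JSONDecodeError, IndexError, KeyError, TypeError):
            continue  # unexpected payload; skip this zip code

        results.append(values)

        # take the column names (years) from the first successful response
        if len(columns) == 1:
            columns.extend(i['x'] for i in data)

df = pd.DataFrame(results, columns=columns)
print(df)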