My task is to iterate over all US zip codes in
CodePudding user response:
I don't want to swamp the server. The site appears to query a backend database by zip code, and not all zip codes have associated data. If you can determine a suitable range, use that as an iterable such as a list or a range. A simple try/except across every zip code would mean a very large number of requests, and you would need to start thinking about batching requests, spreading them over time, adding pauses, and switching to asynchronous requests.
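As a rough illustration of the batching and pausing idea, here is a minimal sketch. The names `batched` and `polite_iter`, and the default batch size and delays, are my own assumptions rather than anything the site documents; tune them to whatever the server tolerates.

```python
import time
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def polite_iter(codes, batch_size=10, delay=1.0, batch_pause=30.0, sleep=time.sleep):
    """Yield codes one at a time, pausing between items and longer between batches.

    All default values here are hypothetical placeholders.
    """
    for n, batch in enumerate(batched(codes, batch_size)):
        if n:
            sleep(batch_pause)  # longer pause before every batch after the first
        for i, code in enumerate(batch):
            if i:
                sleep(delay)  # short pause between requests inside a batch
            yield code
```

You could then loop with `for code in polite_iter(range(23022, 23025)):` and issue one request per iteration. The `sleep` parameter is only there so the pauses can be swapped out when testing.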
The chart data can be extracted from a JavaScript object within the response text and parsed with the json library. I assume that the years are consistent across responses.
import json
import re

import pandas as pd
import requests

results = []
columns = ['zip']

with requests.Session() as s:
    s.headers = {'User-Agent': 'Mozilla/5.0'}
    for code in range(23022, 23025):
        url = f'https://www.unitedstateszipcodes.org/{code}/#stats'
        r = s.get(url)
        try:
            # Pull the chart data out of the inline JavaScript object.
            res = re.search(r'var data = (\[.*\])', r.text).group(1)
            data = json.loads(res)[0]['values']
            values = [i['y'] for i in data]
            values.insert(0, code)
            results.append(values)
            # Take the column names (the x values, assumed to be years)
            # from the first successful response.
            if values and len(columns) == 1:
                columns.extend([i['x'] for i in data])
        except (AttributeError, ValueError, KeyError, IndexError, TypeError):
            # No chart data found for this zip code; skip it.
            pass

df = pd.DataFrame(results, columns=columns)
print(df)
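If you want to check the extraction step without hitting the server, it can be pulled out into a small function and run against a sample string. This is only a sketch: `extract_series` is a name I made up, and the sample string (including its "key" field and numbers) is invented to mimic the shape the regex expects, not copied from the site.

```python
import json
import re

# Same pattern as in the loop above.
CHART_RE = re.compile(r'var data = (\[.*\])')


def extract_series(html):
    """Return (x_labels, y_values) for the first chart series, or None if absent."""
    match = CHART_RE.search(html)
    if match is None:
        return None
    try:
        points = json.loads(match.group(1))[0]['values']
        return [p['x'] for p in points], [p['y'] for p in points]
    except (ValueError, LookupError, TypeError):
        # Malformed JSON or an unexpected structure.
        return None


# Hypothetical sample data, for illustration only.
sample = 'var data = [{"key": "Population", "values": [{"x": 2005, "y": 100}, {"x": 2006, "y": 110}]}];'
print(extract_series(sample))
print(extract_series('<html>no chart here</html>'))
```

Returning None for pages without chart data makes the "no data for this zip code" case explicit instead of relying on an exception being swallowed.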