List of all US ZIP Codes using uszipcode

Time:12-31

I've been trying to fetch all US ZIP codes for a web scraping project for my company. I'm trying to use the uszipcode library to do it automatically, rather than scraping them manually from the website I'm interested in, but I can't figure it out.

This is my manual attempt:

from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

url = 'https://www.unitedstateszipcodes.org'
headers = {'User-Agent': 'Chrome/50.0.2661.102'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

hrefs = []
all_zipcodes = []

# Collect the link to each state's page from the home page's state list
for data in soup.find_all('div', class_='state-list'):
    for a in data.find_all('a'):
        href = a.get('href')
        if href is not None:  # skip anchors that have no href
            hrefs.append(href)


def get_zipcode_list():
    """
    Visit every state page and collect the ZIP codes linked from it.
    :return: list of ZIP code strings.
    """
    for state in hrefs:
        state_url = urljoin(url, state)  # build the absolute URL of the state page
        state_page = requests.get(state_url, headers=headers)
        states_soup = BeautifulSoup(state_page.text, 'html.parser')
        div = states_soup.find(class_='list-group')
        if div is None:  # unexpected layout or blocked request
            continue
        # ZIP code links are the ones whose text is all digits
        for a in div.find_all('a'):
            if str(a.string).isdigit():
                all_zipcodes.append(str(a.string))
    return all_zipcodes

This takes a lot of time, and I would like to know how to do the same thing more efficiently using uszipcode.
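Most of the time in the scraper above is spent waiting on one HTTP request per state, made one after another. Because the threads mostly wait on network I/O, a thread pool can overlap those waits despite the GIL. A minimal sketch, assuming the state pages still have the layout the code above expects (a `list-group` element with all-digit links) and that a few parallel requests are acceptable to the site; `parse_state_page`, `fetch_state` and `get_zipcode_list_concurrently` are hypothetical helper names, not part of any library.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://www.unitedstateszipcodes.org'
HEADERS = {'User-Agent': 'Chrome/50.0.2661.102'}


def parse_state_page(html):
    """Return the all-digit link texts inside the first 'list-group' element."""
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find(class_='list-group')
    if div is None:  # assumed layout not found, or the request was blocked
        return []
    texts = (a.string for a in div.find_all('a'))
    return [str(t) for t in texts if t is not None and str(t).isdigit()]


def fetch_state(href):
    """Download one state page (hypothetical helper) and parse its ZIP codes."""
    response = requests.get(urljoin(BASE_URL, href), headers=HEADERS, timeout=30)
    response.raise_for_status()
    return parse_state_page(response.text)


def get_zipcode_list_concurrently(hrefs, max_workers=8):
    """Fetch all state pages with a small thread pool and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch_state, hrefs))
    return [zipcode for page in pages for zipcode in page]
```

Called as `get_zipcode_list_concurrently(hrefs)` with the `hrefs` list built above. Keeping `max_workers` low is a deliberate choice so the site is not hit too hard; check its terms of use before scraping it at all.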

Answer:

You could try searching with the empty pattern '':

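from uszipcode import SearchEngine

s = SearchEngine()
# returns caps the number of results (the default is small), so set it
# higher than the total number of ZIP codes
l = s.by_pattern('', returns=1000000)
print(len(l))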
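`by_pattern` returns result objects rather than plain strings. The sketch below assumes each result exposes a `.zipcode` attribute, as uszipcode's Zipcode objects do, and turns the results into a sorted list of unique ZIP strings. `zipcode_strings` and `load_all_zipcodes` are hypothetical helper names. Passing `zipcode_type=None` is my reading of the uszipcode docs (by default only standard ZIP codes are returned), so verify it against the version you install.

```python
from types import SimpleNamespace


def zipcode_strings(results):
    """Turn result objects with a .zipcode attribute into sorted, unique ZIP strings."""
    return sorted({result.zipcode for result in results})


def load_all_zipcodes():
    """Query uszipcode for every ZIP code (requires `pip install uszipcode`).

    Assumption: zipcode_type=None also includes PO Box, unique and military
    ZIP codes, not just the standard ones by_pattern returns by default.
    """
    from uszipcode import SearchEngine  # local import so zipcode_strings runs without it

    search = SearchEngine()
    results = search.by_pattern('', zipcode_type=None, returns=1000000)
    return zipcode_strings(results)


# Stand-in objects so the helper can be tried without downloading the database
sample = [SimpleNamespace(zipcode='10002'), SimpleNamespace(zipcode='10001'),
          SimpleNamespace(zipcode='10002')]
print(zipcode_strings(sample))  # ['10001', '10002']
```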

More details are in the uszipcode documentation and its basic tutorial.

Answer:

US ZIP codes match the regex [0-9]{5}(?:-[0-9]{4})? (five digits, optionally followed by a hyphen and four more digits for ZIP+4).

You can check a string against it with the re module:

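import re

regex = r"[0-9]{5}(?:-[0-9]{4})?"
zipcode = "10001-1234"  # the string to check
# The pattern is the first argument; fullmatch rejects strings with
# extra characters, such as "123456"
if re.fullmatch(regex, zipcode):
    print("match")
else:
    print("not a match")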
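This only tells you whether a string is shaped like a ZIP code, not whether the code is actually assigned, so it is mostly useful for cleaning scraped text. A small sketch with hypothetical names (`ZIP_RE`, `is_zipcode`) that compiles the pattern once, filters a list, and shows why `re.fullmatch` is safer than `re.match` here.

```python
import re

# Compile once; fullmatch requires the whole string to be a ZIP or ZIP+4
ZIP_RE = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def is_zipcode(text):
    """Return True if text is exactly a 5-digit ZIP or a ZIP+4."""
    return ZIP_RE.fullmatch(text) is not None


candidates = ["10001", "10001-1234", "123456", "1000", "abcde"]
print([c for c in candidates if is_zipcode(c)])  # ['10001', '10001-1234']

# re.match only anchors at the start, so it accepts "123456" (it matches "12345")
print(bool(ZIP_RE.match("123456")))  # True
```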

Answer:

You can download the list of ZIP codes as a CSV from the official source (42k rows) and parse it, if it's for one-time use and you don't need the other metadata that uszipcode provides for each ZIP code.
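If you go the CSV route, watch out for leading zeros: ZIP codes in places such as New England and Puerto Rico start with 0, and the zero is lost if the file passes through a spreadsheet or is parsed as integers. A minimal sketch using the standard csv module, assuming the file has a header row; the column name `zip`, the helper `read_zipcodes` and the sample rows are placeholders, so use whatever header the downloaded file actually has.

```python
import csv
import io


def read_zipcodes(csv_file, column="zip"):
    """Read ZIP codes from a CSV file object as 5-character strings.

    The default column name "zip" is a placeholder for the real header.
    """
    zipcodes = []
    for row in csv.DictReader(csv_file):
        value = row[column].strip()
        if value:
            # Restore leading zeros dropped by spreadsheets (e.g. 2134 -> 02134)
            zipcodes.append(value.zfill(5))
    return zipcodes


sample = io.StringIO("zip,city\n02134,Allston\n10001,New York\n501,Holtsville\n")
print(read_zipcodes(sample))  # ['02134', '10001', '00501']
```

For a real file, open it with `newline=""` as the csv module recommends and pass the file object in place of the StringIO sample.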

uszipcode also has a second, much larger database (the comprehensive one) that should have all the data you need:

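from uszipcode import SearchEngine

# simple_zipcode=False selects the larger comprehensive database
# (keyword from older uszipcode releases; newer ones may name it differently)
zipSearch = SearchEngine(simple_zipcode=False)
allZipCodes = zipSearch.by_pattern('', returns=200000)
print(len(allZipCodes))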
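Because these results carry metadata, you can rebuild the state-by-state view that the scraper in the question produced without any HTTP requests. A sketch assuming each result has `.state` and `.zipcode` attributes (uszipcode's Zipcode objects do, as far as I know); `group_by_state` is a hypothetical helper, and the sample objects stand in for `allZipCodes`.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_state(results):
    """Map each state abbreviation to a sorted list of its ZIP codes.

    Assumes .state and .zipcode attributes; results with no state are skipped.
    """
    by_state = defaultdict(set)
    for result in results:
        if result.state:
            by_state[result.state].add(result.zipcode)
    return {state: sorted(codes) for state, codes in sorted(by_state.items())}


# Stand-ins for allZipCodes so the sketch runs without downloading the database
sample = [
    SimpleNamespace(zipcode="10002", state="NY"),
    SimpleNamespace(zipcode="02134", state="MA"),
    SimpleNamespace(zipcode="10001", state="NY"),
]
print(group_by_state(sample))  # {'MA': ['02134'], 'NY': ['10001', '10002']}
```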