For work, I need to gather data on job titles and how frequently they appear in search results, so I decided to use Python for this. The problem is that the code fragment I found isn't giving me the information I need. Here's what I have so far:
import requests
from bs4 import BeautifulSoup
from collections import Counter
from string import punctuation
# We get the url
r = requests.get("https://www.usajobs.gov/Search/Results?j=0602&d=VA&p=1")
soup = BeautifulSoup(r.content, "html.parser")
# We get the words within divs
text_div = (''.join(s.findAll(text=True)) for s in soup.findAll('div'))
c_div = Counter((x.rstrip(punctuation).lower() for y in text_div for x in y.split()))
total = c_div
print(total)
I know that part of this involves inspecting the page's HTML, but I can't figure out what I need to select to get the scraper to narrow down to the job title links, such as this one:
<a id="usajobs-search-result-0" href="/GetJob/ViewDetails/568337700" itemprop="title" data-document-id="568337700">
I would appreciate any help.
CodePudding user response:
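The data is loaded dynamically by a POST request to:
https://www.usajobs.gov/Search/ExecuteSearch
so the job titles are most likely not in the HTML that requests.get() returns, which is why parsing the divs doesn't find them.
The following example gets the correct job titles. (You can change the "Page" key to specify a page number.)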
import requests
# Search filters sent as the JSON body of the POST request
data = {
    "JobTitle": [],
    "GradeBucket": [],
    "JobCategoryCode": ["0602"],
    "JobCategoryFamily": [],
    "LocationName": [],
    "PostingChannel": [],
    "Department": ["VA"],
    "Agency": [],
    "PositionOfferingTypeCode": [],
    "TravelPercentage": [],
    "PositionScheduleTypeCode": [],
    "SecurityClearanceRequired": [],
    "PositionSensitivity": [],
    "ShowAllFilters": [],
    "HiringPath": [],
    "SocTitle": [],
    "MCOTags": [],
    "CyberWorkRole": [],
    "CyberWorkGrouping": [],
    "Page": "1",  # <-- Change page number here
    "UniqueSearchID": "9d417c5e-adc2-469c-af1d-e786cc41bc97",
    "IsAuthenticated": "false",
}
response = requests.post(
    "https://www.usajobs.gov/Search/ExecuteSearch", json=data
).json()
# Each entry in "Jobs" is one search result; keep only its title
job_titles = [job["Title"] for job in response["Jobs"]]
print(job_titles)
Output:
['Psychiatrist - OCA', 'Physician - Electromyography (Temporary)', 'Physician Owensboro CBOC PC', 'Physician-Primary Care', 'OPHTHALMOLOGIST', 'UROLOGIST', 'PHYSICIAN (OTOLARYNGOLOGIST', 'Physician-Hospitalist', 'Physician - Hemotology/Oncology', 'Academic Gastroenterologist', 'Physician - Gastroenterologist', 'Physician - Orthopedic Surgeon', 'Physician (Internal Medicine or Family Practice)', 'Physician (Regular Ft)- Hematologist/Oncologist', 'Physician- Hematologist/Oncologist', 'Physician - Diagnostic Radiologist', 'Physician (Psychiatrist)', 'Physician (Endocrinologist)', 'Physician (Cardiologist)', 'Physician (Neurologist)', 'Physician (Chief Hospitalist)', 'Physician (Hospitalist)', 'Physician (Medical Director of Extended Care/Chief of Geriatrics)', 'Physician (Primary Care)', 'Physician (Hematologist/Oncologist)']
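Since the original goal was title frequency, here is a rough sketch of how the titles could be collected over several pages and then counted. The helper names (fetch_titles, collect_titles, count_titles, count_words) are made up for illustration and are not part of any library. fetch_titles reuses the data dict and the endpoint from the example above. Stopping at the first empty page is an assumption about how the endpoint behaves past the last page, which I have not verified. Words are split with a simple regex rather than the rstrip(punctuation) approach from the question, so "Physician-Hospitalist" counts as two words.

```python
import re
from collections import Counter

SEARCH_URL = "https://www.usajobs.gov/Search/ExecuteSearch"


def fetch_titles(data, page):
    """Request one page of results and return its job titles.

    `data` is the payload dict from the example above; only "Page" is changed.
    """
    import requests  # imported here so the counting helpers work without it

    payload = dict(data, Page=str(page))
    resp = requests.post(SEARCH_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return [job["Title"] for job in resp.json()["Jobs"]]


def collect_titles(fetch_page, max_pages):
    """Gather titles from pages 1..max_pages.

    `fetch_page` is any callable mapping a page number to a list of titles.
    Assumption: an empty list means there are no more results.
    """
    titles = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        titles.extend(batch)
    return titles


def count_titles(titles):
    """Count whole titles, ignoring case and extra whitespace."""
    return Counter(" ".join(t.lower().split()) for t in titles)


def count_words(titles):
    """Count individual alphabetic words across all titles."""
    return Counter(w for t in titles for w in re.findall(r"[a-z]+", t.lower()))


# A few titles copied from the output above, used as an offline sample
sample = [
    "Physician (Psychiatrist)",
    "Psychiatrist - OCA",
    "Physician-Hospitalist",
    "Physician (Hospitalist)",
    "UROLOGIST",
]
print(count_words(sample).most_common(3))
```

To run it against the live site, something like collect_titles(lambda p: fetch_titles(data, p), max_pages=5) should work, where data is the dict from the example above and the page limit of 5 is arbitrary.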