I'm trying to scrape the titles and posting dates of jobs from this webpage. The page content appears to be dynamic and is loaded from an API endpoint. I can parse the titles from the JSON response, but I can't work out how to get the posting dates.
I've tried:
import requests
from pprint import pprint

link = 'https://sapi.craigslist.org/web/v7/postings/search/full?batch=4-0-360-0-0&cc=US&lang=en&searchPath=acc'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    for item in res.json()['data']['items']:
        print(item)
Current output:
[12030636, 2611302, 23, -1, '1:1~42.3492~-71.0768', 'Executive Assistant']
[12017824, 2609705, 23, -1, '1:2~42.3943~-71.218', 'Staff Accountant - Accounts Receivable (TEMP)']
[11638522, 2526012, 23, -1, '2:3~42.2093~-70.9963', 'Bookkeeper']
[11626278, 2524450, 23, -1, '1:1~42.3492~-71.0768', 'Top Consulting Company seeking Accounting Associate']
[11353351, 2456092, 23, -1, '1:1~42.3492~-71.0768', 'ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely']
[11348351, 2455214, 23, -1, '1:4~42.3647~-71.1042', 'Bookeeper needed part-time']
Expected output:
Oct 7 Executive Assistant
Oct 7 Staff Accountant - Accounts Receivable (TEMP)
Oct 6 Bookkeeper
Oct 6 Top Consulting Company seeking Accounting Associate
Oct 5 ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely
Oct 5 Bookeeper needed part-time
How can I achieve the desired output?
CodePudding user response:
Try:
import requests
from datetime import datetime

link = "https://sapi.craigslist.org/web/v7/postings/search/full?batch=4-0-360-0-0&cc=US&lang=en&searchPath=acc"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

data = requests.get(link, headers=headers).json()

# Base Unix timestamp; each item appears to store its posting time as an offset from it.
min_posted_date = data["data"]["decode"]["minPostedDate"]

for i in data["data"]["items"]:
    # i[1] is the offset (in seconds) from min_posted_date, i[-1] is the title.
    t = datetime.fromtimestamp(min_posted_date + i[1])
    print(t, i[-1])
Prints:
2022-10-06 19:58:32 Tax Assistant
2022-10-06 18:37:05 Executive Assistant
2022-10-06 18:10:28 Staff Accountant - Accounts Receivable (TEMP)
2022-10-05 18:55:35 Bookkeeper
2022-10-05 18:29:33 Top Consulting Company seeking Accounting Associate
2022-10-04 23:30:15 ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely
2022-10-04 23:15:37 Bookeeper needed part-time
2022-10-04 21:08:42 According 65 hrs
2022-10-04 17:50:35 Accounts Payable Specialist
2022-10-04 14:50:34 Compliance Assistant- Symphony
2022-10-04 11:57:52 Bookkeeping Assistant
...
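To get the `Oct 7` style from the question instead of full timestamps, the `datetime` can be formatted before printing. This is a minimal sketch, assuming the same layout as above (offset in seconds at index 1, title at the last index). `format_postings` is a hypothetical helper name, and the sample values are made up for illustration, not taken from the real response. It uses UTC so the result is reproducible; drop the `tz` argument to get local time, as in the code above.

```python
from datetime import datetime, timezone


def format_postings(min_posted_date, items):
    """Return 'Mon D Title' strings for items shaped like the API rows.

    Assumes item[1] is an offset in seconds from min_posted_date and
    item[-1] is the title (an assumption based on the code above).
    """
    lines = []
    for item in items:
        posted = datetime.fromtimestamp(min_posted_date + item[1], tz=timezone.utc)
        # %b is the abbreviated month name; posted.day avoids a zero-padded day.
        lines.append(f"{posted.strftime('%b')} {posted.day} {item[-1]}")
    return lines


if __name__ == "__main__":
    # Made-up sample: 1665100800 is 2022-10-07 00:00:00 UTC.
    sample_items = [
        [1, 3600, 23, -1, "", "Executive Assistant"],
        [2, -3600, 23, -1, "", "Bookkeeper"],
    ]
    for line in format_postings(1665100800, sample_items):
        print(line)
```

With the real response, you would call it as `format_postings(min_posted_date, data["data"]["items"])`.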