Home > Back-end >  Fail to grab the posting dates from encrypted content or so
Fail to grab the posting dates from encrypted content or so

Time:10-07

I'm trying to scrape the titles and posting dates of different jobs from this webpage. The content of that page seems to be dynamic and loaded using an endpoint. I can parse titles from json response but fail to grab the posting dates.

I've tried with:

import requests
from pprint import pprint

link = 'https://sapi.craigslist.org/web/v7/postings/search/full?batch=4-0-360-0-0&cc=US&lang=en&searchPath=acc'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
    for item in res.json()['data']['items']:
        print(item)

Current output:

[12030636, 2611302, 23, -1, '1:1~42.3492~-71.0768', 'Executive Assistant']
[12017824, 2609705, 23, -1, '1:2~42.3943~-71.218', 'Staff Accountant - Accounts Receivable (TEMP)']
[11638522, 2526012, 23, -1, '2:3~42.2093~-70.9963', 'Bookkeeper']
[11626278, 2524450, 23, -1, '1:1~42.3492~-71.0768', 'Top Consulting Company seeking Accounting Associate']
[11353351, 2456092, 23, -1, '1:1~42.3492~-71.0768', 'ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely']
[11348351, 2455214, 23, -1, '1:4~42.3647~-71.1042', 'Bookeeper needed part-time']

Expected output:

Oct 7 Executive Assistant
Oct 7 Staff Accountant - Accounts Receivable (TEMP)
Oct 6 Bookkeeper
Oct 6 Top Consulting Company seeking Accounting Associate
Oct 5 ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely
Oct 5 Bookeeper needed part-time

How can I achieve the desired output?

CodePudding user response:

Try:

import requests
from datetime import datetime

link = "https://sapi.craigslist.org/web/v7/postings/search/full?batch=4-0-360-0-0&cc=US&lang=en&searchPath=acc"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

data = requests.get(link, headers=headers).json()
min_posted_date = data["data"]["decode"]["minPostedDate"]

for i in data["data"]["items"]:
    t = datetime.fromtimestamp(min_posted_date   i[1])
    print(t, i[-1])

Prints:

2022-10-06 19:58:32 Tax Assistant
2022-10-06 18:37:05 Executive Assistant
2022-10-06 18:10:28 Staff Accountant - Accounts Receivable (TEMP)
2022-10-05 18:55:35 Bookkeeper
2022-10-05 18:29:33 Top Consulting Company seeking Accounting Associate
2022-10-04 23:30:15 ID Bookkeeper-Interior Design Bookkeeper/Accountant-Work Remotely
2022-10-04 23:15:37 Bookeeper needed part-time
2022-10-04 21:08:42 According 65 hrs
2022-10-04 17:50:35 Accounts Payable Specialist
2022-10-04 14:50:34 Compliance Assistant- Symphony
2022-10-04 11:57:52 Bookkeeping Assistant

...
  • Related