Scraping quarantined subreddits


I'm working on a misinformation project and I want to scrape a couple of quarantined subreddits (r/russsia specifically). When I follow the guidelines posted in the PRAW docs, I get a prawcore.exceptions.Forbidden: received 403 HTTP response error.

I saw a couple of solutions from about three years ago that suggest manually adding the subreddit in the browser and then using quaran.opt_in(), but no luck. Below is a code snippet:

import praw

# `sec` is a local module/object holding the Reddit API credentials.
# `subreddit` is the name of the quarantined subreddit to scrape.
reddit = praw.Reddit(user_agent='Comment Extraction (by /u/guy_asking_on_stackoverflow)',
                     client_id=sec.reddit_client_id, client_secret=sec.reddit_client_secret)

subred = reddit.subreddit(subreddit)
subred.quaran.opt_in()  # error happens here

# for post in subred.top(limit=10): ERROR HAPPENS BEFORE, KEPT FOR POST HISTORY
#     pass  # error happens here

subred is of type praw.models.reddit.subreddit.Subreddit, but it will not return submissions.

Any ideas for a solution?

full error:

Traceback (most recent call last):
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-9de81e112c74>", line 1, in <cell line: 1>
    for post in subred.top(limit=10):
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/models/listing/generator.py", line 63, in __next__
    self._next_batch()
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/models/listing/generator.py", line 73, in _next_batch
    self._listing = self._reddit.get(self.url, params=self.params)
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 595, in get
    return self._objectify_request(method="GET", params=params, path=path)
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 696, in _objectify_request
    self.request(
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 885, in request
    return self._core.request(
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/prawcore/sessions.py", line 330, in request
    return self._request_with_retries(
  File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/prawcore/sessions.py", line 266, in _request_with_retries
    raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.Forbidden: received 403 HTTP response

CodePudding user response:

To scrape quarantined subreddits, your client cannot be read-only.

You can make your client fully authorized by also providing the account's username and password:

reddit = praw.Reddit(user_agent='Comment Extraction (by /u/guy_asking_on_stackoverflow)',
                     client_id=sec.reddit_client_id, client_secret=sec.reddit_client_secret,
                     password=sec.reddit_password, username=sec.reddit_username)

https://praw.readthedocs.io/en/stable/getting_started/authentication.html#password-flow
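
Putting it together, here is a minimal sketch of the full flow, assuming the same `sec` credentials object and `subreddit` name from the question; once the client is fully authenticated, quaran.opt_in() should succeed and the listing call should return submissions instead of a 403:

import praw

# Fully authorized (password-flow) client: app credentials plus account username/password.
reddit = praw.Reddit(user_agent='Comment Extraction (by /u/guy_asking_on_stackoverflow)',
                     client_id=sec.reddit_client_id, client_secret=sec.reddit_client_secret,
                     password=sec.reddit_password, username=sec.reddit_username)

subred = reddit.subreddit(subreddit)

# Opt the authenticated account into the quarantined subreddit before requesting listings.
subred.quaran.opt_in()

# Listings should now come back instead of raising prawcore.exceptions.Forbidden.
for post in subred.top(limit=10):
    print(post.title, post.score)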
