I'm working on a misinformation project and I want to scrape a couple of quarantined subreddits (r/russsia specifically).
When I follow the guidelines posted in the PRAW docs I get a prawcore.exceptions.Forbidden: received 403 HTTP response
error.
I saw a couple of solutions from 3 years ago about manually adding the subreddit in the browser and using quaran.opt_in(),
but no luck. Below is a code snippet:
reddit = praw.Reddit(user_agent='Comment Extraction (by /u/guy_asking_on_stackoverflow)',
                     client_id=sec.reddit_client_id, client_secret=sec.reddit_client_secret)
subred = reddit.subreddit(subreddit)
subred.quaran.opt_in()  # error happens here
# for post in subred.top(limit=10):  # error happened here before the opt_in() attempt; kept for post history
#     pass
subred is of type praw.models.reddit.subreddit.Subreddit, but it will not return submissions.
Any ideas for a solution?
Full error:
Traceback (most recent call last):
File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3361, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-9de81e112c74>", line 1, in <cell line: 1>
for post in subred.top(limit=10):
File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/models/listing/generator.py", line 63, in __next__
self._next_batch()
File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/models/listing/generator.py", line 73, in _next_batch
self._listing = self._reddit.get(self.url, params=self.params)
File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 595, in get
return self._objectify_request(method="GET", params=params, path=path)
File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 696, in _objectify_request
self.request(
File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/praw/reddit.py", line 885, in request
return self._core.request(
File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/prawcore/sessions.py", line 330, in request
return self._request_with_retries(
File "/Users/travisbarton/opt/anaconda3/envs/work3.8/lib/python3.8/site-packages/prawcore/sessions.py", line 266, in _request_with_retries
raise self.STATUS_EXCEPTIONS[response.status_code](response)
prawcore.exceptions.Forbidden: received 403 HTTP response
Answer:
To scrape quarantined subreddits, your client cannot be read-only.
You can make the client fully authorized by also providing the account's username and password (PRAW's password flow).
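You can confirm this is the issue first: PRAW exposes a read_only property on the client, and a client created with only client_id and client_secret reports itself as read-only (a quick diagnostic, not part of the fix):

print(reddit.read_only)  # True for the app-only client from the question

The fixed constructor then looks like this: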
reddit = praw.Reddit(user_agent='Comment Extraction (by /u/guy_asking_on_stackoverflow)',
                     client_id=sec.reddit_client_id, client_secret=sec.reddit_client_secret,
                     password=sec.reddit_password, username=sec.reddit_username)
https://praw.readthedocs.io/en/stable/getting_started/authentication.html#password-flow
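For completeness, here is a minimal end-to-end sketch, assuming the same sec secrets module from the question and that the subreddit name is just the string passed to reddit.subreddit(); once the client is authorized with the password flow, both the quarantine opt-in and the listing call should go through:

import praw
import sec  # your own secrets module, as in the question

# Script-type app authorized via the password flow, so the client is not read-only.
reddit = praw.Reddit(
    user_agent='Comment Extraction (by /u/guy_asking_on_stackoverflow)',
    client_id=sec.reddit_client_id,
    client_secret=sec.reddit_client_secret,
    username=sec.reddit_username,
    password=sec.reddit_password,
)

subred = reddit.subreddit('russia')  # the quarantined subreddit from the question

# Opt in to the quarantine for this account, then fetch submissions.
subred.quaran.opt_in()
for post in subred.top(limit=10):
    print(post.title, post.score)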