Beautifulsoup only scraping 8 results when there are more

Time:07-16

I'm teaching myself BeautifulSoup and trying to scrape some Reddit post titles. However, the resulting list contains only 8 titles, even though the page shows many more (I checked by saving it). What am I doing wrong, and how can I get it to scrape the whole page?

This is my code:

from bs4 import BeautifulSoup as bs
import requests
page = requests.get("https://www.reddit.com/r/RedditWritesSeinfeld/search/?q=flair:prompt&restrict_sr=1&sr_nsfw=&t=all&sort=top")
soup = bs(page.content, 'html.parser')
soupbody = soup.select("div h3") #Selects one element lists of all reddit titles
def listreddittitles(l): #returns a list of all reddit post titles as strings 
    temp = []
    for i in l:
        temp.append(i.contents[0])
    return temp
reddittitles = listreddittitles(soupbody)
print(len(reddittitles))
input()

CodePudding user response:

The data is most likely loaded dynamically via JavaScript, so the HTML that requests receives contains only the first few posts. To get the rest, you need to simulate the page's Ajax calls with the requests module.

Or

You can append .json to the URL and receive the data from the server in JSON format:

import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0"
}

params = {
    "q": "flair:prompt",
    "restrict_sr": "1",
    "sr_nsfw": "",
    "t": "all",
    "sort": "top",
}

data = requests.get(
    "https://www.reddit.com/r/RedditWritesSeinfeld/search/.json",
    params=params,
    headers=headers,
).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for i, d in enumerate(data["data"]["children"], 1):
    print("{:<3} {}".format(i, d["data"]["title"][:50]))

Prints:

1   Jerry’s Australian girlfriend buys him a birthday 
2   George impresses his girlfriend with his generosit
3   George accidentally eats a pot brownie at a party.
4   George accidentally writes "Congrats! Way to Go!" 
5   What would George’s social media profiles look lik
6   In a very special episode, George, at his father F
7   Elaine starts dating a guy named George, so the re
8   There is a big protest in the city. George is inte
9   Jerry, horrified to find a mouse in his apartment,
10  George tries to find out what Doctor his Doctor se
11  George throws a tantrum on a date when they go to 
12  Jerry dates a marketing exec he met through Elaine
13  A true crime podcast accuses a notorious killer of
14  Kramer notices that a lot of runners are using the
15  “The George” - The name “George” becomes a viral t
16  Jerry dates a beautiful woman who has a quirk that
17  "She's a floor-sleeper!"
18  Jerry’s new girlfriend starts saying “y’all” even 
19  George developes an app that rates public restroom
20  Elaine's boyfriend's parrot says "Julia" over and 
21  The Gauntlet - George tries to wipe out half of al
22  George’s gf’s weighted blanket is too heavy for in
23  [Prompt] George saves a pregnant woman who is so g
24  Jerry's Italian girlfriend calls George "Giorgio;"
25  Frank might be deported to Italy due to an old pap
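
The output above stops at 25 entries, which looks like one page of the listing rather than every result. A minimal sketch of paging through further results follows, assuming the search endpoint honours Reddit's usual listing parameters limit and after, and that each response carries a data["after"] cursor. The function names fetch_search_page and collect_titles are made up for this illustration and are not part of the answer's code.

```python
import time


def fetch_search_page(after=None, limit=100):
    """Fetch one page of search results as parsed JSON (does a network request)."""
    import requests  # imported lazily so collect_titles() can be tried offline

    params = {
        "q": "flair:prompt",
        "restrict_sr": "1",
        "t": "all",
        "sort": "top",
        "limit": str(limit),  # assumption: listings accept up to 100 items per page
    }
    if after:
        params["after"] = after  # cursor taken from the previous page
    response = requests.get(
        "https://www.reddit.com/r/RedditWritesSeinfeld/search/.json",
        params=params,
        headers={
            "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:102.0) "
            "Gecko/20100101 Firefox/102.0"
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


def collect_titles(fetch_page, max_pages=5, delay=0.0):
    """Follow the 'after' cursor page by page and gather every post title.

    fetch_page is any callable that takes the cursor (or None for the first
    page) and returns one parsed listing.
    """
    titles = []
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after)["data"]
        titles.extend(child["data"]["title"] for child in listing["children"])
        after = listing.get("after")
        if not after:  # no cursor means there are no further pages
            break
        time.sleep(delay)  # small pause between requests
    return titles
```

Calling collect_titles(fetch_search_page, max_pages=5, delay=1.0) would then request up to five pages and return the titles as one list. Passing the fetcher in as an argument keeps the paging loop separate from the network call, so the loop can be checked against canned responses.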

Or:

Use the URL in the form https://old.reddit.com/r/RedditWritesSeinfeld/search/ (note the old. at the beginning) and parse it with the BeautifulSoup library.
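
A rough sketch of the old.reddit.com approach might look like the following. The CSS selector a.search-title is an assumption about how the old search page marks up result titles, so check it against the actual HTML before relying on it. The helper names fetch_old_reddit_html and extract_titles are invented for this example.

```python
from bs4 import BeautifulSoup


def fetch_old_reddit_html():
    """Download the old.reddit.com search page (does a network request)."""
    import requests  # imported lazily so extract_titles() can be tried offline

    response = requests.get(
        "https://old.reddit.com/r/RedditWritesSeinfeld/search/",
        params={"q": "flair:prompt", "restrict_sr": "1", "t": "all", "sort": "top"},
        headers={
            "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:102.0) "
            "Gecko/20100101 Firefox/102.0"
        },
        timeout=10,
    )
    response.raise_for_status()
    return response.text


def extract_titles(html, selector="a.search-title"):
    """Return the text of every element matching selector.

    The default selector is an assumption about old Reddit's markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [link.get_text(strip=True) for link in soup.select(selector)]
```

With these, extract_titles(fetch_old_reddit_html()) would return the titles found in the server-rendered page. If it returns an empty list, the selector most likely does not match the current markup and needs adjusting.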
