Beautiful Soup returns an empty list when scraping a YouTube channel

Time:03-24

I am trying to use this code to get some public information about a YouTube channel (the API does not suit this task well).

Example code:

import re
import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)

# Uncomment to view all the data
# print(json.dumps(data))

# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)

# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]

print("Channel Views:", stats["viewCountText"]["simpleText"])
print("Joined:", stats["joinedDateText"]["runs"][1]["text"])

Expected result (it worked well 6 months ago):

Joined: Jun 30, 2007

But now I get:

AttributeError: 'NoneType' object has no attribute 'group'

The traceback shows the error is on this line:

data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)

Can you help me fix this so that the code keeps working and returns the data?

Any help is appreciated. Thanks.

CodePudding user response:

Your code is working fine for me.

import re
import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)

# Uncomment to view all the data
# print(json.dumps(data))

# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)

# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]

print("Channel Views:", stats["viewCountText"]["simpleText"])
print("Joined:", stats["joinedDateText"]["runs"][1]["text"])

Output:

Channel Views: 1,12,94,125টি ভিউ
Joined: 30 জুন, 2007
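One fragile spot in the code is the long subscript chain, which assumes the About tab is always at index 5 of "tabs". Here is a minimal sketch of an alternative that searches the parsed dict for the "channelAboutFullMetadataRenderer" key wherever it sits. The helper name find_key and the trimmed sample dict are hypothetical, made up for illustration; the real ytInitialData structure is much larger and may differ.

```python
from typing import Any, Optional


def find_key(node: Any, key: str) -> Optional[Any]:
    """Depth-first search for the first value stored under `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical, heavily trimmed stand-in for the parsed ytInitialData dict.
json_data = {
    "contents": {"twoColumnBrowseResultsRenderer": {"tabs": [
        {"tabRenderer": {"title": "Home"}},
        {"tabRenderer": {"content": {"channelAboutFullMetadataRenderer": {
            "viewCountText": {"simpleText": "1,234 views"}}}}},
    ]}}
}

stats = find_key(json_data, "channelAboutFullMetadataRenderer")
if stats is None:
    print("About metadata not found; the page layout may have changed")
else:
    print("Channel Views:", stats["viewCountText"]["simpleText"])
```

This trades a KeyError or IndexError deep in the chain for a single None check, at the cost of walking the whole structure once.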

CodePudding user response:

You're not actually using BeautifulSoup at all here. You're just fetching the raw text and searching it for a string.

This is the problem with web scraping. YouTube has changed their JavaScript, and that variable no longer exists. We don't know what you are trying to find, but your current method isn't going to work. You may actually need to use Selenium to run the JavaScript and pull the info from the DOM.
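Whatever the cause, the AttributeError itself comes from calling .group(1) on the None that re.search returns when the pattern is not in the response. A minimal sketch of a guarded extraction, assuming the data (when present) still follows the "var ytInitialData = " marker as a JSON object. The function name extract_initial_data and both sample strings are hypothetical, and json.JSONDecoder.raw_decode is swapped in for the greedy regex so that parsing stops at the end of the object:

```python
import json
from typing import Optional

MARKER = "var ytInitialData = "


def extract_initial_data(html: str) -> Optional[dict]:
    """Return the object assigned to ytInitialData, or None if it is absent."""
    start = html.find(MARKER)
    if start == -1:
        return None  # marker missing: different page served, or layout changed
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (here the trailing ";</script>").
        obj, _end = json.JSONDecoder().raw_decode(html, start + len(MARKER))
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


# Hypothetical sample responses, standing in for requests.get(URL).text.
sample_ok = '<script>var ytInitialData = {"contents": {"a": 1}};</script>'
sample_missing = "<html><body>Some other page</body></html>"

for name, html in [("ok", sample_ok), ("missing", sample_missing)]:
    data = extract_initial_data(html)
    if data is None:
        print(name, "-> ytInitialData not found; inspect the raw response")
    else:
        print(name, "-> top-level keys:", list(data))
```

When the function returns None, printing the first few hundred characters of the response is usually the quickest way to see what page was actually served.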
