How to scrape a video URL from a webpage using Python?

I want to download videos from a website.

Here is my code. Every time I run it, it returns a blank result. Here is the live code: https://colab.research.google.com/drive/19NDLYHI2n9rG6KeBCiv9vKXdwb5JL9Nb?usp=sharing

from bs4 import BeautifulSoup
import requests

url = requests.get("https://www.mxtakatak.com/xt0.3a7ed6f84ded3c0f678638602b48bb1b840bea7edb3700d62cebcf7a400d4279/video/20000kCCF0")

page = url.content

soup = BeautifulSoup(page, "html.parser")

#print(soup.prettify())

result = soup.find_all('video', class_="video-player")

print(result)

CodePudding user response：

You always get a blank result because soup.find_all() doesn't find anything: the HTML you receive contains no video tag with the class video-player. Inspect url.content by hand first, then decide what to look for with find_all().
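
One way to do that inspection is to count which tags the downloaded HTML actually contains. The following is only a minimal sketch using the standard library; the names TagCounter and count_tags are hypothetical, and sample_html is an invented stand-in for the real response text, not what the site returns.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each start tag occurs in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical stand-in for response.text; the real page will look different.
sample_html = """<html><head><script>window._state = {"entities": {}}</script></head>
<body><div id="root"></div></body></html>"""

counts = count_tags(sample_html)
print("video tags:", counts["video"])
print("script tags:", counts["script"])
```

If the count for video is zero while script tags are present, the data you want may be embedded in one of the scripts rather than in the markup.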

EDIT: After digging around a bit, I found out how to get content_url_orig:

from bs4 import BeautifulSoup
import requests
import json

url = requests.get("https://www.mxtakatak.com/xt0.3a7ed6f84ded3c0f678638602b48bb1b840bea7edb3700d62cebcf7a400d4279/video/20000kCCF0")

page = url.content

soup = BeautifulSoup(page, "html.parser")



# Take the second script tag in the HTML file; it holds the page state
result = str(soup.find_all('script')[1])
# Separate the JSON from the rest of the script string
# (I dug around in the file to find out how to do it)
result = result.split('window._state = ')[1].split("</script>")[0].split('\n')[0]

result = json.loads(result)


# Navigate the JSON to get the video URL
entity = list(result['entities'].items())[0][1]
download_url = entity['content_url_orig']

print(download_url)
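
The string splitting above depends on the JSON sitting on a single line. A slightly more tolerant variant, sketched here under assumptions, lets the JSON decoder itself find where the object ends. The marker window._state and the keys entities and content_url_orig come from the code above; the helper names extract_state and first_content_url and the sample HTML are hypothetical.

```python
import json


def extract_state(html, marker="window._state ="):
    """Parse the JSON object that follows `marker` in `html`."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("marker not found in HTML: " + marker)
    idx = start + len(marker)
    # raw_decode does not skip leading whitespace, so do it by hand.
    while idx < len(html) and html[idx].isspace():
        idx += 1
    # raw_decode stops at the end of the first complete JSON value,
    # so trailing text such as ";</script>" is ignored.
    state, _end = json.JSONDecoder().raw_decode(html, idx)
    return state


def first_content_url(state):
    """Return 'content_url_orig' of the first entity in the state."""
    entity = next(iter(state["entities"].values()))
    return entity["content_url_orig"]


# Hypothetical stand-in for the page source; the real state is much larger.
sample_html = (
    '<script>window._state = {"entities": {"20000kCCF0": '
    '{"content_url_orig": "https://example.com/video.mp4"}}};</script>'
)

download_url = first_content_url(extract_state(sample_html))
print(download_url)
```

With a real page you would pass url.text instead of sample_html; whether the marker and keys still match depends on the site.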

CodePudding user response：

Using regex:

import requests
import re

response = requests.get("....../video/20000kCCF0")
videoId = '20000kCCF0'
videos = re.findall(r'https://[^"]+' + videoId + '[^"]+mp4', response.text)
print(videos)
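
To try the pattern without hitting the site, it can be run against a sample string. This is just a sketch: find_video_urls is a hypothetical helper, sample_text is invented, and re.escape plus de-duplication are small additions that the code above does not have.

```python
import re


def find_video_urls(text, video_id):
    """Return unique .mp4 URLs in `text` that contain `video_id`, in order."""
    # re.escape guards against IDs that contain regex metacharacters.
    pattern = r'https://[^"]+' + re.escape(video_id) + r'[^"]+mp4'
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(re.findall(pattern, text)))


# Hypothetical stand-in for response.text; the real page will look different.
sample_text = (
    '{"content_url_orig":"https://cdn.example.com/videos/20000kCCF0/h264_high_720.mp4",'
    '"thumbnail":"https://cdn.example.com/images/20000kCCF0.jpg",'
    '"share_url":"https://cdn.example.com/videos/20000kCCF0/h264_high_720.mp4"}'
)

videos = find_video_urls(sample_text, "20000kCCF0")
print(videos)
```

Because [^"] never crosses a double quote, each match stays inside one quoted URL, so the thumbnail link is skipped.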