Is there a way to scrape or crawl an m3u8 playlist?

Time:12-12

I'm a student trying to build a movie site project for practice purposes.

But I realized that the movie database would be enormous, so I thought, "What if I just borrow the movies from another site?" I looked it up and found the term "scraping" (or "crawling") data. This is strictly for non-commercial purposes; I just want to make my project work. If it is illegal or not the right thing to do, please comment below and I'll take that into account.

If it is all right to do so, is there any way I can get the m3u8 playlist? To be honest, I don't know how to do it at all, so I really need a guide, some instructions, or a pointer in the right direction.

Based on my research, I'm planning to use cheerio and Axios and try to figure it out from there.

To be more specific: whenever I play a video and inspect the page, the Network tab shows a file called playlist.m3u8, which lists many small .ts files. I want to fetch it somehow and make it watchable in my project.

CodePudding user response:

A playlist.m3u8 file is just that: a playlist. It contains data about the video and links to the video files (chunks), or links to chunk lists that in turn contain those links.
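To make the two cases concrete, here is a rough sketch that tells a playlist of chunk lists apart from a playlist of chunks and resolves the links it contains. The function name describePlaylist and the sample URLs are made up for illustration; it only looks for the #EXT-X-STREAM-INF tag and treats every non-comment line as a link, so take it as a starting point rather than a complete reader.

```javascript
// Sketch: classify an HLS playlist and list the URLs it links to.
// describePlaylist and the sample URLs are hypothetical, for illustration only.
function describePlaylist(text, playlistUrl) {
    const lines = text.split(/\r?\n/).map((l) => l.trim()).filter((l) => l !== '');
    // A master playlist announces its variant streams with #EXT-X-STREAM-INF.
    const isMaster = lines.some((l) => l.startsWith('#EXT-X-STREAM-INF'));
    // Lines that do not start with '#' are URIs, possibly relative to the playlist.
    const links = lines
        .filter((l) => !l.startsWith('#'))
        .map((l) => new URL(l, playlistUrl).href);
    return { kind: isMaster ? 'master' : 'media', links };
}

const master = [
    '#EXTM3U',
    '#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360',
    'low/chunklist.m3u8',
    '#EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720',
    'high/chunklist.m3u8',
].join('\n');

console.log(describePlaylist(master, 'http://sourcehost.com/video/playlist.m3u8'));
```

Resolving each link against the playlist's own URL, instead of gluing it onto the domain, keeps relative paths working when the playlist sits in a subdirectory.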

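Depending on your project, there may already be a built-in way to play m3u8 files. In a browser-based project, you can set the URL of a video element to the URL of the .m3u8 file where the browser supports HLS natively (Safari, for example); other browsers generally need a JavaScript player library such as hls.js.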
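For the browser case, something along these lines might work. attachPlaylist is a hypothetical helper, and HlsLib stands for the hls.js constructor, which I pass in as a parameter so the sketch makes no assumption about how the library is loaded on your page.

```javascript
// Sketch: point a <video> element at an HLS playlist.
// attachPlaylist is a hypothetical helper; HlsLib is assumed to be the hls.js constructor.
function attachPlaylist(video, url, HlsLib) {
    // Some browsers (Safari, for example) can play HLS directly.
    if (video.canPlayType('application/vnd.apple.mpegurl')) {
        video.src = url;
        return 'native';
    }
    // Otherwise fall back to hls.js, which plays the stream through Media Source Extensions.
    if (HlsLib && HlsLib.isSupported()) {
        const hls = new HlsLib();
        hls.loadSource(url);
        hls.attachMedia(video);
        return 'hls.js';
    }
    return 'unsupported';
}

// In a page, assuming hls.js is loaded as the global Hls:
// attachPlaylist(document.querySelector('video'), 'playlist.m3u8', window.Hls);
```

The return value is only there so you can log which path was taken while experimenting.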

If your platform has no built-in way to use m3u8 files, you will have to either download a parser or write one yourself.
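If you go the write-it-yourself route, the core of a parser can be quite small. The sketch below (parseMediaPlaylist is a hypothetical name, and the sample playlist is invented) only understands #EXTINF lines and the chunk URI that follows each one; it skips every other tag, so it is nowhere near a full implementation.

```javascript
// Sketch: a minimal media-playlist parser. parseMediaPlaylist is a hypothetical name.
// It handles only #EXTINF and segment URIs and ignores every other tag.
function parseMediaPlaylist(text) {
    const segments = [];
    let pendingDuration = null;
    for (const raw of text.split(/\r?\n/)) {
        const line = raw.trim();
        if (line.startsWith('#EXTINF:')) {
            // "#EXTINF:<duration>,[<title>]" describes the segment URI that follows.
            pendingDuration = parseFloat(line.slice('#EXTINF:'.length));
        } else if (line !== '' && !line.startsWith('#')) {
            segments.push({ uri: line, duration: pendingDuration });
            pendingDuration = null;
        }
    }
    return segments;
}

const sample = [
    '#EXTM3U',
    '#EXT-X-TARGETDURATION:10',
    '#EXTINF:9.5,',
    'chunk_0.ts',
    '#EXTINF:10,',
    'chunk_1.ts',
    '#EXT-X-ENDLIST',
].join('\n');

const segments = parseMediaPlaylist(sample);
const total = segments.reduce((sum, s) => sum + s.duration, 0);
console.log(segments.length, total); // prints: 2 19.5
```

Keeping the duration next to each URI is what lets a player schedule chunks properly instead of just concatenating them.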

Here is a simple parser in JS that reads every .ts file in order and appends it to the end of the video file. Since it ignores timing data and all other extra information, the result will be pretty glitchy; it is meant only as a simple example.

// This is a simple, unreliable example for illustration only; do not rely on it.

const fs = require('fs');
const http = require('http');

const source_domain = "http://sourcehost.com/";

const playlist_path = "playlist.m3u8";

// Concatenated .ts chunks form an MPEG-TS stream, not an MP4 container, hence the .ts extension.
var video_file = fs.createWriteStream("test.ts");

GetPlaylist(source_domain + playlist_path).catch(console.error);

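// Downloads the playlist, then fetches every .ts chunk it lists, in order.
function GetPlaylist(url){
    return new Promise((resolve, reject) => {
        const req = http.get(url, (res) => {
            var data = '';

            res.setEncoding('utf8');
            res.on('data', (chunk) => {
                data += chunk;
            });

            res.on('end', async () => {
                try {
                    var lines = data.split("\n");
                    for(var line of lines){
                        line = line.trim(); // drop a trailing "\r" and stray spaces
                        // Skip tag/comment lines; only chunk paths are fetched.
                        if(line.indexOf(".ts") !== -1 && !line.startsWith("#")){
                            await GetChunk(source_domain + line);
                        }
                    }
                    video_file.end();
                    resolve();
                } catch (err) {
                    reject(err);
                }
            });
        });

        req.on('error', reject);
        // Timeouts are reported on the request, not the response; 10 s is an arbitrary choice.
        req.setTimeout(10000, () => req.destroy(new Error('timeout')));
    });
}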


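// Downloads one chunk and appends its bytes to the output file.
function GetChunk(chunk_path){
    return new Promise((resolve, reject) => {
        const req = http.get(chunk_path, (res) => {
            // Leave the response as raw Buffers; no text encoding is needed for binary data.
            res.on('data', (chunk) => {
                video_file.write(chunk);
            });

            res.on('end', () => {
                resolve();
            });

            res.on('error', reject);
        });

        req.on('error', reject);
        // Timeouts are reported on the request, not the response; 10 s is an arbitrary choice.
        req.setTimeout(10000, () => req.destroy(new Error('timeout')));
    });
}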