Python - Web directory file names return blank (Requests & BeautifulSoup)

Time:04-04

Using the Requests and BeautifulSoup libraries, I am trying to retrieve all the mp3 file names from a specific web directory, but it returns nothing.

This is the function I am running which contains the problem:

archive_url = "http://example.com/audio/"

def get_video_links():

    r = requests.get(archive_url) 
    soup = BeautifulSoup(r.content,'html5lib')  
    links = soup.findAll('a') 
    video_links = [archive_url + link['href'] for link in links if link['href'].endswith('mp3')]
    
    return video_links 

It is not working as it should. To find the problem I ran print(video_links), and the command line simply outputs [], an empty list. My only ideas are that I installed the libraries incorrectly, shouldn't be using Visual Studio 2019, or need to configure the project differently. Can anyone spot my stupidity? I humbly welcome any input.
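One way to rule out an installation problem is to repeat the same filtering with only the Python standard library, with no Requests, BeautifulSoup or html5lib involved. The sketch below is illustrative: the class `HrefCollector`, the function `mp3_links` and the sample HTML are hypothetical, and it uses `urljoin` plus a case-insensitive `.mp3` check instead of plain string concatenation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag (stdlib only)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value)
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def mp3_links(base_url, html):
    """Return absolute URLs of links whose path ends in .mp3 (any letter case)."""
    parser = HrefCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Drop any query string or fragment before checking the extension
        path = href.split("?", 1)[0].split("#", 1)[0]
        if path.lower().endswith(".mp3"):
            links.append(urljoin(base_url, href))
    return links


# Hypothetical directory listing used only for illustration
sample = """
<html><body>
  <a href="../">Parent Directory</a>
  <a href="track01.mp3">track01.mp3</a>
  <a href="Track02.MP3">Track02.MP3</a>
  <a href="notes.txt">notes.txt</a>
</body></html>
"""
print(mp3_links("http://example.com/audio/", sample))
# ['http://example.com/audio/track01.mp3', 'http://example.com/audio/Track02.MP3']
```

To try it on the real directory, pass in the page text, for example `urllib.request.urlopen(archive_url).read().decode("utf-8", "replace")`. If this also returns an empty list, the cause is probably the page itself (for instance, its raw HTML contains no `.mp3` links, or the server returned an error page) rather than how the libraries were installed.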

(I am attempting to follow this tutorial)

Answer:

First, I would suggest importing the modules required to run the provided function:

import requests 
import bs4
from bs4 import BeautifulSoup 
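Note that `BeautifulSoup(..., 'html5lib')` only works if the separate html5lib package is installed (for example with `pip install html5lib`). If it is missing, BeautifulSoup raises `FeatureNotFound` instead of returning an empty list, so a missing parser would show up as an error, not as `[]`. A small check, as a sketch (the helper name `parser_available` is hypothetical):

```python
from bs4 import BeautifulSoup, FeatureNotFound


def parser_available(name):
    """Return True if BeautifulSoup can build a tree with the named parser."""
    try:
        BeautifulSoup("<a href='test.mp3'>test</a>", name)
    except FeatureNotFound:
        return False
    return True


# "html.parser" ships with Python; "html5lib" must be installed separately
for parser in ("html.parser", "html5lib"):
    print(parser, "available:", parser_available(parser))
```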

Then, redefine get_video_links() so that it takes archive_url as a parameter and returns the list of links:

def get_video_links(archive_url):

    r = requests.get(archive_url)
    soup = BeautifulSoup(r.content, 'html5lib')
    links = soup.findAll('a')

    # ensure a list is returned
    video_links = [archive_url + link['href'] for link in links if link['href'].endswith('mp3')]

    # a list comprehension always produces a list, so this check is only a safeguard
    if not isinstance(video_links, list):
        raise TypeError("video_links is not a list, convert to list")

    return video_links

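Then, define the input URL, call the function, and store its return value in video_links. A variable assigned inside a function is local to that function, so the calling code has to keep the returned list itself instead of printing a video_links variable it never assigned: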

archive_url = "http://example.com/audio/"
video_links=get_video_links(archive_url=archive_url)
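The scoping point can be seen in isolation. In this minimal sketch, the names `get_names`, `names` and `result` are hypothetical: the list built inside the function is only available to the caller through the return value.

```python
def get_names():
    # names exists only inside get_names()
    names = ["a.mp3", "b.mp3"]
    return names


result = get_names()  # keep the returned list in the caller's scope
print(result)         # ['a.mp3', 'b.mp3']

try:
    print(names)      # the function's local variable is not visible here
except NameError as err:
    print("NameError:", err)
```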

In a nutshell:

# importing required modules

import requests 
import bs4
from bs4 import BeautifulSoup 

# define function to get the video links:
# input: the URL of the web directory (a string); output: a list of mp3 URLs
def get_video_links(archive_url):

    r = requests.get(archive_url)
    soup = BeautifulSoup(r.content, 'html5lib')
    links = soup.findAll('a')

    # ensure a list is returned
    video_links = [archive_url + link['href'] for link in links if link['href'].endswith('mp3')]

    # a list comprehension always produces a list, so this check is only a safeguard
    if not isinstance(video_links, list):
        raise TypeError("video_links is not a list, convert to list")

    return video_links

# set the input and call the function

archive_url = "http://example.com/audio/"
video_links=get_video_links(archive_url=archive_url)
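If the list is still empty, a somewhat more defensive variant may help narrow things down. The sketch below is one possible approach, not the tutorial's code: it raises on HTTP errors instead of silently parsing an error page, skips `<a>` tags without an `href` (which would make `link['href']` raise `KeyError`), matches `.mp3` case-insensitively, and builds absolute URLs with `urljoin`. The helper `extract_mp3_links` is hypothetical, and it uses the built-in `"html.parser"` so html5lib is not required.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_mp3_links(base_url, html):
    """Hypothetical helper: absolute URLs of <a href> values whose path ends in .mp3."""
    # "html.parser" ships with Python, so html5lib does not need to be installed
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips <a> tags that have no href attribute
    for link in soup.find_all("a", href=True):
        href = link["href"]
        # Ignore any query string or fragment when checking the extension
        path = href.split("?", 1)[0].split("#", 1)[0]
        if path.lower().endswith(".mp3"):
            links.append(urljoin(base_url, href))
    return links


def get_video_links(archive_url):
    response = requests.get(archive_url, timeout=30)
    # Raise on 403/404 etc. instead of parsing an error page and returning []
    response.raise_for_status()
    return extract_mp3_links(archive_url, response.text)
```

Usage is the same as before: `video_links = get_video_links("http://example.com/audio/")` followed by `print(video_links)`. Keeping the parsing in its own function also makes it possible to test the filtering on a saved copy of the page without any network access.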