Reading RSS feed in Python

I am trying to obtain the first title.text from this RSS feed:

CodePudding user response：

The issue is most likely the XML namespace, which is declared on the feed element at the top of the document.

Your script calls the findall method with the path ".//entry" to find all the entry elements. However, the feed declares the default XML namespace "http://www.w3.org/2005/Atom", so ElementTree stores each tag in its qualified form, {http://www.w3.org/2005/Atom}entry, and the bare path does not match any elements in the feed.
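To see the mismatch directly, here is a minimal sketch using a tiny hand-written Atom document. SAMPLE_FEED is a made-up stand-in, not the real feed, so the example runs without a network request.

```python
from xml.etree import ElementTree

# Hypothetical sample standing in for the real feed (assumption, not real data).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry>
    <title>First entry</title>
  </entry>
</feed>"""

root = ElementTree.fromstring(SAMPLE_FEED)

# The default namespace becomes part of every parsed tag name.
print(root.tag)  # {http://www.w3.org/2005/Atom}feed

# A bare tag name does not match; the qualified name does.
plain_matches = root.findall(".//entry")
qualified_matches = root.findall(".//{http://www.w3.org/2005/Atom}entry")
print(len(plain_matches), len(qualified_matches))  # 0 1
```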

One way to handle this is to specify the namespace when calling the findall method. Map a prefix of your choice to the namespace URI in a dictionary, use that prefix in the path, and pass the dictionary as the second argument to findall.

import requests
from xml.etree import ElementTree

rss_url = 'https://www.mmafighting.com/rss/current'
response = requests.get(rss_url)

if response.status_code == 200:
    rss_feed = response.text
    # Parse the feed with xml.etree.ElementTree
    root = ElementTree.fromstring(rss_feed)
    # Map a prefix to the Atom namespace so the paths below can use it
    ns = {'atom': 'http://www.w3.org/2005/Atom'}
    entries = root.findall('.//atom:entry', ns)
    if len(entries) > 0:
        # Only the first entry is needed; loop over entries to handle them all
        title = entries[0].find("atom:title", ns)
        if title is not None:
            print(title.text)
        else:
            print("No title found in this entry")
    else:
        print("No entry found in the RSS feed")
else:
    print("Failed to get RSS feed. Status code:", response.status_code)
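The same lookup can be wrapped in a small helper, which makes it easier to try without hitting the network. This is only a sketch: the function name first_entry_title, the constant ATOM_NS, and the sample document are all illustrative and not part of the original script.

```python
from typing import Optional
from xml.etree import ElementTree

# Prefix-to-URI mapping for the Atom namespace; the prefix "atom" is arbitrary.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def first_entry_title(feed_xml: str) -> Optional[str]:
    """Return the title text of the first entry, or None if there is none."""
    root = ElementTree.fromstring(feed_xml)
    entry = root.find(".//atom:entry", ATOM_NS)
    if entry is None:
        return None
    # findtext returns the default when the entry has no title element.
    return entry.findtext("atom:title", default=None, namespaces=ATOM_NS)


# Hypothetical sample standing in for the real feed (assumption, not real data).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First entry</title></entry>
  <entry><title>Second entry</title></entry>
</feed>"""

print(first_entry_title(SAMPLE_FEED))  # First entry
```

On Python 3.8 and later, ElementTree also accepts a wildcard path such as ".//{*}entry", which matches the tag in any namespace if you would rather not spell out the URI.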

CodePudding user response：

You can also do this with Beautiful Soup.

Here is the updated code:

import requests
from bs4 import BeautifulSoup


rss_url = 'https://www.mmafighting.com/rss/current'
response = requests.get(rss_url)

if response.status_code == 200:
    rss_feed = response.text
    soup = BeautifulSoup(rss_feed, features="xml")
    entries = soup.find_all('entry')
    if len(entries) > 0:
        # Look inside the first entry; the feed element has a title of its own
        title = entries[0].find('title')
        if title is not None:
            print(title.text)
        else:
            print("No title found in the first entry")
    else:
        print("No entry found in the RSS feed")
else:
    print("Failed to get RSS feed. Status code:", response.status_code)

The variable entries is a list that contains all the entry elements, so you can iterate over entries to get all the titles.
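To collect the title of every entry rather than only the first, one option is to loop over the entry elements and read the title inside each. The sketch below uses a made-up SAMPLE_FEED string instead of the real feed, and it assumes the lxml package is installed, since Beautiful Soup's "xml" feature relies on it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real feed (assumption, not real data).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First entry</title></entry>
  <entry><title>Second entry</title></entry>
</feed>"""

# features="xml" selects the lxml-based XML parser.
soup = BeautifulSoup(SAMPLE_FEED, features="xml")

entry_titles = []
for entry in soup.find_all("entry"):
    # Search within the entry so the feed-level title is skipped.
    title = entry.find("title")
    if title is not None:
        entry_titles.append(title.get_text(strip=True))

print(entry_titles)
```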

Output:

How to install Beautiful Soup?

Just run this command (lxml is needed for the features="xml" parser used above): pip install beautifulsoup4 lxml