Use Regex to get statement from URL

Time:06-28

I have an IMDB link from which I want to extract only the genre. I'm currently using this code:

```
# genre
genre = movie.find('span', class_="genre")
if genre is not None:
    # Convert the tag to a string and try to trim the markup by hand
    genre = str(genre).split(', <p >')[0].replace("\n", "").replace("</p>]", "")
else:
    genre = "Not Found"
IMDB_dict[title].append(genre)
```
This gives me output like

Drama,Fantasy,Horror

as seen in the picture:

![here][1]

But I want to output only Drama, Fantasy, Horror and not the extra stuff around it.
How can I do this? I have put some regex code in there to find it, but it still returns some kind of URL, as seen above.
I appreciate any help.


  [1]: https://i.stack.imgur.com/OEmjy.png

CodePudding user response:

If you're using BeautifulSoup, you can use this:

    genre_text = BeautifulSoup(genre, "html.parser").text
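
A minimal sketch of the same idea applied directly to the tag returned by `find`, so there is no round trip through `str()`. The `html` string here is a hypothetical stand-in for one IMDB list entry, not markup taken from the question, and it assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a single movie entry (an assumption).
html = '<div><span class="genre">\nDrama, Fantasy, Horror </span></div>'
movie = BeautifulSoup(html, "html.parser")

genre_tag = movie.find("span", class_="genre")

# get_text(strip=True) returns only the text inside the tag,
# with leading and trailing whitespace removed.
genre = genre_tag.get_text(strip=True) if genre_tag is not None else "Not Found"
print(genre)
```

Because `get_text` works on the parsed tag, no manual `split` or `replace` calls are needed to remove the markup.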

CodePudding user response:

This regex extracts the content between a single opening and closing HTML tag, using the capture group `(.*)`:

    ^<.*>(.*)<\/.*>$

Python:

    import re

    genre = '<span >Drama,Fantasy,Horror </span>'

    regx = r'^<.*>(.*)<\/.*>$'
    text = re.findall(regx, genre)
    print(text)

Output:

    ['Drama,Fantasy,Horror ']
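
One caveat with the pattern above: `.` does not match a newline by default, so it finds nothing if the text inside the tag spans several lines. The sketch below is one possible variation, not part of the original answer. It uses `re.DOTALL` and then splits the captured text into a clean list. The helper name `extract_genres` and the sample string are made up for illustration, and the assumption is that the fragment holds a single tag pair.

```python
import re

def extract_genres(html_fragment):
    """Return the comma-separated items inside a single-tag fragment as a list."""
    # re.DOTALL lets '.' match newlines; '.*?' stops at the first closing tag.
    match = re.search(r"<[^>]*>(.*?)</[^>]*>", html_fragment, re.DOTALL)
    if match is None:
        return []
    # Split on commas and drop surrounding whitespace and empty pieces.
    return [g.strip() for g in match.group(1).split(",") if g.strip()]

# Hypothetical sample with a newline inside the tag (an assumption).
sample = '<span class="genre">\nDrama, Fantasy, Horror </span>'
print(extract_genres(sample))
```

Joining the result with `", ".join(...)` gives a single string again if that is what `IMDB_dict` should store.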