How to read text off a website using Python (simple explanation)

I'm looking to make a program that can get the text off a website when given the website's URL. I would like to be able to get all text between the <p> tags. Everywhere I have looked online seems to overcomplicate this, and it involves some coding in C, which I am not well versed in. If anything in the question is unclear, please let me know in the comments and I will clarify. To summarize, in the best case I would like the code to look like this:

import WebReader as WR

StringOfWebText = WR.getParagrahText("WebsiteURL")

CodePudding user response：

You probably want to look into something like BeautifulSoup paired with requests. You can then extract text from a page with a simple solution like this:

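import requests
from bs4 import BeautifulSoup

# Download the page, then parse the returned HTML.
r = requests.get("https://google.com")
soup = BeautifulSoup(r.text, "html.parser")
# soup.text is the text of the whole document with the tags removed.
print(soup.text)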

BS4 also has tag searching (for example, find_all) and other useful features built in, if you need to pull text out of specific elements rather than the whole page.
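As a rough sketch of the tag searching mentioned above, something like the following could collect only the paragraph text, which seems to be what the question is after. The function names paragraphs_from_html and get_paragraph_text are made up for this example (they are not part of requests or BS4), and the 10-second timeout is an arbitrary choice.

```python
import requests
from bs4 import BeautifulSoup


def paragraphs_from_html(html):
    """Return the text of every <p> element in an HTML string.

    Hypothetical helper for illustration; not part of BS4.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all("p") returns every <p> tag in document order.
    # get_text() drops nested tags; split/join collapses runs of whitespace.
    return [" ".join(p.get_text().split()) for p in soup.find_all("p")]


def get_paragraph_text(url):
    """Download a page and return its paragraph text, one paragraph per line.

    Hypothetical helper for illustration; the timeout value is an assumption.
    """
    response = requests.get(url, timeout=10)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return "\n".join(paragraphs_from_html(response.text))
```

With this in place, get_paragraph_text("WebsiteURL") would play roughly the role of the WR.getParagrahText call in the question. Note that pages which build their content with JavaScript may return little or no paragraph text this way, since requests only fetches the raw HTML.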

CodePudding user response：

If I'm not mistaken, what you are looking for is a Python web scraper. The article linked below, "Reading selected webpage content using Python Web Scraping", covers this with examples.

https://www.geeksforgeeks.org/reading-selected-webpage-content-using-python-web-scraping/