I'm looking to make a program that can get the text off a website when given the website's URL. I would like to be able to get all the text between the <p> tags. Everywhere I have looked online seems to overcomplicate this, and the solutions involve some coding in C, which I am not well versed in. In the best case, I would like the code to look something like the snippet below. If there's anything unclear in the question that I can clarify, please let me know in the comments.
import WebReader as WR
StringOfWebText = WR.getParagraphText("WebsiteURL")
CodePudding user response:
You probably want to look into something like BeautifulSoup paired with requests. You can then extract text from a page with a simple solution like this:
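import requests
from bs4 import BeautifulSoup
# Download the page, then parse the returned HTML
r = requests.get("https://google.com")
soup = BeautifulSoup(r.text, "html.parser")
# soup.text is all of the page's text with the markup stripped out
print(soup.text)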
There's also tag-searching and other useful features built into BS4, if you need to be able to handle that.
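Since the question asks specifically for the text inside the <p> tags, here is a minimal sketch of that tag search. The helper names extract_paragraph_text and get_paragraph_text are made up for illustration (they only echo the wished-for WR.getParagraphText and are not part of BS4 or requests), and the 10-second timeout is an arbitrary choice.

```python
import requests
from bs4 import BeautifulSoup


def extract_paragraph_text(html: str) -> str:
    """Return the text of every <p> element in html, one paragraph per line."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all("p") returns every <p> tag in document order
    paragraphs = (p.get_text().strip() for p in soup.find_all("p"))
    # Skip paragraphs that contain only whitespace
    return "\n".join(text for text in paragraphs if text)


def get_paragraph_text(url: str) -> str:
    """Hypothetical helper: download url and return its paragraph text."""
    response = requests.get(url, timeout=10)
    # Raise an exception on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    return extract_paragraph_text(response.text)
```

With this in place, something like StringOfWebText = get_paragraph_text("WebsiteURL") is close to the one-liner from the question. Keep in mind that pages which build their content with JavaScript may return little or no paragraph text this way, because requests only sees the raw HTML the server sends.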
CodePudding user response:
If I'm not mistaken, what you are looking for is a Python web scraper. The article linked below, "Reading selected webpage content using Python", covers the topic with examples.
https://www.geeksforgeeks.org/reading-selected-webpage-content-using-python-web-scraping/