How to read text off a website using python (Simple explanation)

Time:03-24

I'm looking to make a program that can get the text off a website when given the website's URL. I would like to be able to get all text between the <p> tags. Everywhere I have looked online seems to overcomplicate this, and it involves some coding in C, which I am not well versed in. If there's anything unclear in the question that I can clarify, please let me know in the comments. To summarize, this is what I would like the code to look like (best-case scenario):

import WebReader as WR

StringOfWebText = WR.getParagrahText("WebsiteURL")

CodePudding user response:

You probably want to look into something like BeautifulSoup paired with requests. You can then extract text from a page with a simple solution like this:

import requests
from bs4 import BeautifulSoup

# Download the page and parse the returned HTML
r = requests.get("https://google.com")
soup = BeautifulSoup(r.text, "html.parser")
# Print the text content of the parsed page
print(soup.text)

BS4 also has tag searching (for example, soup.find_all("p") returns every <p> tag) and other useful features built in, if you need only part of the page.
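
Since the question asks specifically for the text between the <p> tags, here is a minimal sketch of how tag searching could be used for that. The function names paragraphs_from_html and get_paragraph_text are made up for illustration (they are not part of requests or BS4), and the 10-second timeout is an arbitrary choice:

```python
import requests
from bs4 import BeautifulSoup


def paragraphs_from_html(html):
    """Return the text of every <p> element in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all("p") returns every <p> tag in document order;
    # get_text() drops nested markup such as <b> or <a>.
    return [p.get_text().strip() for p in soup.find_all("p")]


def get_paragraph_text(url):
    """Download a page and return its paragraph text, one paragraph per line."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return "\n".join(paragraphs_from_html(response.text))
```

With this in place, something like StringOfWebText = get_paragraph_text("WebsiteURL") would be close to the one-liner from the question. Note that pages which build their content with JavaScript may return little or no paragraph text this way.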

CodePudding user response:

If I'm not mistaken, what you are looking for is a Python web scraper. The article "Reading selected webpage content using Python" at the link below covers this with examples.

https://www.geeksforgeeks.org/reading-selected-webpage-content-using-python-web-scraping/
