How to extract text from a hyperlink using Python?

Right now I'm trying to check whether each link is broken and record the result in an Excel sheet. For that I need to get the text from the hyperlink, so that each link is easy to identify.

All the links will be provided in an Excel sheet:

Link
https://www.dailythanthi.com/Careers 
https://www.dailythanthi.com/Paper-Ad-Tariff

I need to write the results to the Excel sheet like this:

Link                               text                           response code
https://www.dailyt...        Careers                                   200
https://www.dailyt...        Paper Advertisement                       404

Is it possible to extract the link text from the links they provided?

CodePudding user response：

If you want to read the text directly from a Google spreadsheet, you can use the google_spreadsheet package; otherwise, you can download the sheet, convert it to CSV, and process it with the csv library.
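
The CSV route could look roughly like the sketch below. It assumes the exported file has a single column headed Link, as in the question; read_links and write_report are hypothetical helper names, and io.StringIO stands in for real files so the example runs on its own.

```python
import csv
import io


def read_links(csv_file):
    """Return the non-empty values of the 'Link' column (header name assumed)."""
    reader = csv.DictReader(csv_file)
    return [(row.get("Link") or "").strip() for row in reader
            if (row.get("Link") or "").strip()]


def write_report(csv_file, rows):
    """Write (link, text, response code) rows under the question's headers."""
    writer = csv.writer(csv_file)
    writer.writerow(["Link", "text", "response code"])
    writer.writerows(rows)


# In-memory stand-ins for the exported file and the report file.
source = io.StringIO(
    "Link\n"
    "https://www.dailythanthi.com/Careers \n"
    "https://www.dailythanthi.com/Paper-Ad-Tariff\n"
)
links = read_links(source)

report = io.StringIO()
# The text and code values here are just the sample row from the question.
write_report(report, [(links[0], "Careers", 200)])
print(report.getvalue())
```

With real files, open them with newline="" (for example open("links.csv", newline="")), which is what the csv module documentation recommends.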

In order to get the response code, you can use:

import requests

# 'website_url' is a placeholder for one of the links from the sheet
response = requests.get('website_url')
print(response.status_code)

Then, to get the page title:

from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
for title in soup.find_all('title'):
    print(title.get_text())
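
One thing to watch for: a link that cannot be reached at all raises an exception instead of returning a status code. Here is a minimal sketch that returns both values and handles that case. Note that it swaps requests and BeautifulSoup for the standard library (urllib.request and html.parser) so it runs without extra installs; TitleParser, page_title and check_link are hypothetical names, and decoding the body as UTF-8 is an assumption.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._chunks.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace and trim the ends.
        return " ".join("".join(self._chunks).split())


def page_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def check_link(url, timeout=10):
    """Return (status_code, title); status_code is None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Assumption: the page is UTF-8; undecodable bytes are replaced.
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, page_title(body)
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 404.
        return err.code, ""
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None, ""
```

Calling check_link on each link from the sheet gives the "response code" and "text" columns in one pass; a None status means the link could not be reached at all.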

CodePudding user response：

A link is a string, so you can use Python's string methods to work with it.

For example, you can use .split('/'):

links = [
  "https://www.dailythanthi.com/Careers",
  "https://www.dailythanthi.com/Paper-Ad-Tariff",
]

for item in links:
    # e.g. ['https:', '', 'www.dailythanthi.com', 'Careers'], so index 3 is the first path segment
    parts = item.split("/")
    print(parts[3])

Result:

Careers
Paper-Ad-Tariff

For more complex tasks, you could use the standard library module urllib.parse:

import urllib.parse

links = [
    "https://www.dailythanthi.com/Careers",
    "https://www.dailythanthi.com/Paper-Ad-Tariff",
]

for item in links:
    parts = urllib.parse.urlparse(item)
    # parts.path is e.g. "/Careers"; drop the leading slash
    print(parts.path[1:])
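
To turn the path into something closer to readable text, a small helper along these lines might do. It is only a sketch: link_text is a hypothetical name, and treating hyphens and underscores as word separators is an assumption about how the site builds its URLs. A label like "Paper Advertisement" from the question cannot be derived from "Paper-Ad-Tariff" this way; that would have to come from the page itself.

```python
from urllib.parse import unquote, urlparse


def link_text(url):
    """Guess a human-readable label from the last path segment of a URL."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if not segments:
        return ""  # e.g. a bare domain with no path
    slug = unquote(segments[-1])  # decode %20 and similar escapes
    return slug.replace("-", " ").replace("_", " ").strip()


for item in [
    "https://www.dailythanthi.com/Careers",
    "https://www.dailythanthi.com/Paper-Ad-Tariff",
]:
    print(link_text(item))
```

Taking the last non-empty segment instead of a fixed index means it also works for deeper paths and URLs with a trailing slash.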