How to extract text from hyperlink using python?

Time:08-18

Right now I'm trying to check whether each link is broken and record the result in an Excel sheet. For that I need to get the text from the hyperlink, so that the link is easier to understand.

They will provide all the links in an excel sheet.

Link
https://www.dailythanthi.com/Careers 
https://www.dailythanthi.com/Paper-Ad-Tariff

I need to write the results to the Excel sheet like this:

Link                               text                           response code
https://www.dailyt...        Careers                                   200
https://www.dailyt...        Paper Advertisement                       404

Is it possible to extract the link text from the links they provided?

CodePudding user response:

If you want to read the links from a Google spreadsheet directly, you can use the package google_spreadsheet; otherwise you can download the sheet, convert it to CSV, and process it with the csv library.
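As a rough sketch of the CSV route, the standard csv module can read the Link column and write the three result columns from the question. The helper names read_links and write_results are made up for illustration, and the sample uses in-memory text instead of a real exported file:

```python
import csv
import io

def read_links(csv_text):
    """Return the values of the 'Link' column from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # strip() removes stray spaces around a URL, such as a trailing space in the sheet
    return [row["Link"].strip() for row in reader if row.get("Link")]

def write_results(rows):
    """Serialize (link, text, response_code) tuples to CSV text (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Link", "text", "response code"])
    writer.writerows(rows)
    return out.getvalue()

# In-memory sample standing in for the exported sheet
sample = (
    "Link\n"
    "https://www.dailythanthi.com/Careers \n"
    "https://www.dailythanthi.com/Paper-Ad-Tariff\n"
)
links = read_links(sample)
# The text and response code columns are left empty here; they get filled in later
print(write_results([(link, "", "") for link in links]))
```

With a real file, the same reader and writer can be given objects from open(path, newline="") instead of io.StringIO.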

In order to get the response code, you can use:

import requests
response = requests.get('website_url')
# status_code holds the HTTP status as an int, e.g. 200 or 404
print(response.status_code)

Then, to get the page title:

from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')
for title in soup.find_all('title'):
    print(title.get_text())
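If BeautifulSoup is not installed, the standard library's html.parser can do a similar job. This is a swapped-in alternative, not the approach above; TitleParser and extract_title are hypothetical names, and the sketch simply collects the text of any title element it meets:

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text found inside <title> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>
        if self._in_title:
            self.title += data

def extract_title(html):
    """Return the stripped title text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

# Made-up HTML standing in for response.text
page = "<html><head><title> Careers </title></head><body><p>Jobs</p></body></html>"
print(extract_title(page))
```

In the code above, response.text would be passed in place of the sample page string.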

CodePudding user response:

A link is a string, so you can use Python's string methods to work with it.

For example, you can use .split('/'):

links = [
  "https://www.dailythanthi.com/Careers",
  "https://www.dailythanthi.com/Paper-Ad-Tariff",
]

for item in links:
    parts = item.split("/")
    print(parts[3])

Result:

Careers
Paper-Ad-Tariff

For more complex tasks, you could use the standard module urllib.parse:

import urllib.parse

links = [
  "https://www.dailythanthi.com/Careers",
  "https://www.dailythanthi.com/Paper-Ad-Tariff",
]

for item in links:
    parts = urllib.parse.urlparse(item)
    # path is "/Careers"; [1:] drops the leading slash
    print(parts.path[1:])
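To get something closer to readable text for the sheet, the last path segment can be cleaned up a little. This is only a sketch: link_text is a hypothetical helper, and it assumes hyphens and underscores in the URL stand for spaces. It cannot produce wording that is not in the URL, so "Paper-Ad-Tariff" becomes "Paper Ad Tariff", not "Paper Advertisement"; for the exact wording the page itself would have to be fetched.

```python
from urllib.parse import urlparse, unquote

def link_text(url):
    """Turn the last path segment of a URL into readable text (hypothetical helper)."""
    path = urlparse(url.strip()).path
    # Drop empty pieces caused by leading or trailing slashes
    segments = [s for s in path.split("/") if s]
    if not segments:
        return ""
    # unquote() decodes percent-escapes such as %20
    slug = unquote(segments[-1])
    # Assumption: hyphens and underscores are word separators
    return slug.replace("-", " ").replace("_", " ")

links = [
    "https://www.dailythanthi.com/Careers",
    "https://www.dailythanthi.com/Paper-Ad-Tariff",
]

for item in links:
    print(link_text(item))
```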