Right now I'm trying to validate each link, i.e. check whether it is broken or not, and record the result in an Excel sheet. For that I also need the text of each hyperlink, so that every link is easy to understand.
They will provide all the links in an Excel sheet:
Link
https://www.dailythanthi.com/Careers
https://www.dailythanthi.com/Paper-Ad-Tariff
I need to write the results to the Excel sheet like this:
Link                    Link text            Response code
https://www.dailyt...   Careers              200
https://www.dailyt...   Paper Advertisement  404
Is it possible to extract the link text from the links they provided?
Answer:
If the links are in a Google spreadsheet and you want to read them directly, you can use the package google_spreadsheet; otherwise you can download the sheet, convert it to CSV and process it with Python's csv module.
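For the CSV route, here is a minimal sketch using only the standard csv module. It assumes the sheet was saved as CSV with a header cell named "Link" (as in the question) and writes the three columns from the question's desired output; the helper names read_links and write_report are hypothetical.

```python
import csv

def read_links(src, column="Link"):
    """Return the non-empty values of `column` from an open CSV file object."""
    reader = csv.DictReader(src)
    # DictReader fills missing cells with None, so guard before calling strip()
    return [row[column].strip() for row in reader if (row.get(column) or "").strip()]

def write_report(dst, rows):
    """Write (link, link_text, status) tuples under the headers used in the question."""
    writer = csv.writer(dst)
    writer.writerow(["Link", "Link text", "Response code"])
    for link, text, status in rows:
        # A status of None (no response at all) is written as an empty cell
        writer.writerow([link, text, "" if status is None else status])
```

When working with real files, open them with `newline=""` (and an explicit `encoding="utf-8"`) as the csv documentation recommends; Excel can then open the resulting CSV directly.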
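To get the response code, you can use:
import requests

# 'website_url' is a placeholder: substitute each link from the sheet
response = requests.get('website_url')
# status_code is an int such as 200 or 404
print(response.status_code)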
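A truly broken link often produces no response at all (DNS failure, refused connection, timeout), in which case requests.get raises an exception instead of returning a status code. As a sketch of a more defensive check, here is a version using the standard library's urllib.request instead of requests (a swapped-in technique, not what the answer above uses). The helper name status_of, the 10-second timeout and the User-Agent value are assumptions.

```python
import urllib.error
import urllib.request

def status_of(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        # Hypothetical User-Agent; some servers reject the default Python one
        req = urllib.request.Request(url, headers={"User-Agent": "link-checker/0.1"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # urlopen follows redirects, so this is the status of the final page
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError but still carry the code
        return err.code
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError = malformed URL
        return None
```

A None result can then be written as an empty cell (or "unreachable") in the sheet, so unreachable links are not confused with 404 pages.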
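Then, to get the page title (which you can use as the link text):
from bs4 import BeautifulSoup

# response is the object returned by requests.get above
soup = BeautifulSoup(response.text, 'html.parser')
for title in soup.find_all('title'):
    print(title.get_text(strip=True))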
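If you would rather not install bs4, a minimal alternative sketch uses the standard library's html.parser.HTMLParser (a swapped-in technique). It only reads the first title element and collapses whitespace; the names page_title and _TitleParser are hypothetical.

```python
from html.parser import HTMLParser

class _TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True  # ignore any later <title>, e.g. inside inline SVG

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def page_title(html):
    """Return the page title with whitespace collapsed, or None if there is none."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None
```

Usage: `page_title(response.text)` with the response from the snippet above. HTML entities such as `&amp;` are decoded, because HTMLParser converts character references by default.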
Answer:
A link is a string, so you can use Python's string functions to work with it. For example, you can use .split('/'):
links = [
    "https://www.dailythanthi.com/Careers",
    "https://www.dailythanthi.com/Paper-Ad-Tariff",
]
for item in links:
    # "https:", "", "www.dailythanthi.com", "Careers" -> index 3 is the first path segment
    parts = item.split("/")
    print(parts[3])
Result:
Careers
Paper-Ad-Tariff
For more complex URLs (query strings, fragments, nested paths) you can use the standard-library module urllib.parse:
import urllib.parse

links = [
    "https://www.dailythanthi.com/Careers",
    "https://www.dailythanthi.com/Paper-Ad-Tariff",
]
for item in links:
    parts = urllib.parse.urlparse(item)
    # parts.path is e.g. "/Careers"; [1:] drops the leading slash
    print(parts.path[1:])
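Building on urlparse, a sketch of turning the last path segment into readable text. This only approximates a label from the URL: "Paper-Ad-Tariff" becomes "Paper Ad Tariff", not the "Paper Advertisement" shown in the question, so the page title is the better source when the page loads. The helper name link_text is hypothetical.

```python
from urllib.parse import unquote, urlparse

def link_text(url):
    """Turn the last non-empty path segment of url into space-separated words."""
    path = urlparse(url).path  # query string and fragment are already split off
    # Skip empty segments so a trailing slash doesn't yield ''
    segments = [s for s in path.split("/") if s]
    if not segments:
        return ""
    slug = unquote(segments[-1])  # decode escapes such as %20
    # Treat hyphens and underscores as word separators and collapse repeats
    return " ".join(slug.replace("-", " ").replace("_", " ").split())
```

For example, `link_text("https://www.dailythanthi.com/Careers/?ref=home")` still gives "Careers", where `split("/")[3]` would give "Careers" only by luck of the URL shape.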