Strip the "#section" part of a URL

I have three URLs that point to the same page but to different sections. I want to strip (get rid of) the "#section" part that follows the actual URL:

url1 = "https://python.iamroot.eu/install/index.html#alternate-installation-the-home-scheme"
# wanted output: https://python.iamroot.eu/install/index.html

url2 = "https://python.iamroot.eu/install/index.html#alternate-installation-unix-the-prefix-scheme"
# wanted output: https://python.iamroot.eu/install/index.html

url3 = "https://python.iamroot.eu/install/index.html"
# wanted output: https://python.iamroot.eu/install/index.html

CodePudding user response：

The best way to do this is to use the urllib.parse module, which is safer than trying to split the URL yourself:

from urllib.parse import urlparse

url1 = "https://python.iamroot.eu/install/index.html#alternate-installation-the-home-scheme"

newurl = urlparse(url1)._replace(fragment='')
print(newurl.geturl())
# https://python.iamroot.eu/install/index.html
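
As a possible alternative, urllib.parse also provides urldefrag, which splits a URL into the part before the "#" and the fragment itself. Below is a minimal sketch applied to the three URLs from the question; the helper name strip_fragment is made up for illustration.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Return url without its '#fragment' part (hypothetical helper name)."""
    # urldefrag returns a named tuple with the fields (url, fragment)
    return urldefrag(url).url


urls = [
    "https://python.iamroot.eu/install/index.html#alternate-installation-the-home-scheme",
    "https://python.iamroot.eu/install/index.html#alternate-installation-unix-the-prefix-scheme",
    "https://python.iamroot.eu/install/index.html",
]

cleaned = [strip_fragment(u) for u in urls]
for u in cleaned:
    print(u)
```

A URL without a fragment, such as url3, comes back unchanged.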

CodePudding user response：

Another good way to do this is to use a regex. It is an amazing tool with widespread applications in many languages.

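import re

url = "https://python.iamroot.eu/install/index.html#alternate-installation-the-home-scheme"  # the URL you want to strip
# Match everything up to the last literal ".html" (only works for pages ending in .html)
stripped_url = re.findall(r".*\.html", url)[0]  # the stripped URL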

See the documentation of Python's re module for more on regex.
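
The pattern above depends on the page name ending in ".html". If that cannot be assumed, one option is to remove everything from the first "#" onward instead. This is only a sketch: the names FRAGMENT_RE and strip_fragment_re, and the example.com URL, are made up for illustration.

```python
import re

# Matches the first "#" and everything after it (hypothetical name)
FRAGMENT_RE = re.compile(r"#.*")


def strip_fragment_re(url):
    """Remove the '#section' part of url, if there is one."""
    return FRAGMENT_RE.sub("", url)


print(strip_fragment_re(
    "https://python.iamroot.eu/install/index.html#alternate-installation-the-home-scheme"
))
# Also works when the path does not end in .html (example.com is a placeholder)
print(strip_fragment_re("https://example.com/page.php#top"))
```

If the URL contains no "#", the pattern does not match and the URL is returned as it was.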

CodePudding user response：

I think I found a solution.

print(url1.split("#")[0])
print(url2.split("#")[0])
print(url3.split("#")[0])
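
The same idea could be wrapped in a small function for reuse. A rough sketch, where the function name strip_section is my own and not anything standard:

```python
def strip_section(url):
    """Return the part of url before the first '#' (hypothetical helper name)."""
    # maxsplit=1 stops after the first "#"; a URL without "#" is returned whole
    return url.split("#", 1)[0]


url1 = "https://python.iamroot.eu/install/index.html#alternate-installation-the-home-scheme"
url2 = "https://python.iamroot.eu/install/index.html#alternate-installation-unix-the-prefix-scheme"
url3 = "https://python.iamroot.eu/install/index.html"

for url in (url1, url2, url3):
    print(strip_section(url))
```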

Sorry for wasting your time.