How to remove a specific part of a link?

Time:02-01

So basically I'm making a script that can download a bunch of maps from TrackmaniaExchange based on a search result. However, to download the map files, I need the actual download link, which the search result doesn't give.

I already know how to download maps. The link is https://trackmania.exchange/maps/download/(map id). However, the hrefs in the search results are /maps/(map id)/(map name).

What I was thinking of doing is using selenium to go to the site, grab the href for the map, edit the link with re.sub so that it'll point to /maps/download/(map id)/, and remove the end of the link with re.sub so there's no map name at the end of it. I don't know how to go about it, though. This is what I have so far in my script:

import requests
import os.path
import os
import selenium.webdriver as webdriver
from selenium.webdriver.firefox.options import Options
import time
import re

def Search():
    link="https://trackmania.exchange/mapsearch2?limit=100" #Trackmania Exchange link, will scrape all 100 results
    checkedlink = re.sub("\s", "%20", link) #Replaces spaces with %20 for track names (this shouldnt happen with authors/tags)
    options = Options() #This is for selenium
    options.binary_location = "C:/Program Files/Mozilla Firefox/firefox.exe"
    driver = webdriver.Firefox(options=options)
    search_box = driver.find_element_by_name("trackname")
    sitelinks = driver.find_element_by_xpath("/html/[div/@id='container'/@data-select2-id='container']/[div/@class='container-inner']/[div/@class='ly-box-open']/[div/@class='box-col-6']/[div/@class='windowv2-panel']/[div/@id='searchResults-container']/div/div/table/tbody/[tr/@class='WindowTableCell2v2 with-hover has-image']/[td/@class='cell-ellipsis']")
    results = []
    name=input("Track Name (if nothing, hit enter)") #Prompts the user to input stuff
    author=input("Track Author (if nothing, hit enter)")
    tags=input("Tags (separate with , if there's multiple, if nothing, hit enter)")
    path=input("Map download directory (do not leave blank, use forward slashes)")
    print("WARNING: Download wget for this script to work.")
    type(name) #These are to make a link to find html with
    type(author)
    type(tags)
    type(path)
    if path == "":
        print("Please put a path next time you start this")
        time.sleep(3)
        os.exit()
    else: #And so begins the if/else hellhole to find out what needs to be added to the link
        if tags == "":
            if name == "":
                if author == "":
                    print("Chief, you cant just enter nothing.  Put something in here next time")
                    time.sleep(3)
                    os.exit()
                else:
                    link = link + "&author=" + author
            else:
                link = link + "&trackname=" + name
                if author != "":
                    link = link + "&author=" + author
        else:
            link = link + "&tags=" + tags
            if name != "":
                link = link + "&trackname=" + name
                if author != "":
                    link = link + "&author=" + author
            else:
                if author != "":
                    link = link + "&author=" + author
    print("Checking link...")
    checkedlink() #this is to make sure there's no spaces in the link.  tags are separated by %2C, but track names are separated by %20
    print("Attempting to download...")
    driver.get(link)
    links = sitelinks
    for link in links
        href = link.get_attribute("href")
        browser.close()
        with open("list.txt", "w", encoding="utf-8") as f:
            f.write(href)
            for line in f:
                h = re.findall("\d") #My failed attempt at removing the end of the link
                re.sub("/maps/", "https://trackmania.exchange/maps/download", f)
                re.sub("") #unfinished part cause i was stubbed
    os.system("wget --directory-prefix=" + path + " -i list.txt")

Search()

Their API is listed on the site, and after looking over the site's rules, this is allowed. I also haven't really tested the script since making the if/else hellhole, but I can work on that later. All I need help with is removing the map name after the map ID. If you need a proper example, one of the hrefs on the front page for me is /maps/91677/cloudy-day. It'll be different for every link, so I don't really know what I should do.

CodePudding user response:

If I know the URL format will be /maps/id/some-text and the ID will only include numbers, then I would simply grab the ID from the link using the regex below, and then use an f-string to build the URL.

map_id = re.search(r"\d+", url).group(0)
get_map_url = f"https://trackmania.exchange/maps/download/{map_id}"
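
If some hrefs might not contain an ID at all, re.search returns None and the .group(0) call raises an AttributeError. A minimal sketch of a more defensive variant, assuming the hrefs keep the /maps/(map id)/(map name) shape from the question; the helper name download_url and the MAP_HREF pattern are made up for illustration:

```python
import re
from typing import Optional

# Assumption: hrefs look like /maps/<numeric id>/<map-name>, as in the question.
# Anchoring on "/maps/" avoids picking up digits that appear elsewhere in a URL.
MAP_HREF = re.compile(r"/maps/(\d+)")

def download_url(href: str) -> Optional[str]:
    """Return the download URL for a map href, or None if no map ID is found."""
    match = MAP_HREF.search(href)
    if match is None:
        return None
    return f"https://trackmania.exchange/maps/download/{match.group(1)}"

print(download_url("/maps/91677/cloudy-day"))
# prints https://trackmania.exchange/maps/download/91677
```

Returning None instead of raising lets the caller skip links that are not map pages.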

Play around on regex101 with different URLs you may come across.
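
The same check can also be run locally over a handful of hrefs before pointing wget at them. This is only a sketch: apart from /maps/91677/cloudy-day, the sample hrefs are invented, and to_download_links is a hypothetical helper name.

```python
import re

BASE = "https://trackmania.exchange/maps/download/"

def to_download_links(hrefs):
    """Convert /maps/<id>/<name> hrefs to download links, skipping non-matches."""
    links = []
    for href in hrefs:
        match = re.search(r"/maps/(\d+)", href)
        if match:
            links.append(BASE + match.group(1))
    return links

# Hypothetical sample hrefs; only the first one comes from the question.
sample = ["/maps/91677/cloudy-day", "/maps/12345/example-2", "/mapsearch2"]
links = to_download_links(sample)

# One URL per line is the layout wget expects from a file passed with -i.
list_txt = "\n".join(links) + "\n"
print(list_txt, end="")
```

Collecting every link first and writing the file once avoids overwriting list.txt on each loop iteration, which is what opening it with "w" inside the loop would do.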
