So basically Im making a script that's able to download a bunch of maps from TrackmaniaExchange with a search result. However, to download the map files, I need the actual download link, which the search result doesn't give.
I already know how to download maps. The link is https://trackmania.exchange/maps/download/(map id). However, the href's for the search results is /maps/(map id)/(map name).
What I was thinking of doing is using selenium to go to the site, grab the href for the map, edit the link with re.sub so that itll link to /maps/download/(map id)/, and remove the end of the link with re.sub so there's no map name at the end of it. I dont know how to go about it, though. This is what I have so far in my script:
import requests
import os.path
import os
import selenium.webdriver as webdriver
from selenium.webdriver.firefox.options import Options
import time
import re
def Search():
link="https://trackmania.exchange/mapsearch2?limit=100" #Trackmania Exchange link, will scrape all 100 results
checkedlink = re.sub("\s", " ", link) #Replaces spaces with for track names (this shouldnt happen with authors/tags)
options = Options() #This is for selenium
options.binary_location = "C:/Program Files/Mozilla Firefox/firefox.exe"
driver = webdriver.Firefox(options=options)
search_box = driver.find_element_by_name("trackname")
sitelinks = driver.find_element_by_xpath("/html/[div/@id='container'/@data-select2-id='container']/[div/@class='container-inner']/[div/@class='ly-box-open']/[div/@class='box-col-6']/[div/@class='windowv2-panel']/[div/@id='searchResults-container']/div/div/table/tbody/[tr/@class='WindowTableCell2v2 with-hover has-image']/[td/@class='cell-ellipsis']")
results = []
name=input("Track Name (if nothing, hit enter)") #Prompts the user to input stuff
author=input("Track Author (if nothing, hit enter)")
tags=input("Tags (separate with , if there's multiple, if nothing, hit enter)")
path=input("Map download directory (do not leave blank, use forward slashes)")
print("WARNING: Download wget for this script to work.")
type(name) #These are to make a link to find html with
type(author)
type(tags)
type(path)
if path == "":
print("Please put a path next time you start this")
time.sleep(3)
os.exit()
else: #And so begins the if/else hellhole to find out what needs to be added to the link
if tags == "":
if name == "":
if author == "":
print("Chief, you cant just enter nothing. Put something in here next time")
time.sleep(3)
os.exit()
else:
link = link "&author=" author
else:
link = link "&trackname=" name
if author != "":
link = link "&author=" author
else:
link = link "&tags=" tags
if name != "":
link = link "&trackname=" name
if author != "":
link = link "&author=" author
else:
if author != "":
link = link "&author=" author
print("Checking link...")
checkedlink() #this is to make sure there's no spaces in the link. tags are separated by ,, but track names are separated by
print("Attempting to download...")
driver.get(link)
links = sitelinks
for link in links
href = link.get_attribute("href")
browser.close()
with open("list.txt", "w", encoding="utf-8") as f:
f.write(href)
for line in f:
h = re.findall("\d") #My failed attempt at removing the end of the link
re.sub("/maps/", "https://trackmania.exchange/maps/download", f)
re.sub("") #unfinished part cause i was stubbed
os.system("wget --directory-prefix="path" -i list.txt")
Search()
Their API is listed on the site and after looking over the rules for the site, this is allowed. I also havent really tested the script after making the if/else hellhole, but I can work on that later. All I need help with is removing the map name after the map ID. If you need a proper example, one of the href's on the front page for me is /maps/91677/cloudy-day. Itll be different for every link, so I don't really know what I should do.
CodePudding user response:
If I know the URL format will be /maps/id/some-text
and the ID will only include numbers, then I would just simply grab the ID from the link using the bellow regex, and then use an f string to build the URL.
map_id = re.search(r"\d ", url).group(0)
get_map_url = f"https://trackmania.exchange/maps/download/{map_id}"
Play around on regex101 with different URLs you may come across.