I have a Discord bot written in Python and I wanted to add a feature that would make it immediately delete any phishing links it finds.
I looked for a list of known phishing domains and I found this on GitHub.
However, the issue is that this is a JS file containing one big array, and my bot is 100% Python.
I could just make a copy of this list, but then I lose the advantage of it being constantly updated, so I would like to read the domains directly from GitHub, if possible.
I am not sure how to get and parse this into a Python list.
Looking around on Stack Overflow, people suggest parsing the data as JSON or using regex, but unfortunately I haven't understood it all yet.
Any guidance would help, or maybe you have a better way of doing this altogether. Thank you!
CodePudding user response:
Here is one approach (prone to failure and definitely not the recommended way to do this):
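import ast
import requests

RAW_DATA_LINK = "https://raw.githubusercontent.com/nikolaischunk/discord-phishing-links/main/domain-list.js"

def get_data():
    response = requests.get(RAW_DATA_LINK, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    data = response.text
    # Strip the JS declaration and the semicolon so only the array literal remains.
    data = data.replace("const suspiciousDomains = ", "").replace(";", "").strip()  # or just data[26:-2]
    # ast.literal_eval only accepts literals; plain eval would execute whatever the file contains.
    return ast.literal_eval(data)

get_data()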
This will give you a list of all the domains in that file.
You could additionally use a requests.Session when making the request, which reuses the connection if the list is fetched repeatedly.
Again, if you control that file, just store it as JSON; if you do not, you would probably be better off with regular expressions.
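For the regular expression route, here is a minimal sketch. It assumes the file is a single JS array of quoted strings with no escaped quotes inside the entries and no comments containing quotes; the name parse_domains and the sample text are made up for illustration and are not part of the linked repository.

```python
import re

# Matches one single- or double-quoted JS string literal and captures its contents.
# Assumes the entries contain no escaped quotes, which holds for plain domain names.
STRING_LITERAL = re.compile(r"""(["'])(.*?)\1""")


def parse_domains(js_source: str) -> list[str]:
    """Return every quoted string between the first '[' and the last ']'."""
    start = js_source.find("[")
    end = js_source.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no array literal found in source")
    array_body = js_source[start + 1:end]
    return [match.group(2) for match in STRING_LITERAL.finditer(array_body)]


# Hypothetical sample standing in for the downloaded file contents.
sample = 'const suspiciousDomains = [\n  "example-phish.test",\n  "free-nitro.invalid",\n];\n'
print(parse_domains(sample))  # ['example-phish.test', 'free-nitro.invalid']
```

You would pass the decoded response body from the request above to parse_domains instead of the sample string. Because nothing is evaluated, a malformed or malicious file can at worst give you wrong strings, not run code.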