Email address is generated dynamically, and the selector cannot locate it even though there is no API request
I have been doing web scraping for over two years now, and this issue comes up every now and then. Usually I solve it by loading the page with Selenium and parsing the response, but this time I have to stick to Scrapy only (no Splash).
I have noticed that when I fetch the link in the Scrapy shell and view the response, I can see the email, but the selector cannot locate it.
So far I have been able to locate the JavaScript request that generates the email, but I cannot figure out how to reverse engineer it with Scrapy.
Here is the link to one of the websites with a similar example. Any help is appreciated.
You can see the email's obfuscated value. Now it's up to you to figure out how to deobfuscate it. Generally, you'd go through the website's JavaScript code, find how the email is obfuscated, and reverse engineer that. This is a bit out of scope for this question, but in this case the obfuscation code does the following:
- replaces `@` with `//`
- replaces `.` with `/`
- reverses the whole string
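To check that reading of the JavaScript, it can help to write the forward transform in Python and confirm that it reproduces the value found in the page. This is a minimal sketch that assumes the three steps listed above are the complete algorithm (the site's actual script may do more). `obfuscate`, `deobfuscate` and the sample address are hypothetical names used only for illustration:

```python
def obfuscate(email: str) -> str:
    """Forward transform as described: '@' -> '//', '.' -> '/', then reverse."""
    encoded = email.replace("@", "//").replace(".", "/")
    return encoded[::-1]


def deobfuscate(value: str) -> str:
    """Inverse transform: undo the reversal, then map '//' back to '@'
    before mapping the remaining single '/' back to '.'."""
    decoded = value[::-1]
    # Order matters: replacing '/' first would turn '//' into '..' and lose the '@'.
    return decoded.replace("//", "@").replace("/", ".")


if __name__ == "__main__":
    sample = "jane.doe@example.com"  # hypothetical address for illustration
    hidden = obfuscate(sample)
    print(hidden)               # moc/elpmaxe//eod/enaj
    print(deobfuscate(hidden))  # jane.doe@example.com
```

The round trip is only unambiguous because a well-formed address normally never contains `..`, `.@` or `@.`, which would otherwise also produce runs of `/` that the decoder cannot tell apart.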
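So we can reverse that in Python:

```python
value = "ku/ca/retsacnal//eromhsa/l"
# Undo the reversal first
value = value[::-1]
# Then turn "//" back into "@" before turning the remaining "/" back into "."
value = value.replace("//", "@").replace("/", ".")
```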
This produces exactly the address shown on the website.
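Inside a Scrapy callback, the same decoding can be applied to the raw value pulled out of the response, so no browser or Splash is needed. This is a minimal sketch that assumes, hypothetically, that the page stores the obfuscated string in a `data-email` attribute. The attribute name, `extract_emails` and the sample HTML are illustrative only, and the regex would need to match the real markup:

```python
import re

# Hypothetical markup: assumes the obfuscated string sits in a data-email
# attribute. Adjust the pattern to whatever the real page uses.
OBFUSCATED_ATTR = re.compile(r'data-email="([^"]+)"')


def deobfuscate(value: str) -> str:
    """Reverse the string, then map '//' -> '@' and '/' -> '.'."""
    value = value[::-1]
    return value.replace("//", "@").replace("/", ".")


def extract_emails(html: str) -> list[str]:
    """Find every obfuscated value in raw HTML (e.g. response.text) and decode it."""
    return [deobfuscate(raw) for raw in OBFUSCATED_ATTR.findall(html)]


if __name__ == "__main__":
    page = '<span class="email" data-email="moc/elpmaxe//eod/enaj"></span>'
    print(extract_emails(page))
```

In a spider you would call `extract_emails(response.text)` from `parse`. If the value really does sit in an attribute, a selector such as `response.css("[data-email]::attr(data-email)").getall()` could return the raw strings to pass to `deobfuscate` instead of using the regex.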