I have a productionized Django backend server running as on Kubernetes (Deployment/Service/Ingress) on GCP. My django is configured with something like
ALLOWED_HOSTS = [BACKEND_URL,INGRESS_IP,THIS_POD_IP,HOST_IP]
Everything is working as expected.
However, my backend server logs intermittent errors like these (about 7 per day)
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-f23.appspot.com'. You may need to add 'xxnet-f23.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-301.appspot.com'. You may need to add 'xxnet-301.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'narutobm1234.appspot.com'. You may need to add 'narutobm1234.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'z-h-e-n-116.appspot.com'. You may need to add 'z-h-e-n-116.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'xxnet-131318.appspot.com'. You may need to add 'xxnet-131318.appspot.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'www.google.com'. You may need to add 'www.google.com' to ALLOWED_HOSTS.
DisallowedHost: Invalid HTTP_HOST header: 'stoked-dominion-123514.appspot.com'. You may need to add 'stoked-dominion-123514.appspot.com' to ALLOWED_HOSTS.
My primary question is: Why - what are all of these hosts?.
I certainly don't want to allow those hosts without understanding their purpose.
Bonus question: What's the best way to silence unwanted hosts within my techstack?
CodePudding user response:
Django checks any request with a header against the ALLOWED_HOSTS setting. When it’s not there, Django throws the Invalid HTTP_HOST header error. See documentation.
These HTTP requests could be coming from bots with wrong host header value. You may want to consider checking Cloud Armor to block traffic from specific host header/domain.
CodePudding user response:
My primary question is: Why - what are all of these hosts?.
Some of them are web crawlers that gather information for various purposes. For example, the www.google.com
address is most likely the web crawlers that populate the search engine databases for Google search, etcetera.
Google probably got to your back-end site by accident by following a chain of links from some other page that is searchable; e.g. your front end website. You could try to identify that path. I believe there is also a page where you can request the removal of URLs from search ... though I'm not sure how effective that would be in quieting your logs.
Others may be people probing your site for vulnerabilities.
I certainly don't want to allow those hosts without understanding their purpose.
Well, you can never entirely know their purpose. And in some cases, you may never be able to find out.
Bonus question: What's the best way to silence unwanted hosts within my techstack?
One way is to simply block access using a manually managed blacklist or whitelist.
A second way is to have your back-end publish a "/robots.txt" document; see About /robots.txt. Note that not all crawlers will respect a "robots.txt" page, but the reputable ones will; see How Google interprets the robots.txt specification.
Note that it is easy to craft a "/robots.txt" that says "nobody crawl this site".
Other ways would include putting your backend server behind a firewall or giving it a private IP address. (It seems a bit of an odd decision to expose your back-end services to the internet.)
Finally, the sites you are seeing are already being blocked, and Django is telling you that. Perhaps what you should be asking is how to mute the log messages for these events.