Drop requests that include query string in Scrapy-CodePudding

I am new to scrapy and I've come across a complicated case.

My problem is that sometimes I have links like https://sitename.com/path2/?param1=value1&param2=value2 and for me, query string is not important and I want to Drop it from requests.
I mean this part of the url: ?param1=value1&param2=value2

After a day of research, I realized that this should be done in the middlewares.py file (Downloader Middleware) (Source). Because requests and receipts in Scrapy go through this path.
I tried to write a code so that the requests and answers are without query string, but I did not succeed.
My code does not drop requests that include query string.
middlewares.py:

from w3lib.url import url_query_cleaner

class CleanUrlAgentDownloaderMiddleware:

    def process_response(self, request, response, spider):
        url_query_cleaner(response.url)
        return response

    def process_request(self, request, spider):
        url_query_cleaner(request.url)

How can I release these requests using the w3lib.url library or using Python codes? And don't enter Scrapy?
Just to let you know that I set my class in the settings.py

CodePudding user response：

Since strings are immutable, your code will not change the anything in the requests. for your code to work you have to do

from w3lib.url import url_query_cleaner

class CleanUrlAgentDownloaderMiddleware:
    # No need for process response since it will have the same 
    # url as the request

    def process_request(self, request, spider):
        if "?" in request.url:
            return request.replace(url=url_query_cleaner(request.url))

alternately, if you want to ignore requests that have queries in their url you can do

from scrapy.exceptions import IgnoreRequest
from urllib.parse import urlparse

class IgnoreQueryRequestMiddleware:
    def process_request(self, request, spider):
        if urlparse(request.url).query:
            raise IgnoreRequest