Python If/else behaving incorrectly (or I'm just dumb)

Time:08-01

I'm still a beginner, so I'm sure the issue is in some silly thing I did.

Basically, I'm trying to find out whether a website uses only one or both versions of Google Analytics (UA --> Universal Analytics, and GA4 --> Google Analytics 4).

The best way to do it in my opinion is to scrape the network requests and differentiate them using the URLs (see the difference in the variables "ga4check" and "uacheck").

Scraping and parsing the network requests works fine, but the if/else check for their presence doesn't. The first if evaluates to False, since the output is "Something isn't right..."

Here's my code :

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import time
import json

ga4check = 'google-analytics.com/g/collect?v=2&tid=G-'
uacheck = 'google-analytics.com/collect?v=1&_v='
collectlist = []

if __name__ == "__main__":

    desired_capabilities = DesiredCapabilities.CHROME
    desired_capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
    options = webdriver.ChromeOptions()
    options.add_argument('headless')
    options.add_argument("--ignore-certificate-errors")

    driver = webdriver.Chrome(executable_path=r'C:\Users\dgayg\Desktop\Scripts\GA4 finder\chromedriver.exe',
                            chrome_options=options,
                            desired_capabilities=desired_capabilities)

    driver.get("https://www.measureschool.com/")
    time.sleep(10)
    logs = driver.get_log("performance")

    with open("network_log.json", "w", encoding="utf-8") as f:
        f.write("[")

        for log in logs:
            network_log = json.loads(log["message"])["message"]

            if("Network.response" in network_log["method"]
                    or "Network.request" in network_log["method"]
                    or "Network.webSocket" in network_log["method"]):

                f.write(json.dumps(network_log) + ",")
        f.write("{}]")

    print("Quitting Selenium WebDriver")
    driver.quit()

    json_file_path = "network_log.json"
    with open(json_file_path, "r", encoding="utf-8") as f:
        logs = json.loads(f.read())

    for log in logs:
        try:
            url = log["params"]["request"]["url"]

            if "collect?v=" in url:
                collectlist.append(url)
        except Exception as e:
            pass

if any(uacheck in i for i in collectlist):
    if any(ga4check in i for i in collectlist):
        print('There\'s UA and GA4 on this website')
    elif any(ga4check not in i for i in collectlist):
        print('Only UA is present on this website')
else:
    print('Something isn\'t right...') 

Output :

C:\Users\dgayg\Desktop\Scripts\GA4 finder> & C:/Users/dgayg/AppData/Local/Programs/Python/Python39/python.exe "c:/Users/dgayg/Desktop/Scripts/GA4 finder/main.py"
c:\Users\dgayg\Desktop\Scripts\GA4 finder\main.py:18: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome(executable_path=r'C:\Users\dgayg\Desktop\Scripts\GA4 finder\chromedriver.exe',
c:\Users\dgayg\Desktop\Scripts\GA4 finder\main.py:18: DeprecationWarning: use options instead of chrome_options
  driver = webdriver.Chrome(executable_path=r'C:\Users\dgayg\Desktop\Scripts\GA4 finder\chromedriver.exe',

DevTools listening on ws://127.0.0.1:14224/devtools/browser/ea27a598-5b1d-48e2-bffa-1bf849b826b8
[0731/193526.701:INFO:CONSOLE(0)] "Failed to set referrer policy: The value '' is not one of 'no-referrer', 'no-referrer-when-downgrade', 'origin', 'origin-when-cross-origin', 'same-origin', 'strict-origin', 'strict-origin-when-cross-origin', or 'unsafe-url'. The referrer policy has been left unchanged.", source:  (0)
[0731/193526.880:INFO:CONSOLE(2)] "JQMIGRATE: Migrate is installed, version 3.3.2", source: https://measureschool.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=3.3.2 (2)
Quitting Selenium WebDriver
Something isn't right...

Here's the content of collectlist:

['https://region1.google-analytics.com/g/collect?v=2&tid=G-QG5JR71SF7&gtm=2oe7r0&_p=877231823&_z=ccd.v9B&cid=879701179.1659290205&ul=en-us&sr=800x600&_s=1&sid=1659290205&sct=1&seg=0&dl=https://measureschool.com/&dt=MeasureSchool - The Data-Driven Way of Digital Marketing&en=page_view&_fv=1&_nsi=1&_ss=1', 'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https://measureschool.com/', 'https://www.google-analytics.com/j/collect?v=1&_v=j96&a=877231823&t=pageview&_s=1&dl=https://measureschool.com/&dp=/&ul=en-us&de=UTF-8&dt=MeasureSchool - The Data-Driven Way of Digital Marketing&sd=24-bit&sr=800x600&vp=774x600&je=0&_u=4CDACEABBAAAAC~&jid=253819846&gjid=797933957&cid=879701179.1659290205&tid=UA-58541733-2&_gid=1754033578.1659290206&_r=1&gtm=2wg7r0593KN2&z=952191590', 'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https://measureschool.com/&liSync=true', 'https://px4.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https://measureschool.com/&liSync=true&e_ipv6=AQJQvwc9MAD7QAAAAYJVZz8nCTxegeWWl3Feqs04Ry8lLAYe4tRStgs5YUf0ek2yseMWT3wlT2oSrFcxugGX91BzO2PCy9w']

I hope that what I'm trying to achieve is clear enough.

Thanks a lot in advance !!

CodePudding user response:

Look at your collectlist:

collectlist = [
    'https://region1.google-analytics.com/g/collect?v=2&tid=G-QG5JR71SF7&gtm=2oe7r0&_p=877231823&_z=ccd.v9B&cid=879701179.1659290205&ul=en-us&sr=800x600&_s=1&sid=1659290205&sct=1&seg=0&dl=https://measureschool.com/&dt=MeasureSchool - The Data-Driven Way of Digital Marketing&en=page_view&_fv=1&_nsi=1&_ss=1',
    'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https://measureschool.com/',
    'https://www.google-analytics.com/j/collect?v=1&_v=j96&a=877231823&t=pageview&_s=1&dl=https://measureschool.com/&dp=/&ul=en-us&de=UTF-8&dt=MeasureSchool - The Data-Driven Way of Digital Marketing&sd=24-bit&sr=800x600&vp=774x600&je=0&_u=4CDACEABBAAAAC~&jid=253819846&gjid=797933957&cid=879701179.1659290205&tid=UA-58541733-2&_gid=1754033578.1659290206&_r=1&gtm=2wg7r0593KN2&z=952191590',
    'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https://measureschool.com/&liSync=true',
    'https://px4.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https://measureschool.com/&liSync=true&e_ipv6=AQJQvwc9MAD7QAAAAYJVZz8nCTxegeWWl3Feqs04Ry8lLAYe4tRStgs5YUf0ek2yseMWT3wlT2oSrFcxugGX91BzO2PCy9w'
]

and look at your value for uacheck:

uacheck = 'google-analytics.com/collect?v=1&_v='

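No URL in collectlist contains uacheck. The Universal Analytics request goes to google-analytics.com/j/collect?v=1&_v=..., and the extra /j/ in the path means the substring 'google-analytics.com/collect?v=1&_v=' never matches. You do have a URL that matches ga4check, but your code only looks for ga4check after it has found at least one uacheck match, so it falls straight through to the else branch.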
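
A quick way to confirm the mismatch is to test the substring directly. The sketch below uses a shortened copy of the UA entry from collectlist; ua_url and uacheck_j are names made up for this illustration, not part of the original script.

```python
uacheck = 'google-analytics.com/collect?v=1&_v='

# Shortened copy of the UA request from collectlist (query string truncated for readability)
ua_url = 'https://www.google-analytics.com/j/collect?v=1&_v=j96&a=877231823&t=pageview'

# Hypothetical adjusted pattern that includes the /j/ path segment seen in the log
uacheck_j = 'google-analytics.com/j/collect?v=1&_v='

print(uacheck in ua_url)    # False: the path is /j/collect, not /collect
print(uacheck_j in ua_url)  # True
```

This single log cannot show whether every UA hit uses /j/collect, so a looser pattern may be the safer choice.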

I believe you may want to structure your checks more like:

any_ua = any(uacheck in i for i in collectlist)
any_ga4 = any(ga4check in i for i in collectlist)

if any_ua and any_ga4:
    print('There\'s UA and GA4 on this website')
elif any_ua:
    print('Only UA is present on this website')
elif any_ga4:
    print('Only GA4 is present on this website')
else:
    print('Neither is present on this website.')
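
To try this four-way check against sample data, it can be wrapped in a small function. This is only a sketch: describe is a hypothetical name, the two sample URLs are shortened copies of entries from collectlist, and the alternative UA pattern with /j/ is an assumption based on that one log.

```python
ga4check = 'google-analytics.com/g/collect?v=2&tid=G-'
uacheck = 'google-analytics.com/collect?v=1&_v='

def describe(urls, ua_pattern=uacheck, ga4_pattern=ga4check):
    """Return a message saying which Google Analytics versions appear in urls."""
    any_ua = any(ua_pattern in u for u in urls)
    any_ga4 = any(ga4_pattern in u for u in urls)
    if any_ua and any_ga4:
        return "There's UA and GA4 on this website"
    elif any_ua:
        return 'Only UA is present on this website'
    elif any_ga4:
        return 'Only GA4 is present on this website'
    return 'Neither is present on this website.'

# Shortened copies of the GA4 and UA requests from collectlist
sample = [
    'https://region1.google-analytics.com/g/collect?v=2&tid=G-QG5JR71SF7&gtm=2oe7r0',
    'https://www.google-analytics.com/j/collect?v=1&_v=j96&a=877231823&t=pageview',
]

# With the original uacheck, only the GA4 request is recognised
print(describe(sample))
# With a pattern that includes /j/ (an assumption), both are recognised
print(describe(sample, ua_pattern='google-analytics.com/j/collect?v=1&_v='))
```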

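Since your ifs are effectively checking all the possible combinations of two booleans, you could also represent that as a 2x2 truth table. Indexing a list with a boolean works because False and True behave as 0 and 1, so any_ua picks the row and any_ga4 picks the column, like this: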

any_ua = any(uacheck in i for i in collectlist)
any_ga4 = any(ga4check in i for i in collectlist)
print([
    # no GA4               # some GA4
    ["Neither is present", "Only GA4 is present"],  # no UA
    ["Only UA is present", "There's UA and GA4"],   # some UA
][any_ua][any_ga4], "on this website")
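
Plain substring patterns are sensitive to small path differences such as /j/collect, so one possible alternative is to parse each URL and look at its parts. The sketch below is an assumption-laden illustration, not an official detection method: classify is a hypothetical helper, and the rules (host under google-analytics.com, path ending in /collect, v=2 with a G- tid for GA4, v=1 with a UA- tid for UA) are inferred only from the URLs in collectlist.

```python
from urllib.parse import urlsplit, parse_qs

def classify(url):
    """Return 'ga4', 'ua', or None for a request URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ''
    # Accept google-analytics.com and its subdomains (www, region1, ...)
    if host != 'google-analytics.com' and not host.endswith('.google-analytics.com'):
        return None
    # Matches /collect, /g/collect and /j/collect
    if not parts.path.endswith('/collect'):
        return None
    query = parse_qs(parts.query)
    version = query.get('v', [''])[0]
    tid = query.get('tid', [''])[0]
    if version == '2' and tid.startswith('G-'):
        return 'ga4'
    if version == '1' and tid.startswith('UA-'):
        return 'ua'
    return None

# Shortened copies of three entries from collectlist
sample = [
    'https://region1.google-analytics.com/g/collect?v=2&tid=G-QG5JR71SF7&gtm=2oe7r0',
    'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658',
    'https://www.google-analytics.com/j/collect?v=1&_v=j96&tid=UA-58541733-2',
]

kinds = {classify(u) for u in sample}
any_ua = 'ua' in kinds
any_ga4 = 'ga4' in kinds
print(any_ua, any_ga4)  # True True
```

The resulting any_ua and any_ga4 flags can feed either the if/elif chain or the truth table unchanged.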