I currently have this working to the point where it outputs any changes made to the sitemap.
What I want is for it to show whether each change was an addition or a removal. I figured I could do this by finding out which list the output came from.
I.e., if the old list was missing something the new one had, I would know it was ADDED. If the new list was missing something the old one had, I would know it was REMOVED.
I haven't been able to find much about this specifically, but I'm still looking into it.
Note: I have already tried difflib. I DO NOT like the output. I currently have a working program using difflib, but the output is a mess. I figured it would be easier (output-wise) to just make my own.
My ultimate goal is to monitor a sitemap.xml and print any changes, along with whether each one was an addition, a removal, or an edit.
import requests
from bs4 import BeautifulSoup
import time
from datetime import datetime
import pandas as pd
import csv
# field names
fields = ['Test', 'Test2', 'Test3']
# target URL
url = "https://www.huntermichaelseo.com/testing.xml"
# act like a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
PrevVersion = ""
FirstRun = True
while True:
    # download the page
    response = requests.get(url, headers=headers)
    # parse the downloaded homepage
    soup = BeautifulSoup(response.text, "xml")
    this = soup.find_all('loc')
    if PrevVersion != this:
        if FirstRun == True:
            PrevVersion = this
            FirstRun = False
            print("Start Monitoring " + url + " " + str(datetime.now()))
            # remove all scripts and styles
        else:
            print("Changes detected on MDR at: " + str(datetime.now()))
            OldPage = set(PrevVersion)
            NewPage = set(soup.find_all('loc'))
            another = str(OldPage ^ NewPage).split(", ")
            s = '\t'.join(str(x2) for x2 in another)
            print(s)
            with open('GFG', 'w') as f:
                # using csv.writer method from CSV package
                write = csv.writer(f)
                write.writerow(fields)
                write.writerows(another)
            OldPage = NewPage
            #print ('\n'.join(diff))
            PrevVersion = this
    else:
        print("\nNo Changes to MDR " + str(datetime.now()))
    time.sleep(5)
    continue
CodePudding user response:
You can't tell which set a value came from when you use oldPage ^ newPage, because the symmetric difference merges both one-sided differences into a single set. Use subtraction to get each difference.
added = newPage - oldPage
deleted = oldPage - newPage
Then when you're writing these to the CSV file, you can add a label to each row indicating which set it came from.
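As a rough illustration of the labelling step, here is a minimal sketch. It assumes the `loc` values have already been pulled out as plain strings; the function names `label_changes` and `write_changes`, the `change`/`url` header, and the example.com URLs are made up for this example and are not part of the original program.

```python
import csv
import io


def label_changes(old_urls, new_urls):
    """Return (label, url) rows describing how new_urls differs from old_urls."""
    old_set, new_set = set(old_urls), set(new_urls)
    added = new_set - old_set      # in the new sitemap only
    removed = old_set - new_set    # in the old sitemap only
    rows = [("ADDED", u) for u in sorted(added)]
    rows += [("REMOVED", u) for u in sorted(removed)]
    return rows


def write_changes(rows, f):
    """Write the labelled rows to an open text file object as CSV."""
    writer = csv.writer(f)
    writer.writerow(["change", "url"])  # hypothetical header names
    writer.writerows(rows)              # each row is a (label, url) tuple


# Made-up data standing in for two successive downloads of the sitemap.
old = ["https://example.com/a", "https://example.com/b"]
new = ["https://example.com/b", "https://example.com/c"]

rows = label_changes(old, new)
for label, url in rows:
    print(label, url)

# An in-memory buffer is used here; a real file opened with newline="" works the same way.
buf = io.StringIO()
write_changes(rows, buf)
```

In the loop from the question, the inputs could be built with something like `[tag.text for tag in PrevVersion]` and `[tag.text for tag in this]`. Note that `csv.writer.writerows` expects each row to be a sequence of fields, so passing a list of bare strings would split every string into single characters; pairing each URL with its label in a tuple avoids that.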