Is there a way to tell which set a list item came from?

Time:07-12

I currently have this working to the point where it prints any changes made to the sitemap.

What I want is for it to show whether each change was an addition or a removal. I figured I could do this by finding out which list each output item came from.

I.e., if the old list is missing something the new list has, I would know it was ADDED. If the new list is missing something the old list has, I would know it was REMOVED.

I haven't been able to find much about this specifically, but I'm still looking into it.

Note: I have already tried difflib, and I do NOT like the output. I currently have a working program using difflib, but the output is a mess. I figured it would be easier (output-wise) to just write my own.

My ultimate goal is to monitor a sitemap.xml, print any changes, and also print whether each change was an addition, a removal, or an edit.

import requests
from bs4 import BeautifulSoup
import time
from datetime import datetime
import pandas as pd
import csv
  
# field names 
fields = ['Test', 'Test2', 'Test3'] 
    

# target URL
url = "https://www.huntermichaelseo.com/testing.xml"

# act like a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

PrevVersion = ""
FirstRun = True


while True:
  # download the page
  response = requests.get(url, headers=headers)
  # parse the downloaded homepage
  soup = BeautifulSoup(response.text, "xml")
  this = soup.find_all('loc')
  if PrevVersion != this:
    if FirstRun == True:
      PrevVersion = this
      FirstRun = False
      print("Start Monitoring " + url + " " + str(datetime.now()))


      # remove all scripts and styles
      
    else:
      print("Changes detected on MDR at: " + str(datetime.now()))

      OldPage = set(PrevVersion)
      NewPage = set(soup.find_all('loc'))
      another = str(OldPage ^ NewPage).split(", ")
      s = '\t'.join(str(x2) for x2 in another)   
      print(s)
      with open('GFG', 'w') as f:
      
        # using csv.writer method from CSV package
        write = csv.writer(f)
          
        write.writerow(fields)
        write.writerows(another)
        OldPage = NewPage
      #print ('\n'.join(diff))
      PrevVersion = this
      
  else:
    print("\nNo Changes to MDR " + str(datetime.now()))
  time.sleep(5)
  continue  

CodePudding user response:

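You can't tell which set a value came from when you use OldPage ^ NewPage, because the symmetric difference merges the items unique to either set into a single set. Use subtraction to get each difference separately.

added = NewPage - OldPage
deleted = OldPage - NewPage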

Then when you're writing these to the CSV file, you can add a label to each row indicating which set it came from.
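
As a rough sketch of the labelling step, the two set differences could be turned into labelled CSV rows like this. The function names label_changes and write_changes, the "Status"/"URL" header, and the example.com URLs are all made up for illustration, and the sketch assumes the sitemap entries have already been reduced to plain URL strings. Comparing only the loc values also means an edited URL shows up as one REMOVED row plus one ADDED row rather than as an edit.

```python
import csv
import io


def label_changes(old_urls, new_urls):
    """Return (status, url) rows describing how new_urls differs from old_urls."""
    old_set, new_set = set(old_urls), set(new_urls)
    added = sorted(new_set - old_set)    # present only in the new sitemap
    removed = sorted(old_set - new_set)  # present only in the old sitemap
    return [("ADDED", u) for u in added] + [("REMOVED", u) for u in removed]


def write_changes(rows, handle):
    """Write the labelled rows to an open text file handle as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["Status", "URL"])  # hypothetical header names
    writer.writerows(rows)


# Hypothetical sample data standing in for two sitemap snapshots.
old = ["https://example.com/a", "https://example.com/b"]
new = ["https://example.com/b", "https://example.com/c"]

rows = label_changes(old, new)

# An in-memory buffer is used here; a real file opened with newline="" works the same way.
buffer = io.StringIO()
write_changes(rows, buffer)
print(buffer.getvalue())
```

In the monitoring loop from the question, the inputs could be built with something like [tag.text for tag in soup.find_all('loc')], so that the sets hold plain strings instead of BeautifulSoup tag objects.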
