Looping through two sets of data to conditionally append to CSV file


I'm writing a script that should: 1) open a CSV file, 2) loop through some eBay results data, and 3) append details from that data to the same file if they match a search term and aren't already present. It has a few issues:

  1. The headers are not written if they're missing, so I'm adding them manually. I previously used DictWriter() with writeheader(), but since fieldnames is required, the headers were written on every run
  2. The if result_id != existing_result_id: condition doesn't work in a nested loop, so the rows aren't written. if any(x in result_title.upper() for x in search_terms): does currently work, but I'm not sure how to combine the two checks before writing the rows

import csv
import smtplib
from requests_html import HTMLSession

SEARCH_TERMS = ['one', 'two']

session = HTMLSession()
response = session.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=blink 182&_sacat=0&_sop=10&rt=nc&LH_PrefLoc=2')
results = response.html.find('ul.srp-results li.s-item')
    
with open('ebay-results.csv', 'r') as csv_file:
  csv_reader = csv.reader(csv_file)
  next(csv_reader) # Skip headers

  with open('ebay-results.csv', 'a') as csv_file_updated:
    # field_names = ['Title', 'Link', 'ID']
    csv_writer = csv.writer(csv_file_updated)
    search_terms = [x.upper() for x in SEARCH_TERMS]

    for result in results:
      result_title = result.find('h3.s-item__title', first=True).text.replace('New Listing', '')
      result_link = result.find('a.s-item__link', first=True).attrs['href'].split('?')[0]
      result_id = result_link.split('/')[4].split('?')[0]
      result_info = [result_title, result_link, result_id]

      if any(x in result_title.upper() for x in search_terms):
        for line in csv_reader:
          existing_result_id = line[2]

          if result_id != existing_result_id:
            csv_writer.writerow(result_info)

            send_email(search_terms, result_link)

CodePudding user response:

There are a few issues with your code:

  • you create one file handle to read the file and then another to append to the same file; that's dodgy at best. Do you expect the reader to see the lines you've appended? What's the purpose?
  • you exhaust the reader with for line in csv_reader: on the first result, and since you never seek back to the start of the file, every later pass reads nothing; but why reread the file over and over anyway?
  • standard Python indentation is 4 spaces deep; following that convention makes your code easier to read, maintain and reuse.
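The second point is easy to demonstrate: a csv.reader is a one-shot iterator over the underlying file handle, so a second pass yields nothing unless you seek back to the start. A quick sketch, with an in-memory io.StringIO standing in for the file:

```python
import csv
import io

csv_file = io.StringIO("Title,Link,ID\na,b,1\nc,d,2\n")
csv_reader = csv.reader(csv_file)
next(csv_reader)                 # skip headers
first_pass = list(csv_reader)    # consumes the reader: two data rows
second_pass = list(csv_reader)   # reader is exhausted, so this is empty

csv_file.seek(0)                 # rewind the underlying file
csv_reader = csv.reader(csv_file)
next(csv_reader)                 # skip headers again
third_pass = list(csv_reader)    # same two rows as the first pass
```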

It appears you simply want to append results whose identifiers aren't already in the file. Why not do this:

  • read the results once, keeping the identifiers for a quick lookup
  • then loop over the results, writing results with new identifiers

This is likely what you were after:

import csv
from requests_html import HTMLSession

SEARCH_TERMS = ['one', 'two']

session = HTMLSession()
response = session.get('https://www.ebay.com/sch/i.html?_from=R40&_nkw=blink 182&_sacat=0&_sop=10&rt=nc&LH_PrefLoc=2')
results = response.html.find('ul.srp-results li.s-item')

with open('ebay-results.csv', 'r+', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # Skip headers
    existing_ids = [rec[2] for rec in csv_reader]

    # field_names = ['Title', 'Link', 'ID']
    csv_writer = csv.writer(csv_file)
    search_terms = [x.upper() for x in SEARCH_TERMS]

    for result in results:
        result_title = result.find('h3.s-item__title', first=True).text.replace('New Listing', '')
        result_link = result.find('a.s-item__link', first=True).attrs['href'].split('?')[0]
        result_id = result_link.split('/')[4].split('?')[0]
        result_info = [result_title, result_link, result_id]

        if any(x in result_title.upper() for x in search_terms):
            if result_id not in existing_ids:
                csv_writer.writerow(result_info)

                # not implemented, but not relevant to the question
                # send_email(search_terms, result_link) 
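One small refinement worth considering: if the CSV grows large, building existing_ids as a set instead of a list makes each not in check constant-time rather than a scan of the whole list. A minimal sketch, again with io.StringIO in place of the real file:

```python
import csv
import io

csv_file = io.StringIO("Title,Link,ID\na,b,111\nc,d,222\n")
csv_reader = csv.reader(csv_file)
next(csv_reader)  # skip headers
existing_ids = {rec[2] for rec in csv_reader}  # set comprehension: O(1) membership tests

print('111' in existing_ids)  # already in the file
print('333' in existing_ids)  # new identifier, safe to append
```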

You asked how to deal with the headers; a related issue is that the .csv may not exist at all on the first run. Something like this solves both:

from pathlib import Path

# create file if it doesn't exist, write header to it
if not Path('ebay-results.csv').is_file():
    with open('ebay-results.csv', 'w', newline='') as csv_file:
        csv_file.write('Title,Link,ID\n')

# reopen the file for reading, which now must exist, and has a header
with open('ebay-results.csv', 'r+', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    field_names = next(csv_reader)
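Alternatively, since you originally reached for DictWriter() with writeheader(), you can keep that approach and simply guard the header write so it only happens when the file is new or empty. A sketch (the example row is a placeholder; in the real script it would come from the eBay results):

```python
import csv
from pathlib import Path

FIELD_NAMES = ['Title', 'Link', 'ID']
path = Path('ebay-results.csv')

# Only write the header when the file doesn't exist yet or is empty.
needs_header = not path.is_file() or path.stat().st_size == 0

with open(path, 'a', newline='') as csv_file:
    csv_writer = csv.DictWriter(csv_file, fieldnames=FIELD_NAMES)
    if needs_header:
        csv_writer.writeheader()
    # placeholder row for illustration only
    csv_writer.writerow({'Title': 'example', 'Link': 'https://example.com', 'ID': '123'})
```

On later runs the file already has content, so writeheader() is skipped and rows are simply appended.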