Removing lines in CSV file is adding extra lines

Time:11-25

I am working on a coding assignment where one requirement of the app is to remove selected lines from a CSV file. When I try to remove the line identified by the key (name), it not only removes that line but also adds multiple copies of my first line to the CSV file. I can't figure out why it is adding these repeated lines. Any help is appreciated.

For reference: attractions is a list of dictionaries that the CSV file was copied into.

The delete function is below:

name = entername()

with open('boston.csv', 'r') as csv_read:
    reader = csv.reader(csv_read)
    for row in reader:
        attractions.append(row)
        for field in row:
            if field == name:
                attractions.remove(row)

with open('boston.csv', 'w') as csv_write:
    writer = csv.writer(csv_write)
    writer.writerows(attractions)

and my CSV file before looks like this:

Short Name,Name,Category,URL,Lat,Lon,Color
harvard,Harvard University,university,https://www.harvard.edu/,42.373032,-71.116661,green
mit,Massachusetts Institute of Technology,University,https://www.mit.edu/,42.360092,-71.094162,green
science,Museum of Science,Tourism,https://www.mos.org/,42.36932,-71.07151,green
children,Boston Children's Museum,Tourism,https://bostonchildrensmuseum.org/,42.3531,-71.04998,green

but results in this:

Short Name,Name,Category,URL,Lat,Lon,Color
Short Name,Name,Category,URL,Lat,Lon,Color
Short Name,Name,Category,URL,Lat,Lon,Color
Short Name,Name,Category,URL,Lat,Lon,Color
Short Name,Name,Category,URL,Lat,Lon,Color
harvard,Harvard University,university,https://www.harvard.edu/,42.373032,-71.116661,green
science,Museum of Science,Tourism,https://www.mos.org/,42.36932,-71.07151,green
children,Boston Children's Museum,Tourism,https://bostonchildrensmuseum.org/,42.3531,-71.04998,green

CodePudding user response:

I've run your code and it appears to work.

I modified it to:

  • hard-code the name (for debugging)
  • print a message when a row is removed
  • not overwrite the input file (very helpful when debugging)

import csv

name = 'Harvard University'

attractions = []
with open('boston.csv', 'r') as csv_read:
    reader = csv.reader(csv_read)
    for row in reader:
        attractions.append(row)
        for field in row:
            if field == name:
                print(f'{field} matches {name}, removing {row}')
                attractions.remove(row)

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(attractions)

When I run that, I see this debug-print message:

Harvard University matches Harvard University, removing ['harvard', 'Harvard University', 'university', 'https://www.harvard.edu/', '42.373032', '-71.116661', 'green']

and this is my output.csv:

Short Name,Name,Category,URL,Lat,Lon,Color
mit,Massachusetts Institute of Technology,University,https://www.mit.edu/,42.360092,-71.094162,green
science,Museum of Science,Tourism,https://www.mos.org/,42.36932,-71.07151,green
children,Boston Children's Museum,Tourism,https://bostonchildrensmuseum.org/,42.3531,-71.04998,green

When I change name to name = 'Tourism', which is valid with your logic (even if it isn't what you want/intend), it still does what you'd expect and removes the two rows where Tourism is in the Category field:

...
name = 'Tourism'

attractions = []
...
Tourism matches Tourism, removing ['science', 'Museum of Science', 'Tourism', 'https://www.mos.org/', '42.36932', '-71.07151', 'green']
Tourism matches Tourism, removing ['children', "Boston Children's Museum", 'Tourism', 'https://bostonchildrensmuseum.org/', '42.3531', '-71.04998', 'green']
Short Name,Name,Category,URL,Lat,Lon,Color
harvard,Harvard University,university,https://www.harvard.edu/,42.373032,-71.116661,green
mit,Massachusetts Institute of Technology,University,https://www.mit.edu/,42.360092,-71.094162,green
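
One explanation that fits the output in the question, assuming attractions already held the four data rows as dictionaries (as the question says) when the delete code ran: csv.writer iterates over each row, and iterating over a dictionary yields its keys, so every leftover dictionary is written out as another copy of the header line. A minimal sketch of that effect follows; the two-column data and the write_rows helper are made up for illustration.

```python
import csv
import io


def write_rows(rows):
    """Hypothetical helper: write rows with csv.writer and return the CSV text."""
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()


# Assumption: the list was filled earlier with dictionaries (e.g. via csv.DictReader).
attractions = [
    {'Short Name': 'harvard', 'Name': 'Harvard University'},
    {'Short Name': 'mit', 'Name': 'Massachusetts Institute of Technology'},
]

# The delete code then appends plain lists from csv.reader to the same list.
attractions.append(['Short Name', 'Name'])             # header row from the file
attractions.append(['harvard', 'Harvard University'])  # a data row that was kept

# Each dict is written as its keys, so this prints the header line
# three times, followed by the harvard row.
print(write_rows(attractions))
```

Four leftover dictionaries plus the header row read by csv.reader would account for the five header lines in the question. Starting from attractions = [] right before the read loop, as in the runs above, avoids this.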

All that said, I recommend not adding-then-removing when a certain condition is met. Instead, I favor adding a row only if the skip condition isn't met:

for row in reader:
    skip_row = False
    for field in row:
        if field == name:
            print(f'{field} matches {name}, skipping {row}')
            skip_row = True
            break  # stop searching fields

    if not skip_row:
        attractions.append(row)

And, if you only care about the Name field, this can be shortened and made even more straightforward:

name_idx = 1  # fields are 0-based, so your 2nd field is index 1
for row in reader:
    if row[name_idx] == name:
        print(f'Found {name}, skipping {row}')
        continue  # skip rest of this loop (the append), start with next row

    attractions.append(row)
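
Putting the pieces together, here is a sketch of the Name-only version as a self-contained function. The function name remove_rows_by_name, its parameters, and the choice to return a count are my own assumptions for illustration, not something from your assignment.

```python
import csv


def remove_rows_by_name(src_path, dst_path, name, name_idx=1):
    """Write src_path's rows to dst_path, skipping rows whose Name field equals name.

    Returns the number of rows skipped.
    """
    kept = []    # a fresh list on every call, so nothing carries over
    skipped = 0

    # newline='' is what the csv module documentation recommends for CSV files
    with open(src_path, newline='') as csv_read:
        for row in csv.reader(csv_read):
            # the length check guards against short or blank rows
            if len(row) > name_idx and row[name_idx] == name:
                skipped += 1
                continue  # never append a row we want gone
            kept.append(row)

    with open(dst_path, 'w', newline='') as csv_write:
        csv.writer(csv_write).writerows(kept)

    return skipped
```

Calling remove_rows_by_name('boston.csv', 'output.csv', 'Harvard University') would leave boston.csv untouched and write the filtered rows to output.csv. Because the list is created inside the function, rows left over from earlier reads cannot end up in the output.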

CodePudding user response:

There's a pure-Python library, convtools, which generates code under the hood and provides many data-processing primitives:

from convtools import conversion as c
from convtools.contrib.tables import Table

name = entername()

table = Table.from_csv("boston.csv")  # pass header=True if the file has a header row
columns = table.columns
table.filter(
    c.not_(
        c.or_(*(c.col(column_name) == name for column_name in columns))
        if len(columns) > 1
        else c.col(columns[0]) == name
    )
).into_csv("boston_output.csv")
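
For comparison, the same "drop a row if any field equals name" filter can be sketched with only the standard library. This is not the convtools API; the function name filter_csv is made up, and the sketch assumes the file has a header row and well-formed data rows.

```python
import csv


def filter_csv(src_path, dst_path, name):
    """Copy a CSV that has a header row, dropping rows where any field equals name.

    Returns the number of data rows kept.
    """
    with open(src_path, newline='') as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames  # column names taken from the header row
        kept = [row for row in reader if name not in row.values()]

    with open(dst_path, 'w', newline='') as dst:
        # DictWriter writes each dictionary's values in fieldnames order
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)

    return len(kept)
```

Since the rows are dictionaries here, csv.DictWriter is the matching writer; it writes the header once and then the values of each row.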
