Home > Back-end >  How do I delete multiple rows based on the column value without Pandas
How do I delete multiple rows based on the column value without Pandas

Time:03-20

I have a large CSV file (thousands of data points).

A sample portion of my csv file

The goal is to create a separate file containing only the rows from the original CSV where type=1 (this is column 2). I have attached the code that I have, It's creating a completly blankfile which is not what I need need. I feel like I am close but I am missing something.

import csv
typedata = '2'
with open('d:/Python/testfile.csv') as in_file, open('d:/Python/Data.csv', 'w') as out_file:
reader = csv.reader(in_file)
writer = csv.writer(out_file)
for row in reader:
    if row in reader:
        if Typedata not in row: #hopefully excludes this
            writer.writerow(row)

The answers I found all uses Pandas which is something I can't use for the assignment. Anything helps

CodePudding user response:

Well, I'd like to use just str.split() as I'm not familiar with csv module. Please check this code.

f = open('test.csv')
d = f.read()
f.close()
n = '\n'.join([x for x in d.splitlines() if x.split(',')[1] == '2'])

You can save n with a new file opened with 'w' option.

CodePudding user response:

Problem is if row in reader which has to get all data from file to check if row is in reader. And later it has nothing to read, and nothing to write

I use in_file = text.splitlines() only to test it. You should use open().

import csv

text = '''FKEY,TYPE,EFF,DATA
34787,2,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825
34787,2,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825
34787,2,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825'''

typedata = '2'

#with open('d:/Python/testfile.csv') as in_file, open('d:/Python/Data.csv', 'w') as out_file:
in_file = text.splitlines()
reader = csv.reader(in_file)
#writer = csv.writer(out_file)
for row in reader:
    if typedata not in row:
        print(row)
        #writer.writerow(row)

Result:

['FKEY', 'TYPE', 'EFF', 'DATA']
['14787', '1', 'Y', '2022.03.20', '1088825']
['14787', '1', 'Y', '2022.03.20', '1088825']
['14787', '1', 'Y', '2022.03.20', '1088825']
['14787', '1', 'Y', '2022.03.20', '1088825']
['14787', '1', 'Y', '2022.03.20', '1088825']
  • Related