I have a large CSV file (thousands of data points).
A sample portion of my csv file
The goal is to create a separate file containing only the rows from the original CSV where type=1 (this is column 2). I have attached the code that I have, It's creating a completly blankfile which is not what I need need. I feel like I am close but I am missing something.
import csv
typedata = '2'
with open('d:/Python/testfile.csv') as in_file, open('d:/Python/Data.csv', 'w') as out_file:
reader = csv.reader(in_file)
writer = csv.writer(out_file)
for row in reader:
if row in reader:
if Typedata not in row: #hopefully excludes this
writer.writerow(row)
The answers I found all uses Pandas which is something I can't use for the assignment. Anything helps
CodePudding user response:
Well, I'd like to use just str.split() as I'm not familiar with csv module. Please check this code.
f = open('test.csv')
d = f.read()
f.close()
n = '\n'.join([x for x in d.splitlines() if x.split(',')[1] == '2'])
You can save n with a new file opened with 'w' option.
CodePudding user response:
Problem is if row in reader
which has to get all data from file to check if row
is in reader
. And later it has nothing to read, and nothing to write
I use in_file = text.splitlines()
only to test it. You should use open()
.
import csv
text = '''FKEY,TYPE,EFF,DATA
34787,2,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825
34787,2,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825
34787,2,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825
14787,1,Y,2022.03.20,1088825'''
typedata = '2'
#with open('d:/Python/testfile.csv') as in_file, open('d:/Python/Data.csv', 'w') as out_file:
in_file = text.splitlines()
reader = csv.reader(in_file)
#writer = csv.writer(out_file)
for row in reader:
if typedata not in row:
print(row)
#writer.writerow(row)
Result:
['FKEY', 'TYPE', 'EFF', 'DATA']
['14787', '1', 'Y', '2022.03.20', '1088825']
['14787', '1', 'Y', '2022.03.20', '1088825']
['14787', '1', 'Y', '2022.03.20', '1088825']
['14787', '1', 'Y', '2022.03.20', '1088825']
['14787', '1', 'Y', '2022.03.20', '1088825']