I'm reading a gzipped CSV file and would like to extract only specific columns without using pandas. My current code returns a populated list for the first list comprehension, but an empty list for every one after it. How can I extract multiple columns while using a context manager?
Input file:
col1,col2,col3
1,2,3
a,b,c
My code
import gzip
import csv
import codecs
with gzip.open(r"myfile.csv.gz", "r") as f:
    content = csv.reader(codecs.iterdecode(f, "utf-8"))
    col_2 = [row[1] for row in content] # Returns [2, "b"]
    col_3 = [row[2] for row in content] # Returns []
Expected output:
col_2: [2, "b"]
col_3: [3, "c"]
CodePudding user response:
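The issue is not the context manager. csv.reader returns an iterator, and an iterator can only be consumed once: the first list comprehension reads every row, so the second one has nothing left to iterate over.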
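To see the one-shot behaviour in isolation, here is a minimal sketch that replaces the gzipped file with an in-memory string. The io.StringIO object and the names first_pass and second_pass are illustrative stand-ins, not part of the original code.

```python
import csv
import io

# io.StringIO stands in for the decoded gzip stream (illustrative only).
data = io.StringIO("col1,col2,col3\n1,2,3\na,b,c\n")
content = csv.reader(data)

first_pass = [row[1] for row in content]   # consumes every row, header included
second_pass = [row[2] for row in content]  # the reader is already exhausted

print(first_pass)   # ['col2', '2', 'b']
print(second_pass)  # []
```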
You can duplicate the iterator using itertools.tee:
import gzip
import csv
import codecs
from itertools import tee

with gzip.open(r"myfile.csv.gz", "r") as f:
    content = csv.reader(codecs.iterdecode(f, "utf-8"))
    c1, c2 = tee(content) # from now on, do not use content anymore
    col_2 = [row[1] for row in c1]
    col_3 = [row[2] for row in c2]
output:
>>> col_2
['col2', '2', 'b']
>>> col_3
['col3', '3', 'c']
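Note that tee has to buffer every row one copy has consumed until the other copy catches up. Because the first comprehension runs to completion before the second starts, the whole file ends up in memory anyway. If that is acceptable, reading the rows into a list once is a simpler alternative. The sketch below assumes the same sample data, with io.StringIO standing in for the gzipped file and rows as an illustrative name.

```python
import csv
import io

# io.StringIO stands in for the decoded gzip stream (illustrative only).
data = io.StringIO("col1,col2,col3\n1,2,3\na,b,c\n")

# Materialise the rows once; a list can be iterated as often as needed.
rows = list(csv.reader(data))

col_2 = [row[1] for row in rows]
col_3 = [row[2] for row in rows]

print(col_2)  # ['col2', '2', 'b']
print(col_3)  # ['col3', '3', 'c']
```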
Using a classical loop
A better method, however, would be to use a classical loop. It fills both lists in a single pass over the rows, so nothing has to be duplicated or read twice:
with gzip.open(r"myfile.csv.gz", "r") as f:
    content = csv.reader(codecs.iterdecode(f, "utf-8"))
    col_2 = []
    col_3 = []
    for row in content:
        col_2.append(row[1])
        col_3.append(row[2])
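The outputs above still include the header row, while the expected output in the question does not. A possible variant, sketched here under a few assumptions, skips the header with next() and opens the file in text mode ("rt") so that codecs is not needed. The temporary file path is hypothetical and only exists to make the example self-contained. The values come back as strings ('2', not 2), since csv.reader does not convert types.

```python
import csv
import gzip
import os
import tempfile

# Create a small gzipped CSV matching the sample input (hypothetical temp file).
path = os.path.join(tempfile.mkdtemp(), "myfile.csv.gz")
with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
    f.write("col1,col2,col3\n1,2,3\na,b,c\n")

# "rt" makes gzip decode the bytes to text, so csv.reader can read it directly.
with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
    content = csv.reader(f)
    next(content, None)  # discard the header row; None avoids an error on an empty file
    col_2 = []
    col_3 = []
    for row in content:
        col_2.append(row[1])
        col_3.append(row[2])

print(col_2)  # ['2', 'b']
print(col_3)  # ['3', 'c']
```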