I have a problem with csv.reader: all values are empty after the first call.
My code is listed below.
After I use the values open_time_min and open_time_max, I can't use them more than once.
On the second access the value is empty.
Can anybody tell me what I'm doing wrong?
import csv

csv_data = csv.reader(open(FILE_PATH + ".csv"))
open_time_min = min(csv_data)[0]
csv_data = csv.reader(open(FILE_PATH + ".csv"))
open_time_max = max(csv_data)[0]
Answer:
csv.reader (see the csv module documentation) creates an iterator over the file handle that yields one row per iteration. Calls such as min() and max() read through every row, and reading through the file once exhausts the iterator, so you can't read through it again unless you seek() the file handle back to the start.
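To see the exhaustion behaviour concretely, here is a small sketch. It assumes an io.StringIO object holding the sample rows shown further down in this answer as a stand-in for the real file, so it runs without FILE_PATH; the variable names are illustrative only.

```python
import csv
import io

# Stand-in for open(FILE_PATH + ".csv"), using the sample CSV from this answer
sample = "open_time,data0,data1\n0000,10,3\n0010,30,2\n1500,32,12\n2000,12,1\n"
file_handle = io.StringIO(sample)

reader = csv.reader(file_handle)
first_pass = list(reader)   # consumes every row, header included
second_pass = list(reader)  # the iterator is exhausted, so this is []

# Rewinding the underlying file lets a fresh reader start over
file_handle.seek(0)
third_pass = list(csv.reader(file_handle))
```

The same thing happens with min() followed by max() on one reader: the second call sees no rows.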
You shouldn't rely on rewinding or reopening the file, though. Instead, if the data fits in memory, read it into a variable once and then perform your operations on it.
import csv

with open(FILE_PATH + ".csv", newline="") as file_handle:
    csv_reader = csv.reader(file_handle)
    csv_data = [row for row in csv_reader]
    # or csv_data = list(csv_reader)
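Once the rows are in a list, the min and max from the question can both be taken from the same data without rereading anything. This is a minimal sketch, again assuming io.StringIO with the sample rows in place of the real file. Two caveats: the header row has to be dropped, because the string "open_time" compares greater than any digit string and would win max(); and comparing rows of strings is lexicographic, which only matches numeric order here because the sample's open_time values are zero-padded to the same width.

```python
import csv
import io

# Stand-in for the real file, using the sample CSV from this answer
sample = "open_time,data0,data1\n0000,10,3\n0010,30,2\n1500,32,12\n2000,12,1\n"

csv_data = list(csv.reader(io.StringIO(sample)))
rows = csv_data[1:]  # drop the header row so it is not compared with the data

# A list can be traversed any number of times, unlike the reader itself
open_time_min = min(rows)[0]  # "0000" (still a string)
open_time_max = max(rows)[0]  # "2000" (still a string)
```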
Note that each item in csv_data is a list of strings that represents a single row of your CSV file. If you want other types, you have to convert the values as you read them. For example, if the file is all numeric apart from the header row, you can do:
with open(FILE_PATH + ".csv", newline="") as file_handle:
    csv_reader = csv.reader(file_handle)
    header = next(csv_reader)  # consume the header row
    csv_data = [[float(elem) for elem in row] for row in csv_reader]
Now csv_data is a list of lists of floats, and you can operate on it any number of times.
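Applied to the question, the converted rows give numeric min and max values directly. The sketch below assumes io.StringIO with the sample rows as a stand-in for the file, and it looks up the column by its header name instead of hard-coding index 0; that lookup is an illustrative choice, not something the question requires.

```python
import csv
import io

# Stand-in for the real file, using the sample CSV from this answer
sample = "open_time,data0,data1\n0000,10,3\n0010,30,2\n1500,32,12\n2000,12,1\n"

with io.StringIO(sample) as file_handle:
    csv_reader = csv.reader(file_handle)
    header = next(csv_reader)  # ['open_time', 'data0', 'data1']
    csv_data = [[float(elem) for elem in row] for row in csv_reader]

# Find the column position from the header rather than assuming index 0
col = header.index("open_time")
open_times = [row[col] for row in csv_data]

open_time_min = min(open_times)  # 0.0
open_time_max = max(open_times)  # 2000.0
```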
Depending on your application, it might make more sense to convert this list of lists to a numpy array (if it is completely numeric) or to read it into a DataFrame using pandas.
For example, consider a CSV like so:
open_time,data0,data1
0000,10,3
0010,30,2
1500,32,12
2000,12,1
Using numpy:
import numpy as np
# Convert the list of lists of strings from the first snippet to a matrix
csv_matrix = np.array(csv_data[1:], dtype=float)  # slice off the header row
# Or, read it directly from the file (loadtxt splits on whitespace unless told otherwise)
csv_matrix = np.loadtxt(FILE_PATH + ".csv", delimiter=",", skiprows=1)  # skip the header
This creates a matrix that looks like:
array([[0.0e+00, 1.0e+01, 3.0e+00],
       [1.0e+01, 3.0e+01, 2.0e+00],
       [1.5e+03, 3.2e+01, 1.2e+01],
       [2.0e+03, 1.2e+01, 1.0e+00]])
And you can get the max and min of the first column like so:
open_time_min = csv_matrix[:, 0].min()  # Result: 0.0
open_time_max = csv_matrix[:, 0].max()  # Result: 2000.0
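If you need the minimum and maximum of every column rather than just open_time, numpy can reduce along axis 0 in one call. This sketch assumes numpy is installed and uses io.StringIO with the sample rows in place of FILE_PATH + ".csv".

```python
import io

import numpy as np

# Stand-in for FILE_PATH + ".csv", using the sample CSV from this answer
sample = "open_time,data0,data1\n0000,10,3\n0010,30,2\n1500,32,12\n2000,12,1\n"

# delimiter="," is required because loadtxt splits on whitespace by default
csv_matrix = np.loadtxt(io.StringIO(sample), delimiter=",", skiprows=1)

# axis=0 reduces down each column, giving one value per column
column_mins = csv_matrix.min(axis=0)
column_maxs = csv_matrix.max(axis=0)

open_time_min = column_mins[0]
open_time_max = column_maxs[0]
```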
Using Pandas:
import pandas as pd
df = pd.read_csv(FILE_PATH + ".csv")
This creates a DataFrame like this:
   open_time  data0  data1
0          0     10      3
1         10     30      2
2       1500     32     12
3       2000     12      1
You can then use the open_time column from it:
open_time_min = df["open_time"].min() # Result: 0
open_time_max = df["open_time"].max() # Result: 2000