Problems with missing value in csv reader in python [closed]

I have a problem with the csv reader: all values are empty after the first call. My code is listed below. After I compute open_time_min and open_time_max, I can't use the values more than once. On the second call, the value is empty. Can anybody tell me what I'm doing wrong?

csv_data = csv.reader(open(FILE_PATH + ".csv"))
open_time_min = min(csv_data)[0]

csv_data = csv.reader(open(FILE_PATH + ".csv"))
open_time_max = max(csv_data)[0]

CodePudding user response:

csv.reader (see the documentation) creates an iterator over the file handle that yields one row per iteration. Reading through the file once exhausts the iterator, so you can't read through it again unless you seek() the file handle back to the start.
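
To see the exhaustion concretely, here is a minimal sketch. It uses an in-memory io.StringIO buffer as a stand-in for the real file, and the sample rows are made up for illustration. The second pass yields nothing until the handle is rewound with seek(0).

```python
import csv
import io

# In-memory stand-in for the real file; the rows are illustrative only.
file_handle = io.StringIO("open_time,data0\n0000,10\n1500,32\n")
csv_reader = csv.reader(file_handle)

first_pass = list(csv_reader)    # consumes every row, header included
second_pass = list(csv_reader)   # the iterator is exhausted, so this is empty

file_handle.seek(0)                        # rewind the underlying handle
third_pass = list(csv.reader(file_handle)) # a fresh reader sees the rows again

print(len(first_pass), len(second_pass), len(third_pass))  # 3 0 3
```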

You shouldn't do it this way though. Instead, if you have enough memory, you should read all the data into a variable once, and then perform your operations on it.

import csv

with open(FILE_PATH + ".csv") as file_handle:
    csv_reader = csv.reader(file_handle)
    csv_data = [row for row in csv_reader]
    # or csv_data = list(csv_reader)

Note that each item in csv_data is a list of strings that represents a single row of your csv file. If you don't want it to be a list of strings, you will have to convert it as you read it. For example, if the file is all numeric, you can do:

with open(FILE_PATH + ".csv") as file_handle:
    csv_reader = csv.reader(file_handle)
    header = next(csv_reader)  # Consume the header row so it isn't converted
    csv_data = [[float(elem) for elem in row] for row in csv_reader]

Now, csv_data is a list of lists, and you can operate on it any number of times.
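
Applied to the original question, a minimal sketch along these lines computes both values from the in-memory list. It assumes the first column is numeric and, so that it runs on its own, swaps the real file for an io.StringIO buffer with made-up rows.

```python
import csv
import io

# Illustrative stand-in for open(FILE_PATH + ".csv"); the rows are made up.
file_handle = io.StringIO(
    "open_time,data0,data1\n"
    "0000,10,3\n"
    "0010,30,2\n"
    "1500,32,12\n"
    "2000,12,1\n"
)

csv_reader = csv.reader(file_handle)
header = next(csv_reader)  # skip the header row
csv_data = [[float(elem) for elem in row] for row in csv_reader]

# csv_data is a plain list, so it can be traversed as often as needed.
open_times = [row[0] for row in csv_data]
open_time_min = min(open_times)
open_time_max = max(open_times)
print(open_time_min, open_time_max)  # 0.0 2000.0
```

Unlike min(csv_data)[0] on the raw reader, which compares whole rows of strings lexicographically, this compares only the numeric values in the first column.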

Depending on your application, it might make more sense to convert this list of lists to a numpy array (if it is completely numeric) or read it as a dataframe using pandas.

For example, consider a CSV like so:

open_time,data0,data1
0000,10,3
0010,30,2
1500,32,12
2000,12,1

Using numpy:

import numpy as np

# Convert csv_data list-of-lists to a matrix
# (slice off the first row only if csv_data still contains the header)
csv_matrix = np.array(csv_data[1:], dtype=np.float64)
# Or, read it directly from the file
csv_matrix = np.loadtxt(FILE_PATH + ".csv", delimiter=",", skiprows=1) # Skip header

This creates a matrix that looks like:

array([[0.0e+00, 1.0e+01, 3.0e+00],
       [1.0e+01, 3.0e+01, 2.0e+00],
       [1.5e+03, 3.2e+01, 1.2e+01],
       [2.0e+03, 1.2e+01, 1.0e+00]])

And you can get the max and min of the first column like so:

open_time_min = csv_matrix[:, 0].min() # Result: 0.0
open_time_max = csv_matrix[:, 0].max() # Result: 2000.0

Using Pandas:

import pandas as pd

df = pd.read_csv(FILE_PATH + ".csv")

This creates a dataframe like this:

   open_time  data0  data1
0          0     10      3
1         10     30      2
2       1500     32     12
3       2000     12      1

You can use the open_time column from this:

open_time_min = df["open_time"].min() # Result: 0
open_time_max = df["open_time"].max() # Result: 2000