How to prevent "pandas.read_csv" from converting the index column to float with arg 'dtype=np.flo

Time:10-03

I have a CSV file to be read by pandas, in the following form:

name,   quart2c,    p_rat,  other_col
avg,    1,          2,      3
std,    1,          2,      3

I want pandas.read_csv() to guarantee that every cell has type float32, except those in the first column ('name'), because that is the index column.

Hence I pass two args to it like this:

import numpy as np
import pandas

pandas.read_csv(file_path, index_col=0, dtype=np.float32)

# or like this; both fail
pandas.read_csv(file_path, index_col='name', dtype=np.float32)

But pandas still tries to convert the first column to float and raises an exception:

ValueError: could not convert string to float: 'avg'
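A minimal reproduction of the failure, assuming the CSV contents are held in an in-memory string (io.StringIO stands in for file_path) and written without the alignment spaces shown above. A scalar dtype is applied to every parsed column, the one named in index_col included, so the text labels cannot be cast:

```python
import io

import numpy as np
import pandas as pd

# Same data as the question's CSV, minus the alignment spaces (an assumption).
CSV_TEXT = "name,quart2c,p_rat,other_col\navg,1,2,3\nstd,1,2,3\n"

error = None
try:
    # A scalar dtype targets *every* column, including the one used as index_col.
    pd.read_csv(io.StringIO(CSV_TEXT), index_col="name", dtype=np.float32)
except ValueError as exc:
    error = exc

print(repr(error))
```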

What I want:

  1. The CSV file is generated by another program I wrote, so if its structure is the problem, I can easily change it.
  2. I want to always pass dtype=np.float32 so that any invalid values are caught as errors. I also don't want the values to be interpreted as integers.
  3. The "name" column must stay the index (index_col), since it is used later. It should NOT be dropped.

How can I achieve this? Thanks!

CodePudding user response:

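You can combine dtype with converters. For a column listed in converters, pandas applies the converter instead of the dtype, so 'name' stays a string while every other column is parsed as float32 (pandas may emit a ParserWarning saying that only the converter is used for that column).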

import pandas as pd
df = pd.read_csv('test.csv', dtype='float32', converters={'name': str}, index_col='name')
print(df)

Output:

         quart2c      p_rat    other_col
name                                    
avg          1.0        2.0          3.0
std          1.0        2.0          3.0
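A self-contained version of the approach above, assuming the file contents are supplied through io.StringIO instead of 'test.csv'. It silences the ParserWarning pandas may raise when both a converter and a dtype cover 'name', then checks the resulting dtypes:

```python
import io
import warnings

import pandas as pd

# Question's data without alignment spaces (assumption for a self-contained demo).
CSV_TEXT = "name,quart2c,p_rat,other_col\navg,1,2,3\nstd,1,2,3\n"

with warnings.catch_warnings():
    # pandas may warn that only the converter (not the dtype) is used for 'name'.
    warnings.simplefilter("ignore", pd.errors.ParserWarning)
    df = pd.read_csv(
        io.StringIO(CSV_TEXT),
        dtype="float32",           # applied to every column without a converter
        converters={"name": str},  # takes precedence over dtype for 'name'
        index_col="name",
    )

print(df.dtypes)
print(df.index.tolist())
```

Because the scalar dtype still covers every data column, any non-numeric cell outside 'name' should still fail to parse, which keeps the error check from requirement 2.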

CodePudding user response:

Alternatively, create a dictionary that maps column positions to dtypes and pass it as dtype to pd.read_csv:

dtype = dict(zip(range(4), ['str'] + ['float32'] * 3))
# {0: 'str', 1: 'float32', 2: 'float32', 3: 'float32'}

so:

pandas.read_csv(file_path, index_col=0, dtype=dtype)
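A runnable sketch of the positional-dictionary approach, assuming io.StringIO in place of file_path. It also feeds in a hypothetical malformed row ("oops" is made-up data) to show that enforcing float32 on the data columns still turns bad values into an error:

```python
import io

import pandas as pd

HEADER = "name,quart2c,p_rat,other_col\n"
GOOD_CSV = HEADER + "avg,1,2,3\nstd,1,2,3\n"
BAD_CSV = HEADER + "avg,1,oops,3\n"  # hypothetical malformed row

# Position 0 ('name') is read as text; positions 1-3 as float32.
dtype = dict(zip(range(4), ["str"] + ["float32"] * 3))

df = pd.read_csv(io.StringIO(GOOD_CSV), index_col=0, dtype=dtype)
print(df.dtypes)

# With float32 enforced on the data columns, a non-numeric cell raises
# instead of silently producing an object column.
bad_value_error = None
try:
    pd.read_csv(io.StringIO(BAD_CSV), index_col=0, dtype=dtype)
except ValueError as exc:
    bad_value_error = exc
print(repr(bad_value_error))
```

Keying the dictionary by position relies on the column order staying fixed; since the same program writes the file, that is probably acceptable here, but keying by column name (e.g. {'name': 'str', 'quart2c': 'float32', ...}) would survive reordering.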