I have a CSV file that I want to read with pandas. It looks like this:
name, quart2c, p_rat, other_col
avg, 1, 2, 3
std, 1, 2, 3
I want pandas.read_csv() to guarantee that every cell has the type float32, except for the first column ('name'), which is the index column.
So I pass two arguments to it, like this:
pandas.read_csv(file_path, index_col=0, dtype=np.float32)
# or like this, both failed
pandas.read_csv(file_path, index_col='name', dtype=np.float32)
But pandas still tries to convert the first column to float, and raises an exception:
ValueError: could not convert string to float: 'avg'
What I want:
- The CSV file is produced by another program that I wrote myself. If its structure is the problem, I can easily adjust it.
- I want to always pass dtype=np.float32, so that any invalid values are caught. I also don't want the values to be interpreted as integers.
- The index column "name" should be kept as index_col, since it will be used later. This column must NOT be dropped.
How can I achieve this? Thanks!
CodePudding user response:
You can try it this way, with dtype and converters.
import pandas as pd
df = pd.read_csv('test.csv', dtype='float32', converters={'name': str}, index_col='name')
print(df)
Output:
      quart2c  p_rat  other_col
name
avg       1.0    2.0        3.0
std       1.0    2.0        3.0
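If you want to confirm that this does what the question asks, here is a minimal self-contained sketch. It assumes the file has no spaces after the commas and uses an in-memory string (csv_text, a name made up for this example) instead of test.csv. Per the pandas documentation, a converter is applied instead of dtype conversion for its column, so 'name' stays a string while the remaining columns are parsed as float32. pandas may emit a ParserWarning saying that only the converter is used for that column.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for test.csv (assumption: no spaces after the commas).
csv_text = "name,quart2c,p_rat,other_col\navg,1,2,3\nstd,1,2,3\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype="float32",           # applied to every column without a converter
    converters={"name": str},  # the converter takes precedence for 'name'
    index_col="name",
)

# Every data column should be float32, and the index should hold the names.
all_float32 = bool((df.dtypes == np.float32).all())
index_labels = list(df.index)
print(df.dtypes)
print(index_labels)
```

The check on df.dtypes is the part that matters here: it shows the values were not silently read as integers or float64.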
CodePudding user response:
You can create a dictionary that maps column positions to dtypes and pass it as dtype to pd.read_csv:
dtype = dict(zip(range(4), ['str'] + ['float32'] * 3))
>>> {0: 'str', 1: 'float32', 2: 'float32', 3: 'float32'}
so:
pandas.read_csv(file_path, index_col=0, dtype=dtype)
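If the number of columns is not fixed, the same idea can be sketched with a mapping built from the header instead of hard-coded positions. This is only one possible variant, not something from the question: it assumes the first column is always the index, that the file has no spaces after the commas, and it reads the header twice (once with nrows=0 to get the column names). The names csv_text, columns, and index_name are made up for the example.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real file (assumption: no spaces after the commas).
csv_text = "name,quart2c,p_rat,other_col\navg,1,2,3\nstd,1,2,3\n"

# Read only the header row to discover the column names.
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Assumption: the first column is the index; all others must be float32.
index_name = columns[0]
dtype = {col: np.float32 for col in columns[1:]}
dtype[index_name] = str

df = pd.read_csv(io.StringIO(csv_text), index_col=index_name, dtype=dtype)

print(df.dtypes)
print(list(df.index))
```

Because every data column is listed explicitly with float32, a non-numeric cell in any of them should still make read_csv fail rather than fall back to another type, which is the check the question wants.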