I have a CSV file with lines as follows:
"Dec 30, 2021","1,234.11","1,654.22","11,876.23","1,676,234"
I have learn from a previous post that I can use:
parse_dates=['Date']
To get the date parsed (that works). However I would like columns 2-4 as np.float64 and column 5 as int64. How can I achieve that?
I have tried this:
data = pd.read_csv("file.csv", parse_dates=['Date'], dtype=[np.datetime64, np.float64, np.float64, np.float64, np.float64, np.int64])
but I get
TypeError: data type not understood
CodePudding user response:
Use thousands
parameter.
df = pd.read_csv("file.csv", parse_dates=['Date'], thousands=',')
CodePudding user response:
Use converters
parameter if you have special format.
converters = {
'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
'Number': lambda x: float(x.replace(',', ''))
}
df = pd.read_csv('data.csv', converters=converters)
Output:
>>> df
Date Number
0 2021-12-30 2345.55
>>> df.dtypes
Date datetime64[ns]
Number float64
dtype: object
# data.csv
Date,Number
"Dec 30, 2021","2,345.55"
Else use standard parameters:
df = pd.read_csv("data.csv", header=None, parse_dates=[0], thousands=',', quoting=1)
Output:
>>> df
0 1 2 3 4
0 2021-12-30 1234.11 1654.22 11876.23 1676234
>>> df.dtypes
0 datetime64[ns]
1 float64
2 float64
3 float64
4 int64
dtype: object