Home > database >  How do I parse numbers with thousands separator in pandas read_csv?
How do I parse numbers with thousands separator in pandas read_csv?

Time:01-04

I have a CSV file with lines as follows:

"Dec 30, 2021","1,234.11","1,654.22","11,876.23","1,676,234"

I have learn from a previous post that I can use:

parse_dates=['Date']

To get the date parsed (that works). However I would like columns 2-4 as np.float64 and column 5 as int64. How can I achieve that?

I have tried this:

data = pd.read_csv("file.csv",  parse_dates=['Date'], dtype=[np.datetime64, np.float64, np.float64, np.float64, np.float64, np.int64])

but I get

TypeError: data type not understood

CodePudding user response:

Use thousands parameter.

df = pd.read_csv("file.csv",  parse_dates=['Date'], thousands=',')

CodePudding user response:

Use converters parameter if you have special format.

converters = {
    'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
    'Number': lambda x: float(x.replace(',', ''))
}
df = pd.read_csv('data.csv', converters=converters)

Output:

>>> df
        Date   Number
0 2021-12-30  2345.55

>>> df.dtypes
Date      datetime64[ns]
Number           float64
dtype: object

# data.csv
Date,Number
"Dec 30, 2021","2,345.55"

Else use standard parameters:

df = pd.read_csv("data.csv",  header=None, parse_dates=[0], thousands=',', quoting=1)

Output:

>>> df
           0        1        2         3        4
0 2021-12-30  1234.11  1654.22  11876.23  1676234

>>> df.dtypes
0    datetime64[ns]
1           float64
2           float64
3           float64
4             int64
dtype: object
  • Related