Home > Back-end >  Pandas dataframe.read_csv with a comma in the column reads incorectly
Pandas dataframe.read_csv with a comma in the column reads incorectly

Time:03-18

I'm trying to read a csv file with the following values [23, " 5,000.00 ", A]

I need to read the number with a comma, but I'm not able to do it even if I pass a decimal and a quotechar.

I'm reading a CSV that df.read_csv(path, delimiter=',', quotechar='"', decimal=',', thousands='.', dtype=str)

I'm getting the following output if I run df.values.tolist() => ['23', ' " 5', '000.00 "', 'A']

What might be the problem?

The column with this type of data is always different. So I cannot transform it based on column name. dtype=str should be left as a string, as it's necessary for the future.

CodePudding user response:

You have to set skipinitialspace's value.

I have the following code:

import pandas as pd

result = pd.read_csv("data.csv", sep=',', quotechar='"', skipinitialspace=True)
print(result.columns)

And my data.csv looks like this: 23; " 5,000.0 "; A

Running the code above give this result: Index([u'23', u' 5,000.0 ', u'A'], dtype='object')

skipinitialspace will remove strip off the extra spaces from " 5,000.0 ". If you try to remove the extra spaces by hand, then you can leave out this parameter and it will work just fine as well.

  • Related