I'm having this issue with some csv files where I have both numeric and non numeric columns.
Read.csv import everything as strings as far as I see because the numbers are being single quoted and the numeric columns appear like this '149.0' or '149,0'. I would like to strip that quote in this case to be able to transform later.
When I have numbers like a million o so,they appear like this: 1.000,000
So the system understand that he needs to quote or otherwise it would be another field(because the second comma is not a point) and I get these messages:
-Error tokenizing data. C error: Expected 1 fields in line 129, saw 2
-could not convert string to float: '1.103.700'
How can I make Python understand or strip/change this behaviour so that numeric columns are imported already ok?
I tried different approaches such as quoting=2 (NON NUMERIC) , astype(float), pd.replace..... Nothing works.
I don't know if I'm reading the files with the wrong command or what.
Could you help me please? For example one column with this issue is ccaavacunas.iloc[:,[3]]
CodePudding user response:
If you apply a converter function the data can be converted to the proper type. See here for more details: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html?highlight=read_csv
import pandas as pd
def converter_function(value_to_convert):
# Replace "," with "." and assign to a new variable
converted_value = value_to_convert.replace(",", ".")
# Check if there is more than one occurrence of "."
if converted_value.count(".") > 1:
converted_value = converted_value.replace(".", "")
# Convert to float type if value allows, if not return the original value
converted_value = float(converted_value) if converted_value.replace('.', '', 1).isdigit() else value_to_convert
return converted_value
ccaavacunas = pd.read_csv("ccaa_vacunas.csv", keep_default_na=True, delimiter=',', decimal='.', quoting=1,
converters={
'Dosis entregadas Pfizer': converter_function,
'Dosis entregadas Moderna': converter_function,
'Dosis entregadas AstraZeneca': converter_function,
'Dosis entregadas Janssen': converter_function,
'Dosis entregadas totales': converter_function,
'Dosis administradas': converter_function,
'Porcentaje de dosis administradas por 100 habitantes': converter_function,
'Porcentaje sobre entregadas': converter_function,
'Personas con pauta completa': converter_function,
'Porcentaje con pauta completa': converter_function,
})