Home > Back-end >  csv file importing numeric columns with quotes/ as strings
csv file importing numeric columns with quotes/ as strings

Time:01-02

I'm having this issue with some csv files where I have both numeric and non numeric columns.

Read.csv import everything as strings as far as I see because the numbers are being single quoted and the numeric columns appear like this '149.0' or '149,0'. I would like to strip that quote in this case to be able to transform later.

When I have numbers like a million o so,they appear like this: 1.000,000

So the system understand that he needs to quote or otherwise it would be another field(because the second comma is not a point) and I get these messages:

-Error tokenizing data. C error: Expected 1 fields in line 129, saw 2

-could not convert string to float: '1.103.700'

How can I make Python understand or strip/change this behaviour so that numeric columns are imported already ok?

I tried different approaches such as quoting=2 (NON NUMERIC) , astype(float), pd.replace..... Nothing works.

I don't know if I'm reading the files with the wrong command or what.

Could you help me please? For example one column with this issue is ccaavacunas.iloc[:,[3]]

The file is here: enter image description here

CodePudding user response:

If you apply a converter function the data can be converted to the proper type. See here for more details: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html?highlight=read_csv

import pandas as pd


def converter_function(value_to_convert):
    # Replace "," with "." and assign to a new variable
    converted_value = value_to_convert.replace(",", ".")

    # Check if there is more than one occurrence of "."
    if converted_value.count(".") > 1:
        converted_value = converted_value.replace(".", "")

    # Convert to float type if value allows, if not return the original value
    converted_value = float(converted_value) if converted_value.replace('.', '', 1).isdigit() else value_to_convert

    return converted_value


ccaavacunas = pd.read_csv("ccaa_vacunas.csv", keep_default_na=True, delimiter=',', decimal='.', quoting=1,
                          converters={
                              'Dosis entregadas Pfizer': converter_function,
                              'Dosis entregadas Moderna': converter_function,
                              'Dosis entregadas AstraZeneca': converter_function,
                              'Dosis entregadas Janssen': converter_function,
                              'Dosis entregadas totales': converter_function,
                              'Dosis administradas': converter_function,
                              'Porcentaje de dosis administradas por 100 habitantes': converter_function,
                              'Porcentaje sobre entregadas': converter_function,
                              'Personas con pauta completa': converter_function,
                              'Porcentaje con pauta completa': converter_function,
                          })
  • Related