Home > database >  Value error when reading data from Google Sheets with comma as decimal
Value error when reading data from Google Sheets with comma as decimal

Time:12-27

When reading from Google Sheet with Python and the Google API I have a hard time getting numeric values from the numeric values.

Where i live we use comma as decimal separator, which seems to be the problem.

The Google sheet input is:

enter image description here

with automatic format.

The Python script is:

gc = gspread.service_account(filename="gdocs_access.json")
sh = gc.open("my_file").worksheet("sheet 1")
data = sh.get_all_records(numericise_ignore=["all"])
print(data)

[{'id': 'b', 'value': '200,3'}, {'id': 'c', 'value': '40'}, {'id': 'a', 'value': '-30,5'}]

returning all numeric values as string.

If I put this in a df and convert the string values to int:

data = pd.DataFrame(data)
data['value'].astype(int)

I get this error:

ValueError: invalid literal for int() with base 10: '200,3'

Apparently this error is caused when passing a string representation of a float into int, but that does not help me much. It does not help me to first convert to float.

Is there a solution for this? I am sure I am not the first one using comma as decimal and trying to read Google sheets with Python.

CodePudding user response:

What you need to do is to change , to . in all of the strings. This can be done using the str.replace-method.

Either you can make a new list data using list comprehension and convert that to a DataFrame:

data = [{'id': e['id'], 'value': int(float(e['value'].replace(',', '.')))} for e in data]
data = pd.DataFrame(data)

or you can convert directly to a DataFrame and change the column.

data = pd.DataFrame(data)
data['value'] = data['value'].str.replace(',', '.').astype(float).astype(int)

CodePudding user response:

In your situation, when the values in the cells are the number values, how about using value_render_option as follows?

From:

data = sh.get_all_records(numericise_ignore=["all"])

To:

data = sh.get_all_records(numericise_ignore=["all"], value_render_option="UNFORMATTED_VALUE")

or

data = sh.get_all_records(value_render_option="UNFORMATTED_VALUE")

Reference:

  • Related