Home > Software engineering >  CAST a column contain empty cells, NULL and String to FLOAT64 in BigQuery
CAST a column contain empty cells, NULL and String to FLOAT64 in BigQuery

Time:09-23

I am encountering a problem with casting strings to float that I haven't been able to solve. I have a column in a BQ table contain values (numbers as string), NULLs adn empty cells. Now, I want to CAST the values to FLOAT64 but I get a double value error. I tried:

CAST(myfield AS FLOAT64) as myfield

and thinking it might help to replace empty field by NULL:

CAST(REGEXP_REPLACE(myfield, " ", NULL) AS FLOAT64 ) as myfield

But this sets everything to NULL.

Any idea on how to solve this would be greatly appreciated.

CodePudding user response:

Based on a comment you made to another question, it is clear that spaces are not your only problem in your data.

You need to identify and fix the problem data (which should ideally be done before ingestion in to the database, but that's a different question).

For example, you could use safe_cast():

SAFE_CAST() doesn't throw an error, instead it returns NULL, so you can just check for those NULLs...

SELECT
  *
FROM
  yourTable
WHERE
      myfield IS NOT NULL
  AND SAFE_CAST(REPLACE(myfield, ' ', '') AS FLOAT64) IS NULL

Or, just use the SAFE_CAST() in your query (instead of CAST()) and accept that badly formed data will yield a NULL

  • That prevents runtime errors
  • But 'hides' bad data as NULLs

EDIT:

Also, you can make your REPLACE() more robust by stripping other white space characters too (such as tabs, carriage returns, etc).

CodePudding user response:

To remove characters from a string, use an empty string, not NULL:

CAST(REGEXP_REPLACE(myfield, ' ', '') AS FLOAT64 ) as myfield

REGEXP_REPLACE() is overkill for this operation. You can just use regular REPLACE():

CAST(REPLACE(myfield, ' ', '') AS FLOAT64 ) as myfield
  • Related