Home > front end >  What is the value being returned using "" in SQL WHERE filter?
What is the value being returned using "" in SQL WHERE filter?

Time:07-16

In a dataset I am cleaning, according to the schema, there should only be two distinct values under the "usertype" column. Upon further analysis I discovered there is an empty third value accounting for 5828994 empty rows of the total dataset.

I tested to see if the third value would return NULL and it did not. As well as counted for Null and it returned a count of "0".

SELECT SUM(CASE WHEN usertype is null THEN 1 ELSE 0 END) 
AS number_of_nulll
    , COUNT(usertype) number_of_non_null
    FROM dataset

I filtered to see if it would return an empty value but the results were - "There is no data to display"

WHERE usertype = " "

By chance I filtered WHERE usertype = "" and it returned the 5828994 rows as empty rows I was looking to isolate.

WHERE usertype = ""

My question is, what value did the "" filter return?

CodePudding user response:

WHERE usertype = " "

Selects where the usertype is a single space - you get no results

WHERE usertype = ""

Selects where the usertype is blank (this is not the same as NULL). - This is where you get results.

Therefore, your table has strings that are blank, but they are not considered NULL, which of course could be confusing.

If you're manually loading this data yourself, I would check the google CLI for the parameter --null_marker, which should give you some options on how to handle this upon ingestion.

If you are stuck with it the way it is, you can get in the habit of using NULLIF() which can search for a condition and return NULL.

For example,

SELECT 
 SUM(CASE WHEN usertype is null THEN 1 ELSE 0 END) AS number_of_null,
 SUM(CASE WHEN NULLIF(usertype,'') is null THEN 1 ELSE 0 END) AS number_of_null_or_blank,
 COUNT(usertype) number_of_non_null
FROM dataset
  • Related