Home > front end >  PostgreSQL WHERE Clause what is included/excluded?
PostgreSQL WHERE Clause what is included/excluded?

Time:03-29

I have come across a postgresql query with a lot of WHERE statements like this

WHERE (category <> 'A' OR category IS NULL)
AND (category <> 'B' OR category IS NULL) 

I am struggling to understand what data this query is including/excluding.

I tried rewriting the code above as

  1. WHERE category NOT IN ('A','B')

  2. WHERE category NOT IN ('A', 'B') OR category IS NULL

  3. WHERE (category NOT IN ('A', 'B') OR category IS NULL)

And all three gave different answers to the original code.

Could someone explain to me what data in included/excluded in each of the four cases above?

Say for example the data looked like

ID Category
1 A
2 B
3 C
4 D
5 NULL

For (1) I would just get ID's 3, 4. But I am unsure about the others.

EDIT: WHERE (category NOT IN ('A', 'B') OR category IS NULL) and

WHERE (category <> 'A' OR category IS NULL)
AND (category <> 'B' OR category IS NULL)

Give the same answer.

But WHERE category NOT IN ('A', 'B') OR category IS NULL without parenthesis gives a different answer.

CodePudding user response:

To correctly understand the output of the mentioned queries you have to think in the following way: take one by one all the lines that satisfy the WHERE clause.

Query 1

  1. WHERE category NOT IN ('A','B')

The query 1 should give all the lines with the attribute category not in the set specified. If you proceed step by step, one line at a time you can see that:

  • the first 2 lines are not included in the output since the category column contains values in the set ('A','B')
  • the next 2 lines are included in the output since the category column doesn't contain values in the set ('A','B')
  • the last line is not included in the output since the NULL values are evaluated as UNKNOWN according to the Three-Valued Logic

To better understand the last point the clause WHERE category NOT IN ('A','B') can be rewritten as WHERE category<>'A' AND category<>'B'. Since category is NULL the logical expression is evaluated in the following way WHERE NULL<>'A' AND NULL<>'B', which output is UNKNOWN, so the line will not be included in the output result.

Queries 2 & 3

  1. WHERE category NOT IN ('A', 'B') OR category IS NULL

Queries 2 and 3 are the same, since parentheses in this case doesn't affect the evaluation order of the logical operators.

In this particular case the last line of the example table above is included in the output since category NOT IN ('A','B') is evaluated as UNKNOWN and category IS NULL is evaluated to TRUE. For the same reason mentioned above (Three-Valued Logic) the result of UNKNOWN or TRUE is TRUE.

CodePudding user response:

I think that you are struggling in the logic of the query, the first one is easy to understand

where category not in ('A','B') 

you will get id : 3,4,5

Note that it is better if you use the ids of the category than the letters.

in the second query

where category not in ('A','B') or category is null

both the condition will be true and both will be done

you will get ids of : 3,4

and the third one it has to give you the same output of the (2) condition

  • Related