Understanding tuple syntax in SQL-CodePudding

I've always used the IN (val1, val2, ...) syntax quite easily when testing for a bunch of values. However, I'm wondering what type of data structure it actually evaluates to, is this a table function? For example:

-- to populate data
CREATE TABLE main_territory (
  name varchar NOT NULL,
  is_fake_territory integer NOT NULL,
  code varchar NOT NULL
);

INSERT INTO main_territory (name, is_fake_territory, code) VALUES ('Afghanistan', 0, 'AF'), ('Albania', 0, 'AL'), ('Algeria', 0, 'DZ');

select '1' as "query#", * from main_territory where code in ('AF', 'AL') union all
select '2' as "query#", * from main_territory where code in (select 'AF' UNION ALL select 'AL') UNION ALL
select '3' as "query#", * from main_territory where code in (select code from main_territory where name ='Albania' or name = 'Afghanistan')

The second and third queries return a one-columned table (is this called a scalar-table?), and so I would imagine doing (expr1, expr2, ...) does the same thing -- it evaluates to a one-columed table. Is that accurate, or what actual data type is this?

CodePudding user response：

The confusion is understandable, since these are actually two different kinds of IN:

WHERE expr IN (2, 3, 4, ...)
WHERE expr IN (SELECT ...)

The first will be converted to an array like this:

                    QUERY PLAN                  
═══════════════════════════════════════════════════
 Seq Scan on tab
   Filter: (expr = ANY ('{2,3,4,...}'::integer[]))

or, if the list has only one element, to

      QUERY PLAN     
══════════════════════
 Seq Scan on tab
   Filter: (expr = 2)

The second will be executed as a join, for example:

            QUERY PLAN             
═══════════════════════════════════
 Hash Join
   Hash Cond: (tab.expr = sub.col)
   ->  Seq Scan on tab
   ->  Hash
         ->  Seq Scan on sub

So, to answer your question: A plain IN list will be converted to an array, and IN becomes = ANY.

CodePudding user response：

I would not call the IN ( ) predicate tuple comparison. An example of tuple comparison (aka row constructor comparison) is:

WHERE (col1, col2) = ('abc', 123)

Or you can even do multivalued row constructor comparison:

WHERE (col1, col2) IN (('abc', 123), ('xyz', 456))

The examples you show are simply the IN ( ) predicate, which compares a single value to a list of values. If the value matches any of those in the list, the predicate is satisfied. The list can either be a fixed list of expressions or literals:

WHERE code IN ('AF', 'AL')

Or it can be the result of a subquery:

WHERE code IN (SELECT code FROM ...)

How this is implemented depends on the code of the respective RDBMS. It might have to materialize the result of the subquery and store it as a list internally. In some software, they may use a temporary table with one column as the data structure to store the result of the subquery. Then the IN ( ) predicate can be executed as a join against that temporary table. If there's one thing an SQL engine ought to be able to do efficiently, it's a join. :-)

But this might be expensive if the result of the subquery is millions of rows. In that case, a clever optimizer would "factor out" the IN ( ) predicate and just do a join. That is, it would read each value of code and do an index lookup into the second table's code column. This means there's no data structure per se, it's just the evaluation of a join.

The real answer would be implementation-dependent. Both MySQL and PostgreSQL are open-source, so you can try downloading and reading the code yourself if you want to know the implementation.