Home > Net >  Priority order and functioning of the WHERE command
Priority order and functioning of the WHERE command

Time:01-28

I am have a problem understanding the logic of a SELECT example I have seen lately.

There is two tables: a protein table containing a protein name variable and a interaction table containing the interaction type and the ids of the two proteins involved in that interaction.

interaction table
   interactionType 
   protID1
   protID2
protein table
   protID
   protName

The SELECT example is supposed to permit to obtain the names of all the proteins that have a "coiled-coil" interactionType. Here is the example:

SELECT a.protName
FROM protein a,
     interaction b
WHERE (a.protID = b.protID1 AND a.protID = b.protID2) OR
      (a.protID = b.protID2 AND a.protID = b.protID1) AND b.interactionType = ‘coiled-coil’;

What is the necessity for having this command "(a.protID = b.protID1 AND a.protID = b.protID2)" twice? Would not one instance suffice to obtain all the wanted proteins name?

CodePudding user response:

No as the second conditio is already met by the first

so correctly you could write it more clearly

SELECT a.protName FROM protein a, interaction b WHERE
(a.protID = b.protID1 AND a.protID = b.protID2) OR 
((a.protID = b.protID2 AND a.protID = b.protID1) AND
b.interactionType = ‘coiled-coil’);

so the query would return all protein that fit (a.protID = b.protID1 AND a.protID = b.protID2) that both proteins are the same, and not oly the ones with b.interactionType = ‘coiled-coil’,see the example.

For that you would delete the first condition completely.

On another note JOIN is around for 30 years and is sql standard, so you should switch to it always

CREATE TABLE interaction(
   interactionType varchar(20),
   protID1 int, 
   protID2 int);
CREATE tABLE protein (
   protID int,
   protName varchar(25));
INSERT INTO interaction VALUES ('coiled-coil', 1,1),('B', 3,3)
Records: 2  Duplicates: 0  Warnings: 0
INSERt INTO protein VALUES (1,'A1'),(2,'B1'),(3,'C1')
Records: 3  Duplicates: 0  Warnings: 0
SELECT a.protName FROM protein a, interaction b WHERE
(a.protID = b.protID1 AND a.protID = b.protID2) OR
(a.protID = b.protID2 AND a.protID = b.protID1) AND
b.interactionType = 'coiled-coil';
protName
A1
C1

fiddle

CodePudding user response:

There are many things that are funky about that query.

(a.protID = b.protID1 AND a.protID = b.protID2) and (a.protID = b.protID2 AND a.protID = b.protID1) are indeed the same.


If we name the duplicated condition (a.protID = b.protID1 AND a.protID = b.protID2) / (a.protID = b.protID2 AND a.protID = b.protID1) : Condition1 and the last condition b.interactionType = 'coiled-coil' : Condition2, we can rephrase, for readability sake :

Condition1 OR Condition1 AND Condition2

Which is equivalent to Condition1 OR (Condition1 AND Condition2), since AND takes precedence over OR

If Condition1 is false, then Condition1 AND Condition2 will be evaluated (and will result as false, since Condition1 is false

If Condition1 is true, then Condition1 AND Condition2 won't be evaluated

Condition1 OR Condition1 AND Condition2 is then equivalent to Condition1, which means that

(a.protID = b.protID1 AND a.protID = b.protID2) OR
(a.protID = b.protID2 AND a.protID = b.protID1) AND
b.interactionType = 'coiled-coil'

Is equivalent to

a.protID = b.protID1 AND a.protID = b.protID2

The way the join is written SELECT ... FROM table1, table2 WHERE table1.id = table2.table1id is deprecated since 1992

Prefer using "proper" joins : SELECT ... FROM table1 INNER JOIN table2 ON table1.id = table2.table1id


I would rephrase the query such as :

SELECT a.protName
FROM protein a
INNER JOIN interaction b
    ON a.protID = b.protID1 AND a.protID = b.protID2
WHERE b.interactionType = 'coiled-coil';

CodePudding user response:

I'm not sure I'm understanding the problem correctly. It seems there are two relationships between those two tables. Namely:

  • interaction.prodID1 -> protein.protID
  • interaction.prodID2 -> protein.protID

If this is the case, then the query should include two joins -- one for each relationship. It should take the form:

select
  a.protID as protein1,
  b.protID as protein2
from interaction i
join protein a on a.protID = i.protID1
join protein b on b.protID = i.protID2
where i.interactionType = 'coiled-cloi'

CodePudding user response:

SELECT a.protName
FROM protein a,
     interaction b
WHERE (a.protID = b.protID1 AND a.protID = b.protID2) OR
      (a.protID = b.protID2 AND a.protID = b.protID1) AND b.interactionType = ‘coiled-coil’;

The second WHERE clause row is already covered by the first row.

I.e. you can simply do:

SELECT a.protName
FROM protein a,
     interaction b
WHERE a.protID = b.protID1 AND a.protID = b.protID2;

Which re-written using modern, explicit JOIN syntax, and also having table aliases that make sense, will be:

SELECT a.protName
FROM protein p
JOIN interaction i
  ON p.protID = i.protID1 AND p.protID = i.protID2;

https://dbfiddle.uk/yu2Wt3gA

However, this is just the given query written in a less complex way. I'd guess @The Impaler has given the correct query.

CodePudding user response:

We cannot know what the author of the query intended. We see they are using

WHERE (a.protID = b.protID1 AND a.protID = b.protID2) OR
      (a.protID = b.protID2 AND a.protID = b.protID1)

But this is merely one condition stated twice (namely that a.protID matches both b.protID1 and b.protID2.

This would hence only get rows where b.protID1 = b.protID2. I find it unlikely that this was intended, for in that case the author could have made this condition explicit. The interaction table refers to two proteins and the OR makes me think the query was supposed to either return all proteins that are involved in a coiled-coil interaction or only those where the interaction involved another protein, too. I.e.:

WHERE (a.protID = b.protID1 OR a.protID = b.protID2)
AND b.interactionType = 'coiled-coil'

or

WHERE
(
  (a.protID = b.protID1 AND a.protID <> b.protID2)
   OR
  (a.protID = b.protID2 AND a.protID <> b.protID1)
)
AND b.interactionType = 'coiled-coil'

Both conditions could easily lead to duplicates in the result. I am just guessing here of course, but the author may have intended this:

SELECT protname
FROM protein p
WHERE EXISTS
(
  SELECT null
  FROM interaction i
  WHERE p.protid in (i.protid1, i.protid2)
  AND i.interactionType = 'coiled-coil'
);

or this:

SELECT protname
FROM protein p
WHERE EXISTS
(
  SELECT null
  FROM interaction i
  WHERE p.protid in (i.protid1, i.protid2)
  AND i.protid1 <> i.protid2
  AND i.interactionType = 'coiled-coil'
);
  •  Tags:  
  • sql
  • Related