I have a query to show the average length of all the proteins which also exists in a second table mrna_pieces
:
SELECT AVG(protein_length)
FROM protein
WHERE exists
(SELECT protein_id FROM mrna_pieces WHERE mrna_brokstukken.protein_id = protein.protein_id)
I also want to show the average for all the proteins which don't exist in the second table.
SELECT AVG(protein_length)
FROM protein
WHERE exists
(SELECT protein_id FROM mrna_pieces WHERE mrna_brokstukken.protein_id != protein.protein_id)
But I want these 2 parts in one table like this
table example
I tried:
SELECT AVG(eiwit_lengte) AS avglengthwith, AVG(eiwit_lengte) AS avglengthwithout
FROM eiwit
WHERE exists
(SELECT eiwit_id FROM mrna_brokstukken WHERE mrna_brokstukken.eiwit_id != eiwit.eiwit_id)
WHERE exists
(SELECT eiwit_id FROM mrna_brokstukken WHERE mrna_brokstukken.eiwit_id = eiwit.eiwit_id)
But that gives me the following error:
ERROR: pq: syntax error at or near "WHERE"
CodePudding user response:
Just select the two at the same time:
select (
select AVG(protein_length)
FROM protein
WHERE exists
(select protein_id from mrna_pieces where mrna_brokstukken.protein_id = protein.protein_id)
) as avgwith,
(
select AVG(protein_length)
FROM protein
WHERE exists
(select protein_id from mrna_pieces where mrna_brokstukken.protein_id != protein.protein_id)
)
as avgwithout
CodePudding user response:
as both queries will gove back a scalar value you can simply use bot as subqueries
SELECT
(select AVG(protein_length)
FROM protein
WHERE exists
(select protein_id from mrna_pieces where mrna_brokstukken.protein_id = protein.protein_id) ) AVG_1,
(select AVG(protein_length)
FROM protein
WHERE exists
(select protein_id from mrna_pieces where mrna_brokstukken.protein_id != protein.protein_id)) AVG_2
CodePudding user response:
select AVG(CASE WHEN m.protein_id IS NULL THEN p.protein_length END) AS ExistsInTable,
AVG(CASE WHEN m.protein_id IS NOT NULL THEN p.protein_length END) AS ExistsNOTInTable
FROM protein p
LEFT JOIN mrna_pieces m ON m.protein_id = p.protein_id
CodePudding user response:
It should be cheapest to compute the "exists" value once per row in a subquery, then two conditional aggregates in the outer SELECT
:
SELECT avg(protein_length) FILTER (WHERE p_exists) AS avg_len_exists
, avg(protein_length) FILTER (WHERE NOT p_exists) AS avg_len_not_exists
FROM (
SELECT protein_length
, EXISTS (SELECT FROM mrna_pieces m WHERE m.protein_id = p.protein_id) AS p_exists
FROM protein p
) sub;
About the aggregate FILTER
clause (which can be combined with the OVER
clause of window functions):
Or, shorter if two result rows are acceptable:
SELECT EXISTS (SELECT FROM mrna_pieces m WHERE m.protein_id = p.protein_id) AS p_exists
, avg(protein_length) avg_len
FROM protein p
GROUP BY 1;
Should be slightly faster, yet.
Note that the second query in your question is not doing what you are asking for. EXISTS ( ... WHERE m.protein_id != p.protein_id)
is true if at least one row in table mrna_pieces
does not match, which is always the case if there are at least two distinct values in m.protein_id
. You would want NOT EXISTS ( ... WHERE m.protein_id = p.protein_id)
. But use the cheaper query I suggested instead.
CodePudding user response:
You can use the SQL JOIN statement to combine rows from two or more tables based on a related column between them. To get the average value from each table, you can use the AVG() function. Example query:
SELECT AVG(table1.column), AVG(table2.column)
FROM table1
JOIN table2
ON table1.related_column = table2.related_column;