Query optimization on Postgresql-CodePudding

SQL optimization problem, which of the two solutions below is the most efficient?

I have the table from the image, I need to group the data by CPF and date and know if the CPFs had at least one login_ok = true on a specific date. Both solutions below satisfy my need but the goal is to find the best query.

We can have multiple login_ok = true and login_ok = false for CPFs on a specific date. I just need to know if there was at least one login_ok = true

enter image description here

I already have two solutions, I want to discuss how to make another more efficient

CodePudding user response：

Maybe this would work for your problem:

SELECT
    t2.CPF,
    t2.data    
FROM (
    SELECT 
        CPF, 
        date(data) AS data 
    from db_risco.site_rn_login
    WHERE login_ok
    ) t2
GROUP BY 1,2
ORDER BY t2.data

DISTINCT would also work, and I doubt it would pose any performance threat in your case. Usually it evals expressions (like date(data)) before checking for uniqueness.

By using a subquery, in this case, you can select upfront which CPFs to include and then extract date. Finally you'd group by on a quite smaller number os lines, since those were previously selected.

CodePudding user response：

PostgreSQL has the function BOOL_OR to check whether the expression is true for at least one row. It is likely to be optimised for this kind of task.

select cpf, date(data) as data, bool_or(login_ok) as status_login
from db_risco.site_rn_login
group by cpf, date(data);

An index on (cpf, date(data)) or even on (cpf, date(data), login_ok) could help speed up the query.

On a side note: You may also want to order your results with ORDER BY. Don't rely on GROUP BY doing this. The order of the rows resulting from a query is arbitrary without a GROUP BY clause.