Grouping results by a set of dates in Redshift with two tables

Time:08-24

Hope you are well. I am trying to count the number of observations I have in an employee database. The tables look more or less like this:

Date_Table

date_dt
2020-09-07
2020-09-14
2020-09-21

Employee_table

login_id effective_date is_active
a 2020-09-07 1
a 2020-09-14 1
b 2020-09-07 1
b 2020-09-14 0
c 2020-09-21 1

Keep in mind that effective_date is the date of some change (attrition, position change, whatever; those are easily filtered). The higher the date, the more recent the change, and the latest row per login_id gives the current status.

In the above example, 2020-09-14 is the day login_id b stopped being active within the table.

I want to reflect something like this:

the_date amount_of_employees
2020-09-07 2
2020-09-14 1
2020-09-21 2

This query works perfectly fine, and provides me the correct number:

SELECT '2020-09-07',COUNT(DISTINCT login_id) amount_of_employees 
FROM (SELECT date_dt FROM Date_Table) AS dd,(SELECT *,
             ROW_NUMBER() OVER (PARTITION BY login_id ORDER BY effective_date DESC) AS chk
      FROM Employee_table WHERE effective_date <=  '2020-09-07' ) AS dp

WHERE 
dp.is_active =1
AND 
dp.chk=1
GROUP BY 1 
ORDER BY 1 ASC;

Great! This one works and gives me the right value:

the_date amount_of_employees
2020-09-07 2

However, when I try to build the full dataset with this query:

SELECT dd.date_dt ,COUNT(DISTINCT login_id) amount_of_employees 
FROM (SELECT date_dt FROM Date_Table) AS dd,(SELECT *,
             ROW_NUMBER() OVER (PARTITION BY login_id ORDER BY effective_date DESC) AS chk
      FROM Employee_table  WHERE effective_date <=  dd.date_dt ) AS dp

WHERE 
dp.is_active =1
AND 
dp.chk=1
GROUP BY 1 
ORDER BY 1 ASC;

I get this error message:

Invalid operation: subquery in FROM may not refer to other relations of same query level

I looked into something like this:

https://w3coded.com/questions/672056/error-subquery-in-from-cannot-refer-to-other-relations-of-same-query-level

but it didn't work, or it doesn't necessarily apply. Maybe I am not getting it.

Any ideas? I would rather not write a lot of UNIONs, although that is a workaround.

Thanks in advance

CodePudding user response:

I'm not familiar with Amazon Redshift, but as long as your query syntax is supported, you can use a subquery to get the count, and there you'll be able to refer to the columns of the outer query, like this:

SELECT
   dt.date_dt,
   (
       SELECT COUNT(DISTINCT login_id) 
       FROM (
           SELECT 
               *,
               ROW_NUMBER() OVER (PARTITION BY login_id ORDER BY effective_date DESC) AS rn
           FROM employee_table et
           WHERE et.effective_date <= dt.date_dt
       ) t
       WHERE rn = 1 AND is_active = 1
   ) amount
FROM date_table dt   
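
Redshift documents restrictions on correlated subqueries, including references that skip a query block, so the form above may be rejected there. A minimal sketch of an alternative that avoids correlation altogether is below. It assumes the table and column names from the question; the aliases latest and rn are only illustrative. The idea is to join each date to every employee row effective on or before it, rank those rows per (date, login_id), and keep the most recent one.

```sql
-- Sketch only: pairs every date with every employee row effective on or
-- before it, then keeps the most recent row per (date, login_id).
SELECT date_dt AS the_date,
       COUNT(*) AS amount_of_employees
FROM (
    SELECT dd.date_dt,
           et.login_id,
           et.is_active,
           ROW_NUMBER() OVER (
               PARTITION BY dd.date_dt, et.login_id
               ORDER BY et.effective_date DESC
           ) AS rn
    FROM Date_Table AS dd
    JOIN Employee_table AS et
      ON et.effective_date <= dd.date_dt
) AS latest
WHERE rn = 1          -- latest status as of each date
  AND is_active = 1   -- count only employees still active on that date
GROUP BY date_dt
ORDER BY date_dt;
```

On the sample rows in the question this should give 2, 1 and 2 for the three dates. Two caveats: a date with no active employees drops out of the result instead of showing 0, and if a login_id has two rows with the same effective_date, which one gets rn = 1 is arbitrary unless a tie-breaker is added to the ORDER BY.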

CodePudding user response:

This is a solution for this:

SELECT dt.date_dt, COUNT(DISTINCT login_id) other_account
FROM Date_Table dt
         LEFT JOIN employee_table et ON dt.date_dt BETWEEN et.effective_date AND et.effective_date /* + some additional interval */
WHERE et.is_active = 1 /* and other WHERE clauses */
GROUP BY 1

Thanks for all your support
