I have data that contains everyday data for each id. I have unique 96 ids. So for each id there are 365 days recorded like
date ids data
01.01.2022 1 0
02.01.2022 1 0
03.01.2022 1 0
04.01.2022 1 0
05.01.2022 1 0
06.01.2022 1 321
07.01.2022 1 0
08.01.2022 1 0
09.01.2022 1 0
10.01.2022 1 0
11.01.2022 1 0
12.01.2022 1 0
13.01.2022 1 0
14.01.2022 1 0
15.01.2022 1 0
16.01.2022 1 0
17.01.2022 1 0
18.01.2022 1 0
19.01.2022 1 0
20.01.2022 1 0
21.01.2022 1 0
01.01.2022 2 434
02.01.2022 2 0
03.01.2022 2 0
04.01.2022 2 0
05.01.2022 2 0
06.01.2022 2 0
07.01.2022 2 0
…
01.05.2022 1 3213
02.05.2022 1 0
03.05.2022 1 0
04.05.2022 1 0
05.05.2022 1 0
06.05.2022 1 0
07.05.2022 1 0
08.05.2022 1 0
09.05.2022 1 0
So I need to select the first 7 days from the point where the data
column is not 0 or null. From this example, we can see that I should get all rows between 06.01.2022 and 12.01.2022 for id - 1, and between 01.01.2022 and 07.01.2022 for id - 2, and so on. In addition, the id can be repeated again on other dates. For example, for id there is new data on 01.05.2022, so we take this data too with the next 7 consecutive days. How select query will look for this question in PostgreSQL? I hope I could deliver the idea
CodePudding user response:
You can get all non-zero and non-null rows in a with
CTE and join that with your table, adding anything up to 6 days from that - assuming you're after first 7 days including the one with non-empty data.
with first_non_zeros as (
select ids,date
from test
where data<>0 --and data is not null --the "not null" isn't necessary
)
select t.*
from test t
inner join first_non_zeros f
on t.ids=f.ids and t.date-f.date between 0 and 6
order by ids,date;
date | ids | data
----------- ----- ------
2022-01-06 | 1 | 321
2022-01-07 | 1 | 0
2022-01-08 | 1 | 0
2022-01-09 | 1 | 0
2022-01-10 | 1 | 0
2022-01-11 | 1 | 0
2022-01-12 | 1 | 0
2022-05-01 | 1 | 3213
2022-05-02 | 1 | 0
2022-05-03 | 1 | 0
2022-05-04 | 1 | 0
2022-05-05 | 1 | 0
2022-05-06 | 1 | 0
2022-05-07 | 1 | 0
2022-01-01 | 2 | 434
2022-01-02 | 2 | 0
2022-01-03 | 2 | 0
2022-01-04 | 2 | 0
2022-01-05 | 2 | 0
2022-01-06 | 2 | 0
2022-01-07 | 2 | 0
CodePudding user response:
Try the following:
/* create groups using the running sum function of the data column, whenever a data value is present (<>0 and not null) a new group is created.*/
with create_groups as
(
select *,
sum(data) over (partition by ids order by date) grp
from table_name
),
/* find the count and row numbers for each defined group within each id */
get_group_counts as
(
select *,
count(*) over (partition by ids, grp) cnt,
row_number() over (partition by ids, grp order by date) rn
from create_groups
)
select date, ids, data
from get_group_counts
where cnt >= 7 and rn <= 7 /* ensures that the group has at least 7 rows, and returns only the first 7 rows (ordered by date) */
order by ids, date