Home > database >  Get count of values in different subgroups
Get count of values in different subgroups

Time:10-31

I need to delete some rows in the dataset, of which the speed equals zero and lasting over N times (let's assume N is 2). The structure of the table demo looks like:

id car speed time
1 foo 0 1
2 foo 0 2
3 foo 0 3
4 foo 1 4
5 foo 1 5
6 foo 0 6
7 bar 0 1
8 bar 0 2
9 bar 5 3
10 bar 5 4
11 bar 5 5
12 bar 5 6

Then I hope to generate a table like the one below by using window_function:

id car speed time lasting
1 foo 0 1 3
2 foo 0 2 3
3 foo 0 3 3
4 foo 1 4 2
5 foo 1 5 2
6 foo 0 6 1
7 bar 0 1 2
8 bar 0 2 2
9 bar 5 3 4
10 bar 5 4 4
11 bar 5 5 4
12 bar 5 6 4

Then I can easily exclude those rows by using WHERE NOT (speed = 0 AND lasting > 2)

Put the code I tried here, but it didn't return the value I expected and I guess those FROM (SELECT ... FROM (SELECT ... might not be the best practice to solve the problem:

SELECT g3.*, count(id) OVER (PARTITION BY car, cumsum ORDER BY id) as num   
  FROM (SELECT g2.*, sum(grp2) OVER (PARTITION BY car ORDER BY id) AS cumsum             
    FROM (SELECT g1.*, (CASE ne0 WHEN 0 THEN 0 ELSE 1 END) AS grp2                            
      FROM (SELECT g.*, speed - lag(speed, 1, 0) OVER (PARTITION BY car) AS ne0              
        FROM (SELECT *, row_number() OVER (PARTITION BY car) AS grp FROM demo) g ) g1 ) g2 ) g3                                                                                       
ORDER BY id;

CodePudding user response:

You can use window function LAG() to check for the previous speed value for each row and SUM() window function to create the groups for the continuous values.
Then with COUNT() window function you can count the number of rows in each group so that you can filter out the rows with 0 speed in the groups that have more than 2 rows:

SELECT id, car, speed, time
FROM (
  SELECT *, COUNT(*) OVER (PARTITION BY car, grp) counter
  FROM (
    SELECT *, SUM(flag::int) OVER (PARTITION BY car ORDER BY time) grp
    FROM (
      SELECT *, speed <> LAG(speed, 1, speed - 1) OVER (PARTITION BY car ORDER BY time) flag
      FROM demo
    ) t  
  ) t
) t
WHERE speed <> 0 OR counter <= 2
ORDER BY id;

See the demo.

  • Related