Home > Software engineering >  Identifying first unique value in PostgreSQL
Identifying first unique value in PostgreSQL

Time:12-17

I have a table crashes with around one million rows, each row containing data on either:

  • every car crash that did not occur near a school, or

  • every car crash that occurred near a school, with additional rows per crash if it occurred near more than one school (e.g. 4 rows for a crash near 4 schools). The highest number of rows/nearby schools for one crash is 10.

I'd like to add a column to the table returning a "1" for only one appearance of every crash_id that appears in more than one row, and a "0" for any subsequent occurrences of that same crash_id in the column crash_id. Which row has a 1 or 0 per crash_id doesn't matter.

I've tried all of the suggestions offered in response to this similar question, but I couldn't get any of them to work for me.

FWIW, I got this to work in Excel using this formula:

=(COUNTIF($C$2:$C2,$C2)=1) 0

But that was for a small table, not one with one million rows.

What I've tried so far:

SELECT * 
FROM 
( 
    SELECT * , ROW_NUMBER() OVER(PARTITION BY crash_id) AS row 
    FROM crashes 
) AS A1 
WHERE row <6
SELECT * 
FROM 
(
    SELECT * , ROW_NUMBER() OVER(PARTITION BY crash_id) AS row 
    FROM crashes
) AS A1 
WHERE row = 1 

I understand this isn't optimal database design, but it allows me to get most of what I need, except for what I'm describing above.

CodePudding user response:

Just a simple test to add a first_crash column to crashes.

But it needs something to determine which row was first. Since tables are in essence unsorted sets.
The example uses the ID for that.

create table crashes (
 id serial primary key, 
 crash_id int, 
 school_id int
);

alter table crashes 
  add constraint uniq_school_crash unique (crash_id, school_id);

insert into crashes (crash_id, school_id) values
  (101,11), (101,10), (101,12)
, (102,25), (102,24), (102,23)
, (103,null)

alter table crashes
 add column first_crash int default 0;
update crashes c
set first_crash = 1
where first_crash = 0
  and ( 
       school_id is null
    or not exists (
      select 1
      from crashes c2
      where c2.crash_id = c.crash_id
        and c2.id < c.id
    ));
select * from crashes order by id
id | crash_id | school_id | first_crash
-: | -------: | --------: | ----------:
 1 |      101 |        11 |           1
 2 |      101 |        10 |           0
 3 |      101 |        12 |           0
 4 |      102 |        25 |           1
 5 |      102 |        24 |           0
 6 |      102 |        23 |           0
 7 |      103 |      null |           1

db<>fiddle here

Extra

  1. update by row_number
-- using row_number
update crashes c
set first_crash = q.rn
from ( 
      select id
      , row_number() over (partition by crash_id 
                           order by id asc) as rn
      from crashes
) q
where q.rn = 1 
  and q.id = c.id;
  1. using a temp table
-- using temporary table
create temporary table tmp_crashes (
 id int primary key, 
 crash_id int
);

insert into tmp_crashes (id, crash_id)
select min(id), crash_id
from crashes
group by crash_id
order by min(id);

update crashes t
set first_crash = 1
from tmp_crashes tmp
where tmp.id = t.id;
  • Related