Home > Net >  SELECT random rows with a condition from 2 different table in PostgreSQL
SELECT random rows with a condition from 2 different table in PostgreSQL

Time:12-27

I have 2 different tables; First one is for 'Users' and the second is 'Relations'.

Users has 'username' column and Relations has 'username' and 'friendsname'. For every friendship instance 2 rows are inserted to this table. e.g If John adds Janna as friend this means John is added by Janna likewise.

I am trying to find out something like friend suggestion as in Facebook or Twitter. Random user will be selected from Users table and that will be checked in Relations table whether they are friend or not. And this will be done continously until finding 5 successful non-friend match and return all at once(if it's possible).

I can do this with Select all rows and iterate/search in the server by myself but i don't think this is something PostgreSQL can not handle. Also my way seems too expensive task for this level "relatively" trivial feature.

Is there any easy and elegant way to handle this task?

Thanks in advance and have a good sunday/Christmas night.

UPDATE: I am appending some mock data, sorry for the delay:

create table relations (
    rel_id bigserial primary key not null ,
    username VARCHAR(50) not null ,
    friendname VARCHAR(50) not null ,
    since DATE not null
);
insert into relations (rel_id, username, friendname, since) values (1,'user1', 'user2', '06/01/2021');
insert into relations (rel_id, username, friendname, since) values (2,'user2', 'user1', '06/01/2021');
insert into relations (rel_id, username, friendname, since) values (3,'user1', 'user3', '16/10/2021');
insert into relations (rel_id, username, friendname, since) values (4,'user3', 'user1', '16/10/2021');
insert into relations (rel_id, username, friendname, since) values (5,'user3', 'user5', '16/01/2020');
insert into relations (rel_id, username, friendname, since) values (6,'user5', 'user3', '16/01/2020');

create table Users (
                           user_id bigserial primary key not null ,
                           username VARCHAR(50) not null

);
insert into Users (user_id, username) values (1,'user1');
insert into Users (user_id, username) values (2,'user2');
insert into Users (user_id, username) values (3,'user3');
insert into Users (user_id, username) values (4,'user4');
insert into Users (user_id, username) values (5,'user5');
insert into Users (user_id, username) values (6,'user6');
insert into Users (user_id, username) values (7,'user7');

As expected result query may suggest User4-5-6 and 7 for User1 as friend. Because according to table user1 is friend only with user 2 and user3. Considering he cannot be friend with himself we dont expect to see user1 as in results also.

CodePudding user response:

The next select will return exactly 5 friendsname which are not friends of John (in this case). The second filter must be applied, to not return John for John.

SELECT DISTINCT(friendsname) FROM relations 
WHERE username <> 'John' AND friendsname <> 'John'
LIMIT 5

To add a random factor to this you can modify the select this way:

SELECT * FROM (
SELECT DISTINCT(friendsname) FROM relations 
WHERE username <> 'John' AND friendsname <> 'John'
) as s
ORDER BY random()
LIMIT 5

Notice, that the random() ordering is a heavy operation, because it does a scan.

UPDATE To select from the users table instead of the relations table the query could look like this:

SELECT * FROM users AS u
WHERE u.username NOT IN (SELECT friendname FROM relations WHERE username = 'user3') AND u.username <> 'user3'
ORDER BY random()
LIMIT 3

There are two selects made without join, just filtering, it could possibly perform well. I did not tested it on larger tables.

CodePudding user response:

Without using cursors, one thing you can do is find all possible friendships with a cross join, then remove the existing ones and finally pick 5 random rows of the resulting set, something like this

   select usernm, friend 
from
(
select distinct greatest(a.username,b.username) usernm,least(a.username,b.username) friend 
from users a
cross join users b
where a.username <> b.username
except
select distinct greatest(username, friendname) usernm, least(username, friendname) friend
from relations
) x
order by random() limit 5

the use of greatest and least is to ensure that we don't repeat the same friendship. Also the "where a.username <> b.username" in the cross join is to not introduce "self friendships".

db fiddle: https://www.db-fiddle.com/f/gpYMWHa2dK7BcZpxjHqwGa/0

  • Related