How to generate random data for association table?-CodePudding

I am working on an e-shop with three tables: products, atributes and product_atributes. I have already filled tables products and atributes:

insert into atribute(name)
select random_atribute()
from generate_series(1,10000) as seq(i);

insert into product(price)
select floor(1   random() * 100)::int
from generate_series(1,20000) as seq(i);

I want to generate random data for the table product_atributes: There can be 1 attribute for 1 product but also all attributes for just 1 product. The result should look like:

product_atributes
id  product_id     atribute_id
1      5               12
2      5               76
3      5               10
4      5                7
5      1               45
6      1               109
...

Table product_atributes is created like this:

create table product_atributes(
  id serial primary key,
  atribute_id integer references atribute(id),
  product_id integer references product(id)
);

I tried sth. like this, but it's not working:

with data as (
    select  s.i,
            random_atribute_id() as atribute_id,
            s.id as product_id
   
    from (generate_series(1, 1000) as seq(i)
             cross join lateral (select seq.i, * from product order by random() limit 1) as s)
)

insert into product_atributes(atribute_id, product_id)
select atribute_id, product_id from data;

How can I do this?

CodePudding user response：

Simple

To insert every possible combination with an even chance of 1 percent:

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM   product p
JOIN   atribute a ON random() >= 0.99;

More sophisitcated & faster

This offers more control over minimum and maximum number of attributes.

And it's a lot faster for your particular setup with gap-less attribute IDs (from 1 to 10000):

INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM  (SELECT product_id, trunc(random() * 10)::int AS ct FROM product) p
CROSS  JOIN LATERAL (
   SELECT DISTINCT 1   trunc(random() * 10000) AS atribut_id
   FROM   generate_series (0, p.ct) g
   ) a;

This generates 0 to 10 completely random associations per product.
2 trunc(random() * 6)::int AS ct would generate 3 to 8 associations. (Note the 0-based generate_series()!)

Generate the number of attributes (ct) in a subquery to avoid single evaluation of random() resulting in the same number for all products.

We can optimize performance for gap-less IDs quite a bit. Instead of walking through all 10000 attribute for every product, just generate random numbers for the given range of IDs. Much faster.
Throw in DISTINCT to eliminate (unlikely) duplicates. Hardly matters for 10 our of 10000, but we can't allow duplicates. (So it's possible to get fewer associations in rare cases.)

We could even work with a few gaps. There are quite a few subtleties to selecting random rows. See:

Best way to select random rows PostgreSQL

Asides

You could also populate all three tables with random data in a single query using data-modifying CTEs. Simple example for a 1:n relationship:

How can I generate fake data that adheres to the PRIMARY KEY-FOREIGN KEY constraints of my schema?

Same principle for your case, jut two CTEs ...

How to implement a many-to-many relationship in PostgreSQL?

"Attributes" are spelled like this.