I am working on an e-shop with three tables: products
, atributes
and product_atributes
. I have already filled tables products
and atributes
:
insert into atribute(name)
select random_atribute()
from generate_series(1,10000) as seq(i);
insert into product(price)
select floor(1 random() * 100)::int
from generate_series(1,20000) as seq(i);
I want to generate random data for the table product_atributes
: There can be 1 attribute for 1 product but also all attributes for just 1 product. The result should look like:
product_atributes
id product_id atribute_id
1 5 12
2 5 76
3 5 10
4 5 7
5 1 45
6 1 109
...
Table product_atributes
is created like this:
create table product_atributes(
id serial primary key,
atribute_id integer references atribute(id),
product_id integer references product(id)
);
I tried sth. like this, but it's not working:
with data as (
select s.i,
random_atribute_id() as atribute_id,
s.id as product_id
from (generate_series(1, 1000) as seq(i)
cross join lateral (select seq.i, * from product order by random() limit 1) as s)
)
insert into product_atributes(atribute_id, product_id)
select atribute_id, product_id from data;
How can I do this?
CodePudding user response:
Simple
To insert every possible combination with an even chance of 1 percent:
INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM product p
JOIN atribute a ON random() >= 0.99;
More sophisitcated & faster
This offers more control over minimum and maximum number of attributes.
And it's a lot faster for your particular setup with gap-less attribute IDs (from 1 to 10000):
INSERT INTO product_atributes(product_id, atribut_id)
SELECT p.product_id, a.atribut_id
FROM (SELECT product_id, trunc(random() * 10)::int AS ct FROM product) p
CROSS JOIN LATERAL (
SELECT DISTINCT 1 trunc(random() * 10000) AS atribut_id
FROM generate_series (0, p.ct) g
) a;
This generates 0 to 10 completely random associations per product.
2 trunc(random() * 6)::int AS ct
would generate 3 to 8 associations. (Note the 0-based generate_series()
!)
Generate the number of attributes (ct
) in a subquery to avoid single evaluation of random()
resulting in the same number for all products.
We can optimize performance for gap-less IDs quite a bit. Instead of walking through all 10000 attribute for every product, just generate random numbers for the given range of IDs. Much faster.
Throw in DISTINCT
to eliminate (unlikely) duplicates. Hardly matters for 10 our of 10000, but we can't allow duplicates. (So it's possible to get fewer associations in rare cases.)
We could even work with a few gaps. There are quite a few subtleties to selecting random rows. See:
Asides
You could also populate all three tables with random data in a single query using data-modifying CTEs. Simple example for a 1:n relationship:
Same principle for your case, jut two CTEs ...
Related:
"Attributes" are spelled like this.