I would like to know how experienced sql users would compute a lot of counts with different conditions. I have a table [population] and a table [sql_rules].

My [population] table looks something like this with ~1.7M rows and ~30 columns.

eye_color	hair_color	age	origin	income	...
blue	brown	36	US	40000	...
green	yellow	17	UK	60000	...
brown	black	42	DE	20000	...
black	black	28	DK	80000	...
...	...	...	...	...	...

My [sql_rules] table looks something like this, with ~800 rows and at maximum 8 rules (average ~5) :

row_number	rule_1	rule_2	rule_3	rule_4	...
1	hair_color = 'brown'	age < 27	origin IN ('US', 'UK')	income >= 40000	...
2	hair_color = 'black'	origin IN ('DK', 'FR')	age < 10	income >= 40000	...
3	hair_color = 'yellow'	origin IN ('TH', 'PE')	age > 34
4	hair_color = 'black'	age > 99	origin IN ('US', 'UK')	income >= 40000	...
5	age < 27	income >= 100000
...	...	...	...	...	...

What I need to do is basically to 'iterate' by row and to compute the count when applying the rules one after the other to obtain one 'count' by cell of my sql_rules table. Basically, since it is hard to explain with words, here are the count I would like to obtain :

row_number	rule_1	rule_2	rule_3	rule_4	...
1	SELECT COUNT(*) FROM population WHERE hair_color = 'brown'	SELECT COUNT(*) FROM population WHERE hair_color = 'brown' AND age < 27	SELECT COUNT(*) FROM population WHERE hair_color = 'brown' AND age < 27 AND origin IN ('US', 'UK')	SELECT COUNT(*) FROM population WHERE hair_color = 'brown' AND age < 27 AND origin IN ('US', 'UK') AND income >= 40000	...
2	SELECT COUNT(*) FROM population WHERE hair_color = 'black'	SELECT COUNT(*) FROM population WHERE hair_color = 'black' AND origin IN ('DK', 'FR')	SELECT COUNT(*) FROM population WHERE hair_color = 'black' AND origin IN ('DK', 'FR') AND age < 10	SELECT COUNT(*) FROM population WHERE hair_color = 'black' AND origin IN ('DK', 'FR') AND age < 10 AND income >= 40000	...
3	...	...	...	...	...
5	SELECT COUNT(*) FROM population WHERE age < 27	SELECT COUNT(*) FROM population WHERE age < 27 AND income >= 100000
...	...	...	...	...	...

What I did for now was to use my table [sql_rules] to create all the SQL 'queries'. Since they can often be similar until the third or forth column, I avoided recomputing the same count multiple times and took the values if a particular 'count' had already been computed.

Can someone think of a fastest way to do this ? My solution is working fine but I can't think of a faster way to do this. I'm able to work with sql, Python, R.

Edit : Ideally, what I'm looking for is some nice ideas that could potentially accelerate the process, for the sake of 'interest'. Here are examples of ideas that are not meant to be great, but to exemplify what I am looking for :

Using the sql_rules table, let's say with (~800 rows * ~5 rules in average), 4000 rules, create the 4000 sql queries and run them all one after the other until it's done.
Create the 4000 'queries' as above but once it is done, keep only the unique/distinct rules which would maybe reduce the count to 1000 'SELECT COUNT(*) FROM... .
loop over each line, create a duplicate of the population table and remove from the duplicate table the rows where the 'rules' are not met. That way, each condition/query/rules of the rest of the row will be computed on a smaller and smaller table.

Edit -------------------------------------

The table is unfortunately proprietary but I will create something equivalent using public data as soon as possible.
The sql_rules table is ~800rows and 8 columns (rules). Most of the time, one single row does not contain 8 rules and the a single row is thus looking like that :

row_number	rule_1	rule_2	rule_3	rule_4	rule_5	rule_6	rule_7	rule_8
hair = 'blue'	eyes IN ('green', 'brown')	income BETWEEN 10000 AND 20000	NA	NA	NA	NA	NA	NA

with the values I need to find for this particular row being equivalent to

row_number	rule_1	rule_2	rule_3	rule_4	rule_5	rule_6	rule_7	rule_8
SELECT COUNT(*) FROM population WHERE hair = 'blue'	SELECT COUNT(*) FROM population WHERE hair = 'blue' AND eyes IN ('green', 'brown')	SELECT COUNT(*) FROM population WHERE hair = 'blue' AND eyes IN ('green', 'brown') AND income BETWEEN 10000 AND 20000	0	0	0	0	0	0

CodePudding user response：

You can use conditional sum , still query must be build for each set of rules, but where part can be easily implemented.

select 
  sum(1) filter (where hair_color = 'brown') rule1,
  sum(1) filter (where age < 27) rule2,
  sum(1) filter (where origin in ('US','UK')) rule3
 from population

DB Fiddle