Let's say I have a table with the columns emp_no, year_that_got_paid, gender, and amount_paid. I want to group by year_that_got_paid and gender so that I can see, per year (1990, 1991, ...), how many people of each gender got paid. However, if the same emp_no appears twice in the same year, it is counted as one more person of that gender, even though it is the same person.
Note: I simplified the table for the question. It actually has more columns, and the primary key doesn't prevent duplicate employee IDs within the same year.
Therefore, I thought, maybe a kind of
select count(*)
from tableX
group by year_that_got_paid, gender
but then I would need a second group by or some kind of "distinct" to count each emp_no only once per year_that_got_paid. If I group by all three fields, it doesn't work either, as it would create a lot of rows saying 1, 2, or however many times the employee received a salary that year.
CodePudding user response:
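Just do a COUNT DISTINCT on the employee ID, i.e. use count(distinct emp_no) instead of count(*), and keep grouping by year_that_got_paid and gender.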
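A minimal sketch of that idea, assuming the table and column names from the question (tableX, emp_no, year_that_got_paid, gender); the alias people_paid is made up for illustration:

```sql
-- Count each employee at most once per year and gender.
-- tableX and the column names come from the question;
-- people_paid is a hypothetical alias.
select
  year_that_got_paid,
  gender,
  count(distinct emp_no) as people_paid
from tableX
group by year_that_got_paid, gender
order by year_that_got_paid, gender;
```

Note that an employee who is stored with two different genders in the same year would be counted once under each gender with this query.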
CodePudding user response:
Your table stores money people earned. The people can switch genders, so you store the gender along. And they can earn multiple amounts in a year.
So, if Bobby (Roberta / Robert) earned 1000$ as a woman, 500$ as a man, and 500$ as a woman again in 2020, you want this to count as one man in 2020 (having earned 500$) and one woman (having earned 1500$).
In order to achieve this, aggregate by employee number and gender in a year first:
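-- Outer query: one row per year, one count and one sum per gender.
-- The values 'male' and 'female' are assumed; adjust them to what your table stores.
select
  year_that_got_paid,
  count(case when gender = 'male' then 1 end) as male_count,
  count(case when gender = 'female' then 1 end) as female_count,
  sum(case when gender = 'male' then sum_amount_paid else 0 end) as male_paid,
  sum(case when gender = 'female' then sum_amount_paid else 0 end) as female_paid
from
(
  -- Inner query: collapse multiple payments to one row per employee, gender and year.
  -- Table and column names (tableX, emp_no) follow the question.
  select year_that_got_paid, emp_no, gender, sum(amount_paid) as sum_amount_paid
  from tableX
  group by year_that_got_paid, emp_no, gender
) per_emp_and_their_gender
group by year_that_got_paid
order by year_that_got_paid;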
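To check this against the Bobby example, here is a small self-contained sketch. The table payments_demo, its column types, and the employee number 1 are assumptions made up for the test; only the column names and the amounts come from the question and the example above.

```sql
-- Hypothetical demo table; name and column types are assumptions.
create table payments_demo (
  emp_no integer,
  year_that_got_paid integer,
  gender varchar(10),
  amount_paid integer
);

-- Bobby's three payments in 2020 (employee number 1 is made up).
insert into payments_demo (emp_no, year_that_got_paid, gender, amount_paid) values (1, 2020, 'female', 1000);
insert into payments_demo (emp_no, year_that_got_paid, gender, amount_paid) values (1, 2020, 'male', 500);
insert into payments_demo (emp_no, year_that_got_paid, gender, amount_paid) values (1, 2020, 'female', 500);

-- Same two-step aggregation as above, run against the demo table.
select
  year_that_got_paid,
  count(case when gender = 'male' then 1 end) as male_count,
  count(case when gender = 'female' then 1 end) as female_count,
  sum(case when gender = 'male' then sum_amount_paid else 0 end) as male_paid,
  sum(case when gender = 'female' then sum_amount_paid else 0 end) as female_paid
from
(
  select year_that_got_paid, emp_no, gender, sum(amount_paid) as sum_amount_paid
  from payments_demo
  group by year_that_got_paid, emp_no, gender
) per_emp_and_their_gender
group by year_that_got_paid
order by year_that_got_paid;
```

For these three rows the result is a single row for 2020 with male_count = 1, female_count = 1, male_paid = 500 and female_paid = 1500, which matches the counting described for Bobby.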