I want to count what is the number of each age for each gender, basically to see the distribution of age and gender.
A sample of the data is like
state poi_name gender age
aichi starbucks shop E 2 3
aichi starbucks shop G 0 2
aichi starbucks shop G 1 2
chiba starbucks shop A 0 1
chiba starbucks shop D 1 1
chiba starbucks shop A 0 2
tokyo starbucks shop B 2 1
tokyo starbucks shop B 1 0
tokyo starbucks shop C 2 3
tokyo starbucks shop F 1 2
aichi starbucks shop E 1 2
I could get the gender distribution but not the age range, how can I do this.
SELECT
state,
poi_name,
count(gender) all_cnt,
count( gender = '0' or null) as Unknown,
count( gender = '1' or null) as Total_Male,
count( gender = '2' or null) as Total_Female,
count('gender') OVER(PARTITION BY state) AS cnt_for_state
FROM `geo_data_working.hw0160_infoDemographic`
GROUP BY state,poi_name
ORDER BY state,poi_name
age range:
-1:13-17
1:18-24
2:25-34
3:35-44
4:45-
0:unknown
Thanks in advance
add-on: I was asked about my expected results, good point. Unfortunately, I am not sure, this is my first SQL try. Maybe something like bellow or if you have a better advise I will appreciate it
1)
state | poi_name | gender | ageRange0 | ageRangeage1 |
---|---|---|---|---|
Tokyo | ShopA | F | 200 | 100 |
Tokyo | ShopA | M | 100 | 150 |
2)
state | poi_name | F_ageRange0 | F_ageRangeage1 | M_ageRange0 | M_ageRangeage1 |
---|---|---|---|---|---|
Tokyo | ShopA | 200 | 100 | 30 | 9000 |
CodePudding user response:
You want to get counts per age group. Use COUNTIF
to count conditionally:
SELECT
state,
poi_name,
COUNT(gender) AS all_cnt,
COUNTIF(gender = 0) AS unknown,
COUNTIF(gender = 1) AS total_male,
COUNTIF(gender = 2) AS total_female,
SUM(COUNT(*)) OVER (PARTITION BY state) AS cnt_for_state,
COUNTIF(age = -1) AS age_13_to_17,
COUNTIF(age = 1) AS age_18_to_24,
COUNTIF(age = 2) AS age_25_to_34,
COUNTIF(age = 3) AS age_35_to_44,
COUNTIF(age = 4) AS age_above_45,
COUNTIF(age = 0) AS age_unknown
FROM `geo_data_working.hw0160_infoDemographic`
GROUP BY state, poi_name
ORDER BY state, poi_name;
CodePudding user response:
I think the following query get your target of query result:
select res.state,
res.POI_NAME,
res.gender,
res.range,
count(*) as cnt
from (select e.STATE,
e.POI_NAME,
(case
when e.GENDER = 1 then 'male'
else
'female'
end) as gender,
(case
when age = 1 then 'age_13_to_17'
when age = 2 then 'age_25_to_34'
when age = 3 then 'age_35_to_44'
when age = 4 then 'age_above_45'
else 'age_unknown'
end) as range
from d-da-project.geo_data_working.hw0160_infoDemographic e) res
group by res.state,
res.POI_NAME,
res.gender,
res.range;