Below is the Spark SQL that gives the result shown in the output:
select date_part('WEEK', ts), date_format(ts, 'yyyy-MM-dd')
from threads
where ts between '2020-08-05 00:00:00' and '2020-09-06 00:00:00'
order by date_format(ts, 'yyyy-MM-dd')
Output
date_part(WEEK, ts) date_format(ts, yyyy-MM-dd)
32 2020-08-07
32 2020-08-07
32 2020-08-07
33 2020-08-12
33 2020-08-12
33 2020-08-12
33 2020-08-12
36 2020-08-31
36 2020-08-31
36 2020-08-31
36 2020-08-31
36 2020-08-31
36 2020-09-01
36 2020-09-02
But when I try to find the week count by grouping the data:
%sql
select count(date_part('WEEK', ts)), date_part('WEEK', ts), date_format(ts, 'yyyy-MM-dd')
from threads
where ts between '2020-08-05 00:00:00' and '2020-09-06 00:00:00'
group by date_part('WEEK', ts)
order by date_format(ts, 'yyyy-MM-dd')
I am getting the below error:
Error in SQL statement: AnalysisException: expression 'threads.ts' is neither present in the
group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value)
if you don't care which value you get.;
Sort [date_format(ts, yyyy-MM-dd)#2569 ASC NULLS FIRST], true
+- Aggregate [date_part(WEEK, ts#1130)], [count(date_part(WEEK, ts#1130)) AS count(date_part(WEEK, ts))#2567L, date_part(WEEK, ts#1130) AS date_part(WEEK, ts)#2568, date_format(cast(ts#1130 as timestamp), yyyy-MM-dd, Some(Etc/UTC)) AS date_format(ts, yyyy-MM-dd)#2569]
Any help with the above issue is highly appreciated.
The expected result is:
count(date_part('WEEK', ts)) date_part(WEEK, ts) date_format(ts, yyyy-MM-dd)
3 32 2020-08-07
4 33 2020-08-12
7 36 2020-08-31
CodePudding user response:
You need to group by the start of the week. In your query, date_format(ts, 'yyyy-MM-dd') in the SELECT and ORDER BY references the raw ts column, which is neither a grouping expression nor wrapped in an aggregate, which is exactly what the AnalysisException complains about. Group on date_trunc('WEEK', ts) and aggregate ts for the date column:
%sql
select
count(ts) cnt,
date_part('WEEK', date_trunc('WEEK', ts)) as week,
date_format(min(ts),'yyyy-MM-dd') as week_date
from threads
where ts between '2020-08-05 00:00:00' and '2020-09-06 00:00:00'
group by date_trunc('WEEK', ts)
order by date_trunc('WEEK', ts)
+---+----+----------+
|cnt|week| week_date|
+---+----+----------+
|  3|  32|2020-08-07|
|  4|  33|2020-08-12|
|  7|  36|2020-08-31|
+---+----+----------+
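If you prefer to keep the week number itself as the grouping key, a minimal alternative sketch (not part of the original answer, assuming the same threads table and Spark 3.0+ where date_part is available) is to group on date_part('WEEK', ts) and aggregate ts, so every selected column is either grouped or aggregated:

%sql
-- Alternative sketch: group on the week number directly and take min(ts) for the date,
-- so no selected column references an ungrouped ts.
select
  count(*) as cnt,
  date_part('WEEK', ts) as week,
  date_format(min(ts), 'yyyy-MM-dd') as week_date
from threads
where ts between '2020-08-05 00:00:00' and '2020-09-06 00:00:00'
group by date_part('WEEK', ts)
order by week

For this date range both queries give the same three rows; note that grouping by week number merges the same ISO week across different years, while grouping by date_trunc('WEEK', ts) keeps them separate.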