I'm stuck on a spark.sql error that I couldn't solve with the answers on Stack Overflow. I tried first_value and collect_list, but they don't fix the error, and if I add the column to the GROUP BY the logic becomes wrong. Could you please help me? Here are the code and the error.
%spark03.sql
select date_key,
       CASE WHEN (gprs_usage > 0 and sms_count_on_net = 0 and sms_no_off_net = 0)
            then count(distinct(numbers)) end,
       sum(gprs_usage / 1048576) as data_mb,
       sum(sms_count_on_net + sms_no_off_net) as sms_total_cnt,
       count(distinct(numbers)) as uniq_total_number
from daily_total_revenue
where date_key >= 20220101 and date_key <= 20220120
group by date_key
error: org.apache.spark.sql.AnalysisException: expression 'daily_total_revenue.gprs_usage' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
CodePudding user response:
The query fails because gprs_usage is referenced in the CASE condition outside any aggregate function, so Spark cannot evaluate it once per date_key group. Try aggregating the CASE expression instead of wrapping the aggregate in a CASE:
    SELECT
        date_key,
        COUNT(DISTINCT CASE
                           WHEN gprs_usage > 0
                                AND sms_count_on_net = 0
                                AND sms_no_off_net = 0
                           THEN numbers
                       END) AS uniq_gprs_only_number,
        SUM(gprs_usage / 1048576) AS data_mb,
        SUM(sms_count_on_net + sms_no_off_net) AS sms_total_cnt,
        COUNT(DISTINCT numbers) AS uniq_total_number
    FROM daily_total_revenue
    WHERE date_key BETWEEN 20220101 AND 20220120
    GROUP BY date_key;
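If you are on a recent Spark 3.x release, the same conditional distinct count can also be written with the ANSI FILTER clause, which some find more readable. This is just a sketch against the same daily_total_revenue table; verify that your Spark version supports FILTER together with DISTINCT aggregates before relying on it:

    SELECT
        date_key,
        -- count each number once, but only over rows matching the condition
        COUNT(DISTINCT numbers) FILTER (
            WHERE gprs_usage > 0
              AND sms_count_on_net = 0
              AND sms_no_off_net = 0
        ) AS uniq_gprs_only_number,
        SUM(gprs_usage / 1048576) AS data_mb,
        SUM(sms_count_on_net + sms_no_off_net) AS sms_total_cnt,
        COUNT(DISTINCT numbers) AS uniq_total_number
    FROM daily_total_revenue
    WHERE date_key BETWEEN 20220101 AND 20220120
    GROUP BY date_key;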
Note that DISTINCT is not a function in SQL: count(distinct(numbers)) is parsed as COUNT(DISTINCT numbers), and the inner parentheses are simply redundant.
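For example, against the same table these queries all behave as keyword forms, even though the second one looks like a function call:

    SELECT COUNT(DISTINCT numbers)  FROM daily_total_revenue;  -- idiomatic
    SELECT COUNT(DISTINCT(numbers)) FROM daily_total_revenue;  -- same result, extra parentheses
    SELECT COUNT(DISTINCT date_key, numbers) FROM daily_total_revenue;  -- DISTINCT over a pair of columns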