I'm stuck on a spark.sql error that I couldn't solve with the answers on Stack Overflow. I tried first_value and collect_list, but they don't fix the error, and if I add the column to the GROUP BY the logic becomes wrong. Could you please help me? Here are the code and the error.
%spark03.sql
select date_key,
       CASE WHEN (gprs_usage > 0 and sms_count_on_net = 0 and sms_no_off_net = 0)
            then count(distinct(numbers)) end,
       sum(gprs_usage / 1048576) as data_mb,
       sum(sms_count_on_net + sms_no_off_net) as sms_total_cnt,
       count(distinct(numbers)) as uniq_total_number
from daily_total_revenue
where date_key >= 20220101 and date_key <= 20220120
group by date_key
error: org.apache.spark.sql.AnalysisException: expression 'daily_total_revenue.gprs_usage' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
CodePudding user response:
The query fails because gprs_usage is referenced in the CASE condition outside any aggregate function, so Spark cannot evaluate it once per date_key group. Try aggregating the CASE expression instead of wrapping the aggregate in a CASE:
    SELECT
        date_key,
        COUNT(DISTINCT CASE
                           WHEN gprs_usage > 0
                                AND sms_count_on_net = 0
                                AND sms_no_off_net = 0
                           THEN numbers
                       END) AS uniq_gprs_only_number,
        SUM(gprs_usage / 1048576) AS data_mb,
        SUM(sms_count_on_net + sms_no_off_net) AS sms_total_cnt,
        COUNT(DISTINCT numbers) AS uniq_total_number
    FROM daily_total_revenue
    WHERE date_key BETWEEN 20220101 AND 20220120
    GROUP BY date_key;
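If you are on a recent Spark 3.x release, the same conditional distinct count can also be written with the ANSI FILTER clause, which some find more readable. This is just a sketch against the same daily_total_revenue table; verify that your Spark version supports FILTER together with DISTINCT aggregates before relying on it:

    SELECT
        date_key,
        -- count each number once, but only over rows matching the condition
        COUNT(DISTINCT numbers) FILTER (
            WHERE gprs_usage > 0
              AND sms_count_on_net = 0
              AND sms_no_off_net = 0
        ) AS uniq_gprs_only_number,
        SUM(gprs_usage / 1048576) AS data_mb,
        SUM(sms_count_on_net + sms_no_off_net) AS sms_total_cnt,
        COUNT(DISTINCT numbers) AS uniq_total_number
    FROM daily_total_revenue
    WHERE date_key BETWEEN 20220101 AND 20220120
    GROUP BY date_key;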
Note that DISTINCT is not a function in SQL: count(distinct(numbers)) is parsed as COUNT(DISTINCT numbers), and the inner parentheses are simply redundant.
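For example, against the same table these queries all behave as keyword forms, even though the second one looks like a function call:

    SELECT COUNT(DISTINCT numbers)  FROM daily_total_revenue;  -- idiomatic
    SELECT COUNT(DISTINCT(numbers)) FROM daily_total_revenue;  -- same result, extra parentheses
    SELECT COUNT(DISTINCT date_key, numbers) FROM daily_total_revenue;  -- DISTINCT over a pair of columns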