Set an IF condition value based on a SELECT query in Hive SQL


I want to set a new column based on an IF condition that checks whether a value appears in the result of a SELECT query. For example,

SELECT
    request_id,
    charge_click_cnt,
    IF(
        uuid IN (
            SELECT
                deviceid
            from
                t1
            where
                dt between '20210908'
                and '20210915'
        ),
        'shop_user',
        'non_shop_user'
    ) as shop_user
FROM
    t2

But it fails with an error like the following.

org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class#failAnalysis:41 org.apache.spark.sql.catalyst.analysis.Analyzer#failAnalysis:91 org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1#apply:316

So I am wondering whether there is a better way to set a conditional value like the code above.
Thanks in advance.

CodePudding user response:

If Spark allows CASE statements with an EXISTS clause, the below should work.

SELECT request_id,
       charge_click_cnt,
       CASE WHEN EXISTS(SELECT 1
                          FROM t1
                         WHERE t1.dt BETWEEN '20210908' AND '20210915'
                           AND t1.deviceid = t2.uuid) THEN 'shop_user'
            ELSE 'non_shop_user'
        END AS shop_user
  FROM t2;
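
If the analyzer still rejects the EXISTS subquery inside the CASE expression, the same logic can be written without any subquery in the SELECT list. A minimal sketch using a LEFT JOIN against a deduplicated t1 (same table and column names as in the question):

SELECT t2.request_id,
       t2.charge_click_cnt,
       CASE WHEN d.deviceid IS NOT NULL THEN 'shop_user'
            ELSE 'non_shop_user'
        END AS shop_user
  FROM t2
  LEFT JOIN (SELECT DISTINCT deviceid
               FROM t1
              WHERE dt BETWEEN '20210908' AND '20210915') d
    ON t2.uuid = d.deviceid;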

CodePudding user response:

If you want to try the Spark DataFrame API, maybe you can try this:

import org.apache.spark.sql.functions._

// devices seen in t1 within the date range, tagged with 1
val df1 = spark.sql("select deviceid as uuid, 1 as tag from t1 where dt between '20210908' and '20210915'")
val df2 = spark.sql("select request_id, charge_click_cnt, uuid from t2")

// left join keeps every row of t2; tag stays null for uuids not found in t1
val resultDf = df2.join(df1, Seq("uuid"), "left")
  .withColumn("shop_user", when(col("tag") === 1, "shop_user").otherwise("non_shop_user"))
  .drop("tag")
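
One caveat with the join approach: if the same deviceid occurs more than once in t1 within the date range, the left join will duplicate matching rows of t2. A small variation that deduplicates the lookup side first (same DataFrames as above):

// deduplicate the lookup side so the join cannot fan out t2 rows
val tagged = df1.select("uuid").distinct().withColumn("tag", lit(1))

val resultDedup = df2.join(tagged, Seq("uuid"), "left")
  .withColumn("shop_user", when(col("tag") === 1, "shop_user").otherwise("non_shop_user"))
  .drop("tag")

resultDedup.show(10, truncate = false)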