I am executing this query against Snowflake from Python, using PySpark, like so:
update = """
UPDATE FINAL_TBL x
FROM STG_TBL y
SET x.NAME = y.NAME
WHERE y.customer_id = x.customer_id AND y.end_date = x.end_date
"""
Then I execute this through a PySpark connection to Snowflake like so:
spark.sql(update)
When I hit run I get the following error:
missing 'SET' at 'FROM' (line 3, pos 0)
== SQL ==
UPDATE FINAL_TBL x
FROM STG_TBL y
^^^
SET x.NAME = y.NAME
WHERE y.customer_id = x.customer_id AND y.end_date = x.end_date
Can someone show me what is incorrect about this query?
Thank you
I've tried rewriting the query, substituting "USING" for "FROM". Extensive Google searches don't lead to conclusive results, and when I ran these through SQL validators I got back similar errors.
CodePudding user response:
When an UPDATE joins to a secondary table (or even multiple tables), try writing it as a SELECT first. Once you get that query working, change it to an UPDATE. For example, from your scenario:
select
ft.name,
st.name
from
Final_Tbl ft
JOIN stg_tbl st
on ft.customer_id = st.customer_id
AND ft.end_date = st.end_date
Once this runs, you can see the name column from each respective table, confirming that the join and aliases are correct. Now simply change it to an UPDATE/SET.
update Final_Tbl ft set
    name = st.name
from stg_tbl st
where ft.customer_id = st.customer_id
    AND ft.end_date = st.end_date
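If it helps to see what the join-update does row by row, here is a plain-Python sketch of the same semantics, using made-up rows (the table contents are hypothetical, purely for illustration):

```python
# Hypothetical final and staging rows, joined on (customer_id, end_date).
final = [
    {"customer_id": 1, "end_date": "2023-01-31", "name": "old name"},
    {"customer_id": 2, "end_date": "2023-02-28", "name": "unchanged"},
]
staging = [
    {"customer_id": 1, "end_date": "2023-01-31", "name": "new name"},
]

# Build a lookup on the join keys, then overwrite name where a match exists --
# the same effect as UPDATE ... SET name = st.name on matching rows only.
lookup = {(s["customer_id"], s["end_date"]): s["name"] for s in staging}
for row in final:
    key = (row["customer_id"], row["end_date"])
    if key in lookup:
        row["name"] = lookup[key]
```

Rows with no match in staging (customer_id 2 here) are left untouched, which is exactly why confirming the join with a SELECT first matters.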
CodePudding user response:
You have the SET and FROM clauses out of order:
UPDATE <target_table> SET <col_name> = <value> [ , <col_name> = <value> , ... ] [ FROM <additional_tables> ] [ WHERE <condition> ]
Try:
update = """
UPDATE FINAL_TBL x
SET "NAME" = y."NAME"
FROM STG_TBL y
WHERE y.customer_id = x.customer_id AND y.end_date = x.end_date
"""