How to convert this query for Spark SQL

Time:01-06

I'm trying to convert a SQL Server query so it runs in a notebook, but I can't figure out how to translate the "CROSS APPLY" into something Spark SQL understands.

Here is my SQL Server query:

WITH Benef as (
    SELECT DISTINCT 
        IdBeneficiaireSource
        ,Adress
    FROM 
        UPExpBeneficiaryStaging 
)

-------- Split Adress --------

,AdresseBenefTemp1 as (
     SELECT 
         IdBeneficiaireSource
         ,REPLACE(REPLACE(Adress, char(10), '|'), char(13), '|') as AdresseV2
     FROM 
        Benef
)


,AdresseBenefTemp2 as (
     SELECT 
        IdBeneficiaireSource
        ,value as Adresse
        ,ROW_NUMBER() OVER(PARTITION BY IdBeneficiaireSource ORDER BY (SELECT NULL)) as LigneAdresse
     FROM 
         AdresseBenefTemp1
         CROSS APPLY string_split(AdresseV2, '|')
)  

,AdresseBenefFinal as (
    SELECT DISTINCT
         a.IdBeneficiaireSource
         ,b.Adresse as Adresse_1
         ,c.Adresse as Adresse_2
         ,d.Adresse as Adresse_3
    FROM 
         AdresseBenefTemp2 as a 
         LEFT JOIN AdresseBenefTemp2 as b on b.IdBeneficiaireSource = a.IdBeneficiaireSource AND b.LigneAdresse = 1
         LEFT JOIN AdresseBenefTemp2 as c on c.IdBeneficiaireSource = a.IdBeneficiaireSource AND c.LigneAdresse = 2
         LEFT JOIN AdresseBenefTemp2 as d on d.IdBeneficiaireSource = a.IdBeneficiaireSource AND d.LigneAdresse = 3
)

-------------------------------

SELECT 
    IdBeneficiaireSource
    ,Adresse_1
    ,Adresse_2
    ,Adresse_3
FROM 
    AdresseBenefFinal

(This query splits one address field into three address columns.)

When I run it in a notebook, it reports that "CROSS APPLY" is invalid syntax.

Thanks.

CodePudding user response:

Correct me if I'm wrong, but CROSS APPLY string_split is essentially a cross join that produces one row per entry in the split result.

In Spark you can use explode for this (https://docs.databricks.com/sql/language-manual/functions/explode.html). So you should be able to add another CTE in between where you explode the split (https://docs.databricks.com/sql/language-manual/functions/split.html) results of AdresseV2 on '|'.
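For instance, the CROSS APPLY CTE could be rewritten roughly like this (an untested sketch; note that Spark's split takes a regular expression, so the '|' delimiter must be escaped, and posexplode provides the ordinal that ROW_NUMBER computed in the original):

```sql
,AdresseBenefTemp2 as (
    SELECT
        IdBeneficiaireSource
        ,Adresse
        -- posexplode is 0-based, so shift to match LigneAdresse = 1, 2, 3
        ,pos + 1 as LigneAdresse
    FROM
        AdresseBenefTemp1
        -- split takes a regex in Spark, so the pipe must be escaped
        LATERAL VIEW posexplode(split(AdresseV2, '\\|')) t as pos, Adresse
)
```

Because posexplode preserves array order within each row, this also avoids the ORDER BY (SELECT NULL) ambiguity of the original ROW_NUMBER. As an alternative, the three self-joins could be replaced by indexing the split array directly, e.g. element_at(split(AdresseV2, '\\|'), 1) as Adresse_1 (element_at is 1-based in Spark).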
