Home > database >  Using cast() inside a select in spark.sql
Using cast() inside a select in spark.sql

Time:11-25

I am trying to do a simple thing: an inner join between two tables, but one of them have a column that was renamed and the data_type is wrong. So I want to use cast() and change the name of the column. I did this:

spark.sql(f'''SELECT nr_cpf_base_srf as nr_cpf,
                 cd_fon_ren,
                 dt_ref_ren,
                 vl_ren,
                 dt_incl_ren_avld,
                 dt_bxa_ren,
                 cd_usu_rsp_atl,
          cast(nr_cpf AS decimal (14,0))
          FROM DB2DFE.REN_AVLD_PF as A
          INNER JOIN sbx_d4n0cbf.pss_cpf_cli_msl as B
          ON a.nr_cpf = b.NR_CPF
       ''').createOrReplaceTempView('temp_x')

The table DB2DFE.REN_AVLD_PF have a column nr_cpf_base_srf which I want to change to nr_cpf so I can build the inner join with sbx_d4n0cbf.pss_cpf_cli_msl. But the data type of nr_cpf_base_srf is nit right: it should be decimal(14,0), thus I used cast(). It raises the error:

SparkStatementException: "cannot resolve '`a.nr_cpf`' given input columns: [B.dt_mvtc, A.dt_bxa_ren, B.NR_CPF,
B.cd_cli, A.dt_incl_ren_avld, A.cd_fon_ren, A.vl_ren, A.cd_usu_rsp_atl, A.dt_ref_ren, A.nr_cpf_base_srf];
line 11 pos 17;

It must be a really simple thing, but I do not know how to do it and could not find any answer. So: what is wrong? please.

CodePudding user response:

ISO SQL (which Apache Spark implements, mostly) does not let you reference other columns or expressions from the same SELECT projection clause.

So you cannot do this:

SELECT
    ( a   123 ) AS b,
    ( b   456 ) AS c
FROM
    someTable

(Arguably, ISO SQL should allow this, as otherwise you need a CTE or outer-query and that will balloon the textual size of your query: the ISO SQL design committee needs to start considering their own ergonomics).

Anyway, change your query to not reference column expressions:

SELECT
    nr_cpf_base_srf AS nr_cpf,
    cd_fon_ren,
    dt_ref_ren,
    vl_ren,
    dt_incl_ren_avld,
    dt_bxa_ren,
    cd_usu_rsp_atl,
    CAST( nr_cpf_base_srf AS decimal (14,0) )
FROM
    DB2DFE.REN_AVLD_PF AS a
    INNER JOIN sbx_d4n0cbf.pss_cpf_cli_msl AS b ON a.nr_cpf = b.NR_CPF

If you really do want to alias the column before you CAST it then you'll need an outer-query:

SELECT
    t.*,
    CAST( t.nr_cpf AS decimal (14,0) )
FROM
    (
        SELECT
            nr_cpf_base_srf AS nr_cpf,
            cd_fon_ren,
            dt_ref_ren,
            vl_ren,
            dt_incl_ren_avld,
            dt_bxa_ren,
            cd_usu_rsp_atl
        FROM
            DB2DFE.REN_AVLD_PF as A
            INNER JOIN sbx_d4n0cbf.pss_cpf_cli_msl AS B ON a.nr_cpf_base_srf = b.NR_CPF
    ) AS t

Or a CTE:

WITH t AS (

    SELECT
        nr_cpf_base_srf AS nr_cpf,
        cd_fon_ren,
        dt_ref_ren,
        vl_ren,
        dt_incl_ren_avld,
        dt_bxa_ren,
        cd_usu_rsp_atl
    FROM
        DB2DFE.REN_AVLD_PF as A
        INNER JOIN sbx_d4n0cbf.pss_cpf_cli_msl AS B ON a.nr_cpf_base_srf = b.NR_CPF
)
SELECT
    t.*,
    CAST( t.nr_cpf AS decimal (14,0) )
FROM
    t
  • Related