Home > database >  SQL How to make nested query with substring/case statement/ trim
SQL How to make nested query with substring/case statement/ trim

Time:08-17

New to SQL and trying to understand nested queries and how to use them. I have a substring, case statement, and trim statement that I'm trying to put together but am unsure of how. The substring has to be done first, then the case statement, then the trim. This is what I have at the moment but unsure of how to get it working. The code is random names/tables as an example

SELECT dtXYZ.*
FROM 
(
    SELECT dt,
    SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
    SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
    SUBSTRING_INDEX(dt, .................) as bear  
    FROM Animaltab
) dtXYZ

SELECT 
CASE WHEN length(bear) = 4 THEN bear
    ELSE concat('0', bear)
    END AS bear_corr,
CASE WHEN length(lion) = 7 THEN lioness
    ELSE concat('0', lioness)
    END AS lion_corr

    
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal 

CodePudding user response:

Spark supports CTE https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-cte.html

even with databrics this will work see Common Table Expressions (CTEs) in Databricks and Spark

ANd you can nest them like this

WITH dtXYZ(dt,lioness,tiger,bear()  AS  (    SELECT dt,
    SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
    SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
    SUBSTRING_INDEX(dt, .................) as bear  
    FROM Animaltab),
dtcorrected (dt,bear_corr,lion_corr,tiger) as (
SELECT
    dt,
    CASE WHEN length(bear) = 4 THEN bear
         ELSE concat('0', bear)
    END AS bear_corr,
    CASE WHEN length(lion) = 7 THEN lioness
        ELSE concat('0', lioness)
    END AS lion_corr
    ,tiger
FROM dtXYZ)
SELECT
dt,
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal  FROM dtcorrected

CodePudding user response:

Order of operations can be tricky with SQL if you're used to ordering things in a procedure. Like nbk commented, CTEs or Common Table Expressions are your best bet. CTEs are defined by the 'with' keyword and are very similar to nested subqueries (you could write the same query nested if you wanted) but are better suited to this operation where the nesting structure of the code doesn't mimic the nesting of the data. I always use CTEs if I'm joining tables that each need independent grouping or filtering. The SQL in the parenthesis essentially creates a view, and the outside SQL is a second higher-order select statement to create a result set. If I'm working with hierarchical data (parent, child, grandchild), I'll go with the nesting in the query to follow that path, but usually, the CTE is easier to organize your ideas. Here's how that would work:

with dtXYZ as

(
    SELECT dt,
    SUBSTRING_INDEX(SUBSTRING_INDEX(dt, ..................... ) as lioness,
    SUBSTRING_INDEX(SUBSTRING_INDEX(dt, .....................) as tiger,
    SUBSTRING_INDEX(dt, .................) as bear  
    FROM Animaltab
)

SELECT 
CASE WHEN length(bear) = 4 THEN bear
    ELSE concat('0', bear)
    END AS bear_corr,
CASE WHEN length(lion) = 7 THEN lioness
    ELSE concat('0', lioness)
    END AS lion_corr,
trim(lion_corr) || '_' || trim(tiger) || '_' || trim(bear_corr) as new_imp_animal 
from
dtXYZ

And in terms of 'order of operations,' case statements and functions in a select can be referenced by other parts of the select statement as inputs. Things can get hairy when you use 'if' ideas that resolve to illogical or error-causing conditions. Still, otherwise, I've had no issues with having many parts of a select refer to each other. It's an excellent way to test out nesting functions.

  • Related