Rolling Up Customer Data into One Row


I've created a query in Apache Spark that is meant to take multiple rows of customer data and roll them up into one row showing which product types each customer has open. So data that looks like this:

Customer   Product
1           Savings
1           Checking
1           Auto

Ends up looking like this:

Customer     Product
1            Savings/Checking/Auto

The query currently still returns multiple rows. I tried GROUP BY, but that doesn't show all of the products a customer has; it just shows one product.

Is there a way to do this in Apache Spark, or in SQL (which Spark's SQL dialect closely resembles)? Unfortunately, I don't have MySQL, nor do I think IT will install it for me.

SELECT
  "ACCOUNT"."account_customerkey" AS "account_customerkey",
  max(
    concat(
      case when Savings = 1 then ' Savings' end,
      case when Checking = 1 then ' Checking  ' end,
      case when CD = 1 then ' CD /' end,
      case when IRA = 1 then ' IRA /' end,
      case when StandardLoan = 1 then ' SL /' end,
      case when Auto = 1 then ' Auto /' end,
      case when Mortgage = 1 then ' Mortgage /' end,
      case when CreditCard = 1 then ' CreditCard ' end
    )
  ) AS Description
FROM "ACCOUNT" "ACCOUNT"
inner join (
  SELECT
    "ACCOUNT"."account_customerkey" AS "customerkey",
    CASE WHEN "ACCOUNT"."account_producttype" = 'Savings' THEN 1 ELSE NULL END AS Savings,
    CASE WHEN "ACCOUNT"."account_producttype" = 'Checking' THEN 1 ELSE NULL END AS Checking,
    CASE WHEN "ACCOUNT"."account_producttype" = 'CD' THEN 1 ELSE NULL END AS CD,
    CASE WHEN "ACCOUNT"."account_producttype" = 'IRA' THEN 1 ELSE NULL END AS IRA,
    CASE WHEN "ACCOUNT"."account_producttype" = 'Standard Loan' THEN 1 ELSE NULL END AS StandardLoan,
    CASE WHEN "ACCOUNT"."account_producttype" = 'Auto' THEN 1 ELSE NULL END AS Auto,
    CASE WHEN "ACCOUNT"."account_producttype" = 'Mortgage' THEN 1 ELSE NULL END AS Mortgage,
    CASE WHEN "ACCOUNT"."account_producttype" = 'Credit Card' THEN 1 ELSE NULL END AS CreditCard
  FROM "ACCOUNT" "ACCOUNT"
) a on "account_customerkey" = a."customerkey"
GROUP BY
  "ACCOUNT"."account_customerkey"

CodePudding user response:

Please try this:

scala> df.show()
+--------+--------+
|Customer| Product|
+--------+--------+
|       1| Savings|
|       1|Checking|
|       1|    Auto|
|       2| Savings|
|       2|    Auto|
|       3|Checking|
+--------+--------+


scala> df.groupBy($"Customer").agg(collect_list($"Product").as("Product")).select($"Customer",concat_ws(",",$"Product").as("Product")).show(false)
+--------+---------------------+
|Customer|Product              |
+--------+---------------------+
|1       |Savings,Checking,Auto|
|3       |Checking             |
|2       |Savings,Auto         |
+--------+---------------------+


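To match the `Savings/Checking/Auto` format from the question, the separator passed to `concat_ws` can be changed to `/`. A sketch, assuming the same `df` as above; it also swaps `collect_list` for `collect_set`, which drops duplicate products if a customer has two accounts of the same type:

```scala
// Sketch against the df shown above. collect_set de-duplicates products,
// and concat_ws("/", ...) joins the collected array with the "/" separator
// the question asked for.
df.groupBy($"Customer")
  .agg(concat_ws("/", collect_set($"Product")).as("Product"))
  .show(false)
```

Note that `collect_set` (and `collect_list`) make no ordering guarantee, so the products may come back in any order; wrap the collected array in `array_sort` if a stable order matters.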

CodePudding user response:

See https://docs.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/collect_list and related functions

You need to use collect_list, which is available from the DataFrame API or from SQL (e.g. in a %sql cell).

%sql 
    select id, collect_list(num) 
      from t1
  group by id

I used my own data, so you will need to tailor it to your tables; this just demonstrates the same approach in a more native SQL form.
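To go all the way to the single delimited string in SQL, `concat_ws` can wrap the `collect_list` call. A sketch against the same hypothetical `t1` table as above:

```sql
%sql
select id,
       concat_ws('/', collect_list(num)) as products
  from t1
 group by id
```

This produces one row per `id` with the collected values joined by `/`, mirroring what the DataFrame-API answer does with `concat_ws(",", ...)`.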
