I've created a query in Apache Spark that I hoped would take multiple rows of customer data and roll them up into one row, showing which types of products each customer has open. So data that looks like this:
Customer Product
1 Savings
1 Checking
1 Auto
Ends up looking like this:
Customer Product
1 Savings/Checking/Auto
The query currently still returns multiple rows. I tried GROUP BY, but that doesn't show the multiple products a customer has; instead, it just shows one product.
Is there a way to do this in Apache Spark or its SQL dialect? Unfortunately, I don't have MySQL, and I don't think IT will install it for me.
SELECT
"ACCOUNT"."account_customerkey" AS "account_customerkey",
max(
concat(case when Savings=1 then ' Savings' end,
case when Checking=1 then ' Checking ' end,
case when CD=1 then ' CD /' end,
case when IRA=1 then ' IRA /' end,
case when StandardLoan=1 then ' SL /' end,
case when Auto=1 then ' Auto /' end,
case when Mortgage=1 then ' Mortgage /' end,
case when CreditCard=1 then ' CreditCard ' end)) AS Description
FROM "ACCOUNT" "ACCOUNT"
inner join (
SELECT
"ACCOUNT"."account_customerkey" AS "customerkey",
CASE WHEN "ACCOUNT"."account_producttype" = 'Savings' THEN 1 ELSE NULL END AS Savings,
CASE WHEN "ACCOUNT"."account_producttype" = 'Checking' THEN 1 ELSE NULL END AS Checking,
CASE WHEN "ACCOUNT"."account_producttype" = 'CD' THEN 1 ELSE NULL END AS CD,
CASE WHEN "ACCOUNT"."account_producttype" = 'IRA' THEN 1 ELSE NULL END AS IRA,
CASE WHEN "ACCOUNT"."account_producttype" = 'Standard Loan' THEN 1 ELSE NULL END AS StandardLoan,
CASE WHEN "ACCOUNT"."account_producttype" = 'Auto' THEN 1 ELSE NULL END AS Auto,
CASE WHEN "ACCOUNT"."account_producttype" = 'Mortgage' THEN 1 ELSE NULL END AS Mortgage,
CASE WHEN "ACCOUNT"."account_producttype" = 'Credit Card' THEN 1 ELSE NULL END AS CreditCard
FROM "ACCOUNT" "ACCOUNT"
)a on "account_customerkey" =a."customerkey"
GROUP BY
"ACCOUNT"."account_customerkey"
CodePudding user response:
Please try this.
scala> df.show()
+--------+--------+
|Customer| Product|
+--------+--------+
|       1| Savings|
|       1|Checking|
|       1|    Auto|
|       2| Savings|
|       2|    Auto|
|       3|Checking|
+--------+--------+
scala> df.groupBy($"Customer").agg(collect_list($"Product").as("Product")).select($"Customer",concat_ws(",",$"Product").as("Product")).show(false)
+--------+---------------------+
|Customer|Product              |
+--------+---------------------+
|1       |Savings,Checking,Auto|
|3       |Checking             |
|2       |Savings,Auto         |
+--------+---------------------+
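To see why this works: `collect_list` gathers each group's values into an array, and `concat_ws` joins that array with a separator. The same roll-up can be sketched in plain Python (no Spark needed; the sample data mirrors the example above):

```python
# Roll up (customer, product) rows into one delimited string per customer,
# mirroring groupBy + collect_list + concat_ws.
from collections import defaultdict

rows = [
    (1, "Savings"), (1, "Checking"), (1, "Auto"),
    (2, "Savings"), (2, "Auto"),
    (3, "Checking"),
]

grouped = defaultdict(list)  # collect_list: accumulate products per customer
for customer, product in rows:
    grouped[customer].append(product)

# concat_ws(",", ...): join each customer's product list into one string
rolled = {c: ",".join(ps) for c, ps in grouped.items()}
print(rolled)  # {1: 'Savings,Checking,Auto', 2: 'Savings,Auto', 3: 'Checking'}
```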
CodePudding user response:
See https://docs.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/collect_list and related functions.
You need to use collect_list, which is also available from SQL (e.g. in a %sql cell).
%sql
select id, collect_list(num)
from t1
group by id
I used my own data, so you'll need to tailor it to yours. This just demonstrates the more native SQL form.
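Spark's `collect_list` plus `concat_ws` corresponds to string aggregation in other SQL dialects. As an illustration only (this uses SQLite's `group_concat`, not Spark, and hypothetical table/column names matching the query above), the same query shape looks like:

```python
# Illustrative only: SQLite's group_concat plays the role of
# collect_list + concat_ws in Spark SQL.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (id INTEGER, num TEXT)")
con.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(1, "Savings"), (1, "Checking"), (1, "Auto"), (2, "Savings")],
)

# One row per id, with all num values joined by commas.
rows = con.execute(
    "SELECT id, group_concat(num, ',') FROM t1 GROUP BY id ORDER BY id"
).fetchall()
print(rows)
```

Note that `group_concat` does not guarantee element order, so treat the joined string as an unordered set of products.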