Querying Data in databricks spark SQL-CodePudding

I am very new to databricks SQL. And I need some help for a certain task.

I have a table POLICY which has the following columns:

PolicyId	CustomerId	AgentId
P123	C123	A123
P124	C124	A124
P125	C123	A125
P126	C124	A124

I need to determine the policies belonging to the same customer that has a different agent from the other policies from that respective customer.

In this example , policies with policyid P124 and P126 under customer C124 are safe since they have the same agent - agentId A124. But, policies P123 P125 for customer C123 is flagged for having different agents.

Basically, a customer can have more than 1 policy as long as they are under the same agent. Any policies belonging to the same customer but has a different servicing agent is flagged.

How can I achieve this in Databricks SQL? As of now , I have only determined policies belonging to the same customer by using collect_list() function aggregate.

select customerid,collect_list(distinct policyid) from Policy group by customerid

CustomerId	collect_list(PolicyId)
C123	["P123","P125"]
C124	["P124","P126"]

EDIT: Tried the solution below and it worked. But the requirement had a minor change. Using the dataset from the query below, I need now to determine the policies under the same customer, having different agents , and/while these agents belonging on the same group.

I have a look up table for the agents that has a groupcode

AgentId	GroupCode
A123	1
A124	2
A125	2
A126	2

CodePudding user response：

Since you want to get(flag) the customerId where they have at least one policy but different agents, you can write a query using customerId and AgentId itself. I have used the same data given as an example.

The following query gives details of customerID when the count of distinct AgentId is greater than 1.

select customerId, count(distinct(AgentId)) from policy group by customerId

Output

 ---------- ----------------------- 
|customerId|count(DISTINCT AgentId)|
 ---------- ----------------------- 
|      C123|                      2|
|      C124|                      1|
 ---------- -----------------------

Now since we want to flag the customer with multiple agents, you can use the following query to get complete details.

select customerId,collect_list(distinct AgentId),count(policyId),collect_list(distinct policyId) from policy group by customerId having count(distinct(agentId))>1

Output

 ---------- ------------------------------ --------------- ------------------------------- 
|customerId|collect_list(DISTINCT AgentId)|count(policyId)|collect_list(DISTINCT policyId)|
 ---------- ------------------------------ --------------- ------------------------------- 
|      C123|                  [A123, A125]|              2|                   [P123, P125]|
 ---------- ------------------------------ --------------- -------------------------------

UPDATE:

You can create a view using the following query. you can use this view to query and acquire the necessary results.

create view flagged_customers as select customerId,collect_list(distinct AgentId) as distinct_agent_list,count(policyId) as policy_count,collect_list(distinct policyId) as distinct_policy_list from policy group by customerId having count(distinct(agentId))>1

--select * from flagged_customers

Output:

 ---------- ------------------- ------------ -------------------- 
|customerId|distinct_agent_list|policy_count|distinct_policy_list|
 ---------- ------------------- ------------ -------------------- 
|      C123|       [A123, A125]|           2|        [P123, P125]|
 ---------- ------------------- ------------ --------------------

If you want complete row information without arrays so it would be easier to query, you can use inner join on policy table and flagged_customers view as shown below:

select t1.* from policy as t1 inner join flagged_customers as t2 on t1.customerid=t2.customerid

Output:

 -------- ---------- ------- 
|PolicyId|CustomerId|AgentId|
 -------- ---------- ------- 
|    P123|      C123|   A123|
|    P125|      C123|   A125|
 -------- ---------- -------