I am very new to databricks SQL. And I need some help for a certain task.
I have a table POLICY which has the following columns:
PolicyId | CustomerId | AgentId |
---|---|---|
P123 | C123 | A123 |
P124 | C124 | A124 |
P125 | C123 | A125 |
P126 | C124 | A124 |
I need to determine the policies belonging to the same customer that has a different agent from the other policies from that respective customer.
In this example , policies with policyid P124 and P126 under customer C124 are safe since they have the same agent - agentId A124. But, policies P123 P125 for customer C123 is flagged for having different agents.
Basically, a customer can have more than 1 policy as long as they are under the same agent. Any policies belonging to the same customer but has a different servicing agent is flagged.
How can I achieve this in Databricks SQL? As of now , I have only determined policies belonging to the same customer by using collect_list() function aggregate.
select customerid,collect_list(distinct policyid) from Policy group by customerid
CustomerId | collect_list(PolicyId) |
---|---|
C123 | ["P123","P125"] |
C124 | ["P124","P126"] |
EDIT: Tried the solution below and it worked. But the requirement had a minor change. Using the dataset from the query below, I need now to determine the policies under the same customer, having different agents , and/while these agents belonging on the same group.
I have a look up table for the agents that has a groupcode
AgentId | GroupCode |
---|---|
A123 | 1 |
A124 | 2 |
A125 | 2 |
A126 | 2 |
CodePudding user response:
Since you want to get(flag) the customerId
where they have at least one policy but different agents, you can write a query using customerId
and AgentId
itself. I have used the same data given as an example.
The following query gives details of customerID
when the count of distinct AgentId
is greater than 1.
select customerId, count(distinct(AgentId)) from policy group by customerId
- Output
---------- -----------------------
|customerId|count(DISTINCT AgentId)|
---------- -----------------------
| C123| 2|
| C124| 1|
---------- -----------------------
Now since we want to flag the customer with multiple agents, you can use the following query to get complete details.
select customerId,collect_list(distinct AgentId),count(policyId),collect_list(distinct policyId) from policy group by customerId having count(distinct(agentId))>1
- Output
---------- ------------------------------ --------------- -------------------------------
|customerId|collect_list(DISTINCT AgentId)|count(policyId)|collect_list(DISTINCT policyId)|
---------- ------------------------------ --------------- -------------------------------
| C123| [A123, A125]| 2| [P123, P125]|
---------- ------------------------------ --------------- -------------------------------
UPDATE:
You can create a view using the following query. you can use this view to query and acquire the necessary results.
create view flagged_customers as select customerId,collect_list(distinct AgentId) as distinct_agent_list,count(policyId) as policy_count,collect_list(distinct policyId) as distinct_policy_list from policy group by customerId having count(distinct(agentId))>1
--select * from flagged_customers
- Output:
---------- ------------------- ------------ --------------------
|customerId|distinct_agent_list|policy_count|distinct_policy_list|
---------- ------------------- ------------ --------------------
| C123| [A123, A125]| 2| [P123, P125]|
---------- ------------------- ------------ --------------------
If you want complete row information without arrays so it would be easier to query, you can use inner join on policy
table and flagged_customers
view as shown below:
select t1.* from policy as t1 inner join flagged_customers as t2 on t1.customerid=t2.customerid
- Output:
-------- ---------- -------
|PolicyId|CustomerId|AgentId|
-------- ---------- -------
| P123| C123| A123|
| P125| C123| A125|
-------- ---------- -------