Home > Back-end >  Consolidate Dataframe
Consolidate Dataframe

Time:07-20

I have a dataframe with claims information.
I want to filter out data which has successful payment without Declining.
Ex

T_ID Card Number Status Amount
1 1234 PAID 100
1 1234 REVERSAL -30
2 1234567 PAID 70

The output of this should be

T_ID Card Number Status Amount
1 1234 PAID 70
2 1234567 PAID 70

Cancelling first 2 transactions as ID and CardNumber is same but it has a corresponding Reversal of -$30

CodePudding user response:

With the provided information, I have made following assumption:

  1. Amount associated with PAID and REVERSAL status should only be considered. Rest to be ignored.
  2. In the final output, there has to be a status column with value set to PAID.

To filter out successful payment without declining, you'll have to provide information about time stamps.

select
  T_ID, CardNumber, "PAID" as Status, SUM(Amount) as Amount
from
  (
    select T_ID, CardNumber,
      case
        when Status = 'PAID' then Amount
        when Status = 'REVERSAL' then Amount
        else 0
      end as Amount
    from
      Claim
  )
group by
  T_ID, CardNumber, Status

It's a very basic implementation but it is as per information provided.

CodePudding user response:

Another approach would be using Common Table Expression (Same can be achieved using Spark SQL)

 -- claim table
 create or replace table tx(
  T_ID number,
  Card_Number number,
  Status text,
  Amount number(10,2)
 );

sample data

 insert into tx values
 (1,1234,'PAID',100.00),
 (1,1234,'REVERSAL',    -30),
 (2,1234567,'PAID',70);

sample query

 with total_cte as (
 select T_ID, 
       Card_Number,
        sum(Amount) as total_balance
 from tx group  by T_ID,Card_Number
 )
 select t.T_ID, t.Card_Number, t.Status, s.total_balance 
 from tx t join total_cte as s
 where t.T_ID = s.T_ID and t.Card_Number = s.Card_Number
 and t.status = 'PAID'
 ;

CodePudding user response:

Another approach would be using Common Table Expression (Same can be achieved using Spark SQL)

 -- claim table
 create or replace table tx(
  T_ID number,
  Card_Number number,
  Status text,
  Amount number(10,2)
 );

sample data

 insert into tx values
 (1,1234,'PAID',100.00),
 (1,1234,'REVERSAL',    -30),
 (2,1234567,'PAID',70);

sample query

 with total_cte as (
 select T_ID, 
       Card_Number,
        sum(Amount) as total_balance
 from tx group  by T_ID,Card_Number
 )
 select t.T_ID, t.Card_Number, t.Status, s.total_balance 
 from tx t join total_cte as s
 where t.T_ID = s.T_ID and t.Card_Number = s.Card_Number
 and t.status = 'PAID'
 ;

Here is the result enter image description here

  • Related