Home > Software design >  Difference calculation between two percentages in sas
Difference calculation between two percentages in sas

Time:07-26

I am trying to re-create some analysis that i did in excel. I have application key level data in excel which has a lot of variables. There is a column which is uplift which gets calculated as loans/approved apps. this value I have for both test and control applications. My data looks something like this

demo data in excel

data have;
    infile datalines dlm='|';
    input application_number test_control$ loans approved_apps risk_level$;
    datalines;
1|test|1|1|level 1
2|test|0|1|level 2
3|control|1|1|level 3
4|test|0|0|level 1
5|control|1|1|level 2
6|test|1|1|level 3
7|control|0|1|level 1
;
run;

So basically, using this data I calculate the uplift - which is the calculated as loans/approved apps. and now I calculate another column which is uplift_1 i.e. (uplift of test/uplift of control)-1

calculation of uplift_1 in excel

can someone please tell me how do i perform this in sas?

I am trying to use proc sql statement like this

proc sql;
create table test as select test_control, risk_level, sum(loans), sum(approved_apps) from table xyz
group by test_control, risk_level;
quit;

data test1;
set test;
uplift=loans/approved_apps;
run;

When i am doing this in sas, its not giving me the correct results, how do i calculate the uplift and uplift_1 in sas? please help urgently

CodePudding user response:

The first table is trivial out of proc tabulate.

data test_data;
  input test_control $ loans approved risk $;
  datalines;
test 1 1 lev1
test 0 1 lev2
control 1 1 lev3
test 0 0 lev1
control 1 1 lev2
test 1 1 lev3
control 0 1 lev1
;;;;
run;

proc tabulate data=test_data;
  class test_control risk;
  var loans approved;
  tables (risk=' ' all='Grand Total'),(test_control=' ' all='Grand Total')*(loans=' '*pctsum<approved>=' ')/box='Sum of Uplift';
run;

The <approved> part is a denominator definition. It's unclear if pctn or pctsum is correct from your example data, but seems like pctsum is most likely.

The final column you'd have to compute another way - you can use ods output to get the table from above into a dataset, then use proc transpose to reshape it how you want it; or you can do it a more manual way.

CodePudding user response:

First, give your columns a name:

proc sql;
    create table test as 
        select test_control
             , risk_level
             , sum(loans) as loans
             , sum(approved_apps) as approved_apps
        from have
        group by test_control, risk_level
        order by risk_level
        ;
quit;

Then you can calculate uplift:

data uplift;
    set test;
    uplift=loans/approved_apps;
run;

From here, transpose it so you have your data in a tabular format. Then calculate uplift_1:

proc transpose data=test1 out=uplift_transpose;
    by risk_level;
    id test_control;
    var uplift;
run;

data uplift_1;
    set uplift_transpose;
    uplift_1 = (test/control) - 1;

    format uplift_1 percent.;
    drop _NAME_;
run;

Output:

risk_level  control test    uplift_1
level 1     0       1       .
level 2     1       0       (100%)
level 3     1       1       0%

CodePudding user response:

In your SQL query you failed to provide names but you can also calculate it directly in SQL

proc sql;
create table test as 
select test_control, risk_level, 
sum(loans) as loans, 
sum(approved_apps) as approved_apps, 
calculated loans / calculated approved_apps as uplift
from table xyz
group by test_control, risk_level;
quit;

data test1;
set test;
uplift=loans/approved_apps;
run;
  • Related