I am trying to re-create some analysis that i did in excel. I have application key level data in excel which has a lot of variables. There is a column which is uplift which gets calculated as loans/approved apps. this value I have for both test and control applications. My data looks something like this
data have;
infile datalines dlm='|';
input application_number test_control$ loans approved_apps risk_level$;
datalines;
1|test|1|1|level 1
2|test|0|1|level 2
3|control|1|1|level 3
4|test|0|0|level 1
5|control|1|1|level 2
6|test|1|1|level 3
7|control|0|1|level 1
;
run;
So basically, using this data I calculate the uplift - which is the calculated as loans/approved apps. and now I calculate another column which is uplift_1 i.e. (uplift of test/uplift of control)-1
can someone please tell me how do i perform this in sas?
I am trying to use proc sql statement like this
proc sql;
create table test as select test_control, risk_level, sum(loans), sum(approved_apps) from table xyz
group by test_control, risk_level;
quit;
data test1;
set test;
uplift=loans/approved_apps;
run;
When i am doing this in sas, its not giving me the correct results, how do i calculate the uplift and uplift_1 in sas? please help urgently
CodePudding user response:
The first table is trivial out of proc tabulate
.
data test_data;
input test_control $ loans approved risk $;
datalines;
test 1 1 lev1
test 0 1 lev2
control 1 1 lev3
test 0 0 lev1
control 1 1 lev2
test 1 1 lev3
control 0 1 lev1
;;;;
run;
proc tabulate data=test_data;
class test_control risk;
var loans approved;
tables (risk=' ' all='Grand Total'),(test_control=' ' all='Grand Total')*(loans=' '*pctsum<approved>=' ')/box='Sum of Uplift';
run;
The <approved>
part is a denominator definition. It's unclear if pctn
or pctsum
is correct from your example data, but seems like pctsum
is most likely.
The final column you'd have to compute another way - you can use ods output
to get the table from above into a dataset, then use proc transpose
to reshape it how you want it; or you can do it a more manual way.
CodePudding user response:
First, give your columns a name:
proc sql;
create table test as
select test_control
, risk_level
, sum(loans) as loans
, sum(approved_apps) as approved_apps
from have
group by test_control, risk_level
order by risk_level
;
quit;
Then you can calculate uplift:
data uplift;
set test;
uplift=loans/approved_apps;
run;
From here, transpose it so you have your data in a tabular format. Then calculate uplift_1
:
proc transpose data=test1 out=uplift_transpose;
by risk_level;
id test_control;
var uplift;
run;
data uplift_1;
set uplift_transpose;
uplift_1 = (test/control) - 1;
format uplift_1 percent.;
drop _NAME_;
run;
Output:
risk_level control test uplift_1
level 1 0 1 .
level 2 1 0 (100%)
level 3 1 1 0%
CodePudding user response:
In your SQL query you failed to provide names but you can also calculate it directly in SQL
proc sql;
create table test as
select test_control, risk_level,
sum(loans) as loans,
sum(approved_apps) as approved_apps,
calculated loans / calculated approved_apps as uplift
from table xyz
group by test_control, risk_level;
quit;
data test1;
set test;
uplift=loans/approved_apps;
run;