So I have this:
Initial database:
Variable1, variable2, value, percentvalue
Keyword1, a, 234, 0.7
Keyword1, a, 64, 0.18
Keyword1, a, 4, 0.05
Keyword1, a, 2, 0.025
Keyword1, a, 300, 0.84
Keyword2
Keyword2
Keyword3
Keyword4
Keyword4
and so on.
When I run this individually, it work:
data Filename1;
set filename0;
if variable1 = 'Keyword1' then do;
retain sumCol;
sumCol = sum(sumCol, percentvalue);
if sumCol>0.95 then DELETE;
output;
end;
This return the first 3 row of keyword1
Which is what I want.
But when I try to do it for the entire table which has like 600 keywords.
I'm currently running the test with only one keyword to make sure it work in the same way.
But when I run:
data Filename1;
set filename0;
array MyArrayVariable1{1} $ Keyword1;
do i=1 to dim(MyArrayVariable1);
if variable1 = MyArrayVariable1[i] then do;
retain sumCol;
sumCol = sum(sumCol, percentvalue);
if sumCol>0.95 then DELETE;
output;
end;
end;
run;
When I run it, It just pull an empty table instead of the selected value.
And if I get rid of the output; it pulls the entire table without filtering anything.
Anyone can help me with this?
CodePudding user response:
Looks like you just want to use BY group processing.
data Filename1;
set filename0;
by variable1 ;
if first.variable1 then sumcol=0;
sumCol percentvalue;
if sumCol<=0.95 then output;
run;
Note that using a SUM statement
sumCol percentvalue;
is a simplified way to code these two statements in your original code.
retain sumCol;
sumCol = sum(sumCol, percentvalue);
CodePudding user response:
BY group processing with an I/O criterion based on a groupwise computation can also be succinctly coded in what is commonly called a DOW loop in the SAS community. One hallmark of the technique is to place the SET statement inside a DO loop.
Example:
data want;
do until (last.variable1);
SET have;
by variable1;
pctsum = sum(pctsum,percentvalue);
if pctsum <= 0.95 then OUTPUT;
end;
run;
NOTE:
I'm not sure of the role of your Variable2
. Should it be part of a hierarchy wherein the pctsum is reset if the Variable2 value changes within a Variable1 group?