I'm experimenting with SAS (version 7.11 HF2). I have a dataset with columns a and b, where a is a decimal variable. When I run the code below, I unexpectedly get a . (dot) in the first row of the output.
Input data:
a, b
2.4, 1
1.2, 2
3.6, 3
Code:
data test;
c = a;
set abcd.test_data;
run;
Output data:
c, a, b
., 2.4, 1
2.4, 1.2, 2
1.2, 3.6, 3
3.6, ,
Strange things:
- A derived variable is normally added on the right side, but this one appears on the left.
- A . (dot) appears in the first row, and the values in the derived column are shifted down by one row.
Any help?
CodePudding user response:
Your set statement comes after your variable assignment statement. SAS first tries to assign the value of a to c, but a has not yet been read at that point. Place your set statement first, then do the variable manipulation.
data test;
set abcd.test_data;
c = a;
run;
CodePudding user response:
Nothing strange here, just put the SET statement first.
DATA step processing consists of two phases:
- Compilation phase
- Execution phase
During the compilation phase, each statement in the DATA step is scanned for syntax errors.
During the execution phase, the data portion of the dataset is created.
SAS initializes variables to missing and then executes the statements in the order in which they appear in the DATA step.
In your case, the set statement comes after the assignment of c. On the first iteration, a and b are still missing at that point, so c gets a missing value. Only then is the SET statement executed, which is why the first row ends up with values for both a and b.
data test;
set abcd.test_data;
c = a;
run;
Note that c is the first variable in your original output dataset because it is the first variable mentioned in your code. With the SET statement first, c comes after a and b.
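The execution order described above can be traced by writing the variables to the log before and after the SET statement. This is only a sketch: work.test_data is a hypothetical stand-in for abcd.test_data, built from the sample values in the question, and the PUT statements are added purely for illustration.

```sas
/* Hypothetical stand-in for abcd.test_data, using the question's sample rows */
data work.test_data;
    input a b;
    datalines;
2.4 1
1.2 2
3.6 3
;
run;

/* Same statement order as in the question, with PUT tracing added */
data work.test;
    c = a;                                    /* runs before the current row is read */
    put 'Before SET: ' _n_= a= c=;            /* a is whatever the previous iteration left behind */
    set work.test_data;                       /* reads the next observation into a and b */
    put 'After SET:  ' _n_= a= c=;            /* a is now the current row, c is unchanged */
run;
```

Comparing the two log lines for each value of _N_ should show c trailing a by one row, which matches the shifted column in the question.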
CodePudding user response:
Looks like it did what you asked it to do.
On the first iteration of the data step it will set C to the value of A. The value of A is missing since you have not yet given it any value. Then the SET statement will read the first observation from your input dataset. Since there is no explicit OUTPUT statement, the observation is written when the iteration reaches the end of the step.
On every later iteration, A still holds the value last read from the input dataset at the moment it is assigned to C. Any variable that is part of an input dataset is "retained", which really just means it is not set to missing when a new iteration starts.
If the goal was to create C with the previous value of A you could have created the same output by using the LAG() function.
data test;
set abcd.test_data;
c=lag(a);
run;
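To check whether the two approaches really give the same values, one option is to build both datasets and compare them. The sketch below assumes a hypothetical work.test_data built from the question's sample rows; the dataset names work.shifted and work.lagged are made up for this example.

```sas
/* Hypothetical stand-in for abcd.test_data, using the question's sample rows */
data work.test_data;
    input a b;
    datalines;
2.4 1
1.2 2
3.6 3
;
run;

/* Original statement order: assignment before SET */
data work.shifted;
    c = a;
    set work.test_data;
run;

/* Explicit version: read the row first, then take the previous value of a */
data work.lagged;
    set work.test_data;
    c = lag(a);
run;

/* PROC COMPARE matches variables by name, so the different column order does not matter */
proc compare base=work.shifted compare=work.lagged;
run;
```

The LAG() version states the intent directly, whereas the original ordering gets the shift only as a side effect of where the assignment sits relative to SET.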