I have a table on which some data is missing, so I have to replace the missing data by the data of the day before.
I wanted to do a SQL update to solve this issue.
In case the following criteria is met -> if ID1 is in set(a1,a2,a3) AND TYPE Is MISSING
The variables "Amount" a/b will have an absurd value
We will then take the Amount a/b value of the row of the previous day, where ID1 and ID2 are the same as the one of the row where the criteria is met.
So here ID1 and ID2 are equal to a1 and b1 respectively, we look for a1 and b1 the previous day (10/03/2021) and get the amount 28.45/29.46 that we use to replace the false amount 454848.25/548926.36.
We also copy Type value.
ID1 | ID2 | Amount a | Amount b | day | Type |
---|---|---|---|---|---|
a1 | b1 | 28.45 | 29.46 | 10/03/2021 | Out |
a2 | b1 | 36.84 | 37.88 | 10/03/2021 | In |
a1 | b1 | 454848.25 | 548926.36 | 11/03/2021 | /MISSING/ |
Goal:
ID1 | ID2 | Amount a | Amount b | day | Type |
---|---|---|---|---|---|
a1 | b1 | 28.45 | 29.46 | 10/03/2021 | Out |
a2 | b1 | 36.84 | 37.88 | 10/03/2021 | In |
a1 | b1 | 28.45 | 29.46 | 11/03/2021 | Out |
My table consists of thousands of rows, but it's the idea
I tried to use lag and SQL update, but did not succeed.
CodePudding user response:
If the condition is met, replace absurd values by proper missing values and assign a pointer to the day before. Then lookup using the initial table.
data have;
infile datalines delimiter='|';
input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
format day ddmmyy10.;
datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;
data stage1;
set have;
if ID1 in ('a1','a2','a3') and type = "" then do;
Amount_A = .;
Amount_B = .;
_date = day - 1;
type = "Out";
end;
format _date ddmmyy10.;
run;
data want;
if 0 then set have;
if _n_ = 1 then do;
declare hash h(dataset:'have');
h.definekey('ID1','ID2','day');
h.definedata('Amount_A','Amount_B');
h.definedone();
end;
set stage1;
rc = h.find(key:ID1, key:ID2, key: _date);
drop rc _date;
run;
ID1 ID2 Amount_A Amount_B day type
a1 b1 28.45 29.46 10/03/2021 Out
a2 b1 36.84 37.88 10/03/2021 In
a1 b1 28.45 29.46 11/03/2021 Out
CodePudding user response:
Why not just sort by the keys and date and then use the lag
function to look one row back. Maybe you already tried this, but used the lag
only inside the if type is missing
block. This won't help you as noted in the