Home > Net >  How to backfill missing values in Sybase (outer apply and derived tables work in MSSQL but not Sybas
How to backfill missing values in Sybase (outer apply and derived tables work in MSSQL but not Sybas

Time:11-09

The problem

I have some tables with dates and numerical values; some numerical values are missing. I want those missing values to be filled in with the value available at the last date.

I need to do it:

  1. In a Sybase database, to which I only have read access; I can create temporary tables but not any permanent tables nor permanent views
  2. In a Microsoft SQL Server 2019 (Version 15) database, to which I have full access

I have found how to do it in Microsoft SQL Server, but not in Sybase.

UPDATE: If I run select @@version I get

Adaptive Server Enterprise/16.0 SP02 PL08 Instrumented

E.g. I need to go from

 ------- ------- 
| date  | value |
 ------- ------- 
| 1-Nov |   100 |
| 2-Nov |       |
| 3-Nov |       |
| 4-Nov |   110 |
| 5-Nov |       |
| 6-Nov |   105 |
 ------- ------- 

to (see the asterisks):

 ------- -------- 
| date  | value  |
 ------- -------- 
| 1-Nov |   100  |
| 2-Nov |   *100 |
| 3-Nov |   *100 |
| 4-Nov |   110  |
| 5-Nov |   *110 |
| 6-Nov |   105  |
 ------- -------- 

Attempt #1

I found an example here

In SQL Server, it runs without errors but it doesn't update anything. In Sybase, it doesn't run at all. I get:

incorrect syntax near the keyword top

This is a reproducible example

CREATE TABLE #my_test (my_date datetime, my_value float NULL )
go

INSERT INTO #my_test SELECT '1-Nov-2021',100
INSERT INTO #my_test SELECT '2-Nov-2021',NULL
INSERT INTO #my_test SELECT '3-Nov-2021',NULL
INSERT INTO #my_test SELECT '4-Nov-2021',110
INSERT INTO #my_test SELECT '5-Nov-2021',NULL
INSERT INTO #my_test SELECT '6-Nov-2021',105

go


UPDATE #my_test
set my_value = (
                select top 1 b.my_value
                from #my_test b
                where b.my_date < a.my_date and b.my_date = a.my_date and b.my_value is not null
                order by b.my_date desc
                
                )
                
from #my_test a
where a.my_value is null

go

Attempt #2

Outer apply works with Microsoft SQL but it doesn't seem supported in Sybase: in Sybase I get

Incorrect syntax near 'outer'

update #my_test
set my_value = coalesce(pr.my_value, nx.my_value)

from #my_test m

outer apply --next non-null value
(
select top 1 *
from #my_test x
where x.my_value is not null
and x.my_date > m.my_date
order by my_date 
) nx

outer apply -- previous non-null
(select top 1 *
from #my_test x
where x.my_value is not null
and x.my_date < m.my_date
order by my_date desc
) pr

where m.my_value is null

Attempt #3

This was in the same link as before. This code works in SQL Sevrer but Sybase tells me:

You cannot use a derived table in the FROM clause of an UPDATE or DELETE statement

My code:

update #my_test  set #my_test.my_value = tt.NewAmount 
from #my_test t
inner join (
    select my_date, coalesce(min(my_value) over (order by my_date desc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW),
                           min(my_value) over (order by my_date asc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) NewAmount
    from #my_test t
) tt on tt.my_date = t.my_date
where t.my_value is null

CodePudding user response:

This operation can be performed with some 'basic' sub-query syntax ...

One idea using a select to fill the gap (eg, OP using a select into):

select  mt1.*,isnull(mt1.my_value,
                        (select mt2.my_value
                         from   #my_test mt2
                         where  mt2.my_date = (select   max(my_date)
                                                from    #my_test mt3
                                                where   mt3.my_date < mt1.my_date
                                                and     mt3.my_value is not NULL))) as new_value
from    #my_test mt1
order by mt1.my_date           -- for display purposes
go

 my_date                         my_value    new_value
 ------------------------------- ----------- -----------
             Nov  1 2021 12:00AM         100         100
             Nov  2 2021 12:00AM        NULL         100
             Nov  3 2021 12:00AM        NULL         100
             Nov  4 2021 12:00AM         110         110
             Nov  5 2021 12:00AM        NULL         110
             Nov  6 2021 12:00AM         105         105

Assuming data is already in the #temp table and OP wants to run an update:

update  #my_test
set     my_value = (select      mt2.my_value
                        from    #my_test mt2
                        where   mt2.my_date = (select   max(mt3.my_date)
                                                from    #my_test mt3
                                                where   mt3.my_date < mt1.my_date
                                                and     mt3.my_value is not NULL))

from    #my_test mt1
where   my_value is NULL
go

select * from #my_test
order by my_date
go

 my_date                         my_value
 ------------------------------- -----------
             Nov  1 2021 12:00AM         100
             Nov  2 2021 12:00AM         100
             Nov  3 2021 12:00AM         100
             Nov  4 2021 12:00AM         110
             Nov  5 2021 12:00AM         110
             Nov  6 2021 12:00AM         105

NOTES:

  • it's not clear (to me) what we're supposed to do if the 'first' n rows in the table have my_value=NULL (ie, there's no 'previous' my_value to copy); if this is an issue OP can probably wrap the sub-select in a isnull(<sub-select>,@default_value)
  • above code tested in ASE 16.0 SP04 GA (though this should work on most (all?) versions as we're just using standard sub-query syntax)
  • for largish data volumes the queries will benefit from an index on (my_date [,my_value]); while OP may not be able to add an index to a permanent table they should be able to add an index to #my_test before performing an update (said index should be created after the initial population of the table so as to provide usable stats for my_date)
  • whether an index on #my_test will be used will depend on server and session level settings for statement_cache, literal_autoparam and deferred_name_resolution as well as volume of data in #my_test

CodePudding user response:

Depending on the number of NULLS you need to backfill, you will end up to around 50% of the rows requiring an update.

In all databases I encountered, this situation calls for a CREATE TABLE ... AS SELECT, rather than a mass update. And - in my example, I avoided the use of keywords like DATE or VALUE .

Hence - here's a complete example - using LAST_VALUE( ... IGNORE NULLS) .

DROP TABLE IF EXISTS indata;
CREATE TABLE
indata(dt,val) AS (
          SELECT DATE '1-Nov-2021',100
UNION ALL SELECT DATE '2-Nov-2021',NULL
UNION ALL SELECT DATE '3-Nov-2021',NULL
UNION ALL SELECT DATE '4-Nov-2021',110
UNION ALL SELECT DATE '5-Nov-2021',NULL
UNION ALL SELECT DATE '6-Nov-2021',105
);

DROP TABLE IF EXISTS outdata;
CREATE TABLE outdata AS
SELECT
  dt
, LAST_VALUE(val IGNORE NULLS) OVER(ORDER BY dt) AS val
FROM indata
;

ALTER TABLE indata RENAME TO indata_old;
-- sp_rename indata, indata_old;
ALTER TABLE outdata  RENAME TO indata;
-- sp_rename indata, indata; -- do you rename tables like this in Sybase?
SELECT * FROM indata;
-- out      dt     | val 
-- out ------------ -----
-- out  2021-11-01 | 100
-- out  2021-11-02 | 100
-- out  2021-11-03 | 100
-- out  2021-11-04 | 110
-- out  2021-11-05 | 110
-- out  2021-11-06 | 105
  • Related