Home > Net >  Clickhouse SELECT Array values that have all elements in a specific order
Clickhouse SELECT Array values that have all elements in a specific order

Time:03-02

I need to find an array that contains all of the values of another array, in the presented order - similar to 'hasString' or 'hasAll' - or an idea of how to go about this.

Example Data

dataCol = [1,2,3,4]

hasSubstr

hasSubstr is close however if the values are not an exact match, they are not a match.

hasSubstr(dataCol, [1,2,4]) will return 0

But I need a 1 here because 1, 2 and 4 are in dataCol in the order of 1 then 2 then 4.

Has All

hasAll is also close however it doesn't care about the order.

hasAll(dataCol, [4,2,1]) will return 1

But I need a 0 here because the order is incorrect.

Function or Query?

Something the equivalent of the 'imaginary' function: hasAllOrdered(dataCol, [1,3,4]) = 1

Or an idea of how to construct a query for this. Maybe a combination of hasAll and some query logic magic?

Edit: To clarify my intended result, I need to run a query to select multiple columns as could be used with a function.

SELECT
    path AS dataCol,
    track
FROM tracks
WHERE time_start > 1645232556
AND { magic here returning rows containing [276,277,279] in dataCol }
LIMIT 10

Query id: cac9b576-193e-475f-98e4-84354bf13af4

┌─dataCol───────────────────────────────────┬──track─┐
│ [211,210,207,205,204]                     │ 413354 │
│ [211,210,207,205,204]                     │ 413355 │
│ [73,74,142,209,277,276,208]               │ 413356 │
│ [73,74,142,209,277,276,208]               │ 413357 │
│ [280,279]                                 │ 413358 │
│ [280,279]                                 │ 413359 │
│ [272,208,276,277,278,346,347,273,206,207] │ 413360 │
│ [208,276,277,278,346,272,273,206,207,347] │ 413361 │
│ [276,277,278,279,348,208,209,141]         │ 413362 │
│ [141,276,208,209,277,278,279,348]         │ 413363 │
└───────────────────────────────────────────┴────────┘

10 rows in set. Elapsed: 0.007 sec. Processed 13.59 thousand rows, 273.88 KB (1.86 million rows/s., 37.49 MB/s.)

Ref: https://clickhouse.com/docs/en/sql-reference/functions/array-functions/

CodePudding user response:

You have 2 arrays [a, b] and [a,,b]

Lets build the second array through indexes of the first (indexOf arrayMap) === [a,d,b] --> [1,0,2], remove zeros d by indexOf ( <> 0) --> [1,2]

Now we need the array only if indexes are grow, otherwise elements is in a wrong order.

arrayDifference == [1,2] -> [0,1]. Now if this array has negative elements then indexes are not grow -- not arrayExists j < 0

create table tracks( dataCol Array(UInt64), track UInt64 ) Engine = Memory;
insert into tracks values 
( [211,210,207,205,204]                    , 413354)
( [211,210,207,205,204]                    , 413355)

( [280,279]                                , 413358)
( [280,279]                                , 413359)
( [272,208,276,277,278,346,347,273,206,207], 413360)
( [208,276,277,278,346,272,273,206,207,347], 413361)
( [276,277,278,279,348,208,209,141]        , 413362)
( [141,276,208,209,277,278,279,348]        , 413363);


select *
from tracks
where hasAll(dataCol,  [276,277,279] as x ) and not arrayExists(j -> j<0, arrayDifference(arrayFilter(i->indexOf(dataCol, i)<>0, x)))

┌─dataCol───────────────────────────┬──track─┐
│ [276,277,278,279,348,208,209,141] │ 413362 │
│ [141,276,208,209,277,278,279,348] │ 413363 │
└───────────────────────────────────┴────────┘
  • Related