Home > Enterprise >  Count Zero & Non-Zero Bytes in a String (Databricks SQL)
Count Zero & Non-Zero Bytes in a String (Databricks SQL)

Time:06-12

In a byte string, I'm trying to calculate how many bytes are zero 00 or non-zero (i.e. ff)

I've so far found this split / array function, but it has poor preformance (a few mins to run): cardinality(array_remove(array_remove((split(data,'(?<=\\G..)')),'00'),''))-1

Example string 0x13dcfc5900000000000000000000000000000000000000000000003c63db09c669280000000000000000000000000000000000000000000000000000000000003b5e789d00000000000000000000000042000000000000000000000000000000000000420000000000000000000000007f5c764cbc14f9669b88837ca1490cca17c31607000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004234893acac5096f4a1ad8fd952cc98b8c8ff460000000000000000000000000000000000000000000000000000000062a398f9

In postgresql, you could format as escape: Count Zero and/or Non-Zero bytes in a bytea/string

CodePudding user response:

Thanks to unhex function you won't need to bother about splitting.

SELECT length(s) - length(replace(s, chr(0), '')) as length_zero,
       length(replace(s, chr(0), '')) as length_non_zero
  FROM (SELECT decode(unhex('13dcfc590000'), 'US-ASCII') as s) _;

length_zero  length_non_zero
          2                4
  • Related