Home > Enterprise >  Masking a Substring in Hive Views
Masking a Substring in Hive Views

Time:09-17

I need to create a View on top of a Hive Table, masking data in a particular column. The Table has a column of String Type. The data in that particular column is of JSON structure. I need to mask a value of a particular field say 'ip_address'

{"id":1,"first_name":"john","last_name":"doe","email":"[email protected]","ip_address":"111.111.111.111"}

expected:

{"id":1,"first_name":"john","last_name":"doe","email":"[email protected]","ip_address":null}

These are the few Built-in Hive Functions I have tried, they don't seem to help my cause.

  • mask
  • get_json_object
  • STR_TO_MAP
  • if clause

Also I don't think substring and regexp_Extract are useful here coz the position of the field value is not always predetermined plus I'm not familiar with regex expressions.

PS: Any help is appreciated that would help me avoid writing a new UDF.

CodePudding user response:

regexp_replace:

select regexp_replace(column_name,'"ip_address":".*?"', '"ip_address":null') as column_name will work fine with any position.

You can add any number of optional spaces before and after ::

regexp_replace(column_name,'"ip_address" *: *".*?"', '"ip_address":null')

Regexp '"ip_address" *: *".*?"' meaning:

"ip_address" - literally "ip_address"

* - 0 or more spaces (allowed in json)

: - literally :

* - 0 or more spaces

".*?" - any number of any characters (non-greedy) inside double-quotes.

See also similar question if you want to replace value with some calculated value, for example obfuscate using sha256, not with just null: https://stackoverflow.com/a/54179543/2700344

  • Related