I need to create a View on top of a Hive Table, masking data in a particular column. The Table has a column of String Type. The data in that particular column is of JSON structure. I need to mask a value of a particular field say 'ip_address'
{"id":1,"first_name":"john","last_name":"doe","email":"[email protected]","ip_address":"111.111.111.111"}
expected:
{"id":1,"first_name":"john","last_name":"doe","email":"[email protected]","ip_address":null}
These are the few Built-in Hive Functions I have tried, they don't seem to help my cause.
- mask
- get_json_object
- STR_TO_MAP
- if clause
Also I don't think substring and regexp_Extract are useful here coz the position of the field value is not always predetermined plus I'm not familiar with regex expressions.
PS: Any help is appreciated that would help me avoid writing a new UDF.
CodePudding user response:
regexp_replace:
select regexp_replace(column_name,'"ip_address":".*?"', '"ip_address":null') as column_name
will work fine with any position.
You can add any number of optional spaces before and after :
:
regexp_replace(column_name,'"ip_address" *: *".*?"', '"ip_address":null')
Regexp '"ip_address" *: *".*?"'
meaning:
"ip_address"
- literally "ip_address"
*
- 0 or more spaces (allowed in json)
:
- literally :
*
- 0 or more spaces
".*?"
- any number of any characters (non-greedy) inside double-quotes.
See also similar question if you want to replace value with some calculated value, for example obfuscate using sha256, not with just null: https://stackoverflow.com/a/54179543/2700344