Home > OS >  Replacing Everything Except specific pattern BigQuery
Replacing Everything Except specific pattern BigQuery

Time:10-05

I would like to use regex to replace everything (except a specific pattern) with empty string in BigQuery. I have following values:

AX/88/8888888
AX/99/999999
AX/11/222222 - AX/22/33333 - AX/999/99999
BX/99/9999
1234455121
AX/00/888888 // BX/890/90890
NULL
[XYZ-ASA
BX/890/90890   AX/10/1010101
AX/99/9999M
AX/111/111,AX-99
AX/11/222222 BX/99/99 AX/22/33333

The pattern will always have "AX" in the beginning, then a slash (/) and some numbers and slash(/) again and some numbers after it. (The pattern would always be AX/\d /\d )

I would like to replace anything (any character,brackets,digit etc) that doesn't follow that pattern mention above.

For the cases where the pattern doesn't match at all for example (BX/99/9999,1234455121, NULL,[XYZ-ASA) are the only cases from the above dataset.

** doesn't match at all means cases where the entire values doesn't have any value that matches with the AX/\d /\d . In those situations, I would like to return then original text as final output.

The case where we have matching pattern for example AX/00/888888 // BX/890/90890, AX/111/111,AX-99 the pattern matches but the latter part needs to be replaced i.e [// BX/890/90890] and [,AX-99] , which should then return only the AX/00/888888, and AX/111/111 as final output.

The expected output from the above example is following:

AX/88/8888888
AX/99/999999
AX/11/222222 AX/22/33333 AX/999/99999
BX/99/9999
1234455121
AX/00/888888 
NULL
[XYZ-ASA
AX/10/1010101
AX/99/9999
AX/111/111
AX/11/222222 AX/22/33333

Later I would like to split all the values by space, to get each AX/xx/xx on a different row where I have multiple of those for example case 3 from above would produce 3 rows.

AX/88/8888888
AX/99/999999
AX/11/222222 
AX/22/33333 
AX/999/99999
BX/99/9999
1234455121
AX/00/888888 
NULL
[XYZ-ASA
AX/10/1010101
AX/99/9999
AX/111/111
AX/11/222222
AX/22/33333

CodePudding user response:

Use below

select coalesce(result, col) as col 
from your_table
left join unnest(regexp_extract_all(col, r'AX/\d /\d ')) result     

if applied to sample data in your question

enter image description here

output is

enter image description here

  • Related