I want to find string that contains any chinese characters.
I have the following query in PostgreSQL which works as expected.
with tmp as (
select '中文zz' as word
union all
select '中文' as word
union all
select 'english' as word
union all
select 'にほんご' as word
union all
select 'eng–lish' as word
)
select word,
word ~* '[\x4e00-\x9fff\x3400-\x4dbf]'
from tmp
Results:
中文zz true
中文 true
english false
にほんご false
eng–lish false
However, if I convert this SQL in BigQuery, it does not produce the same result.
with tmp as (
select '中文zz' as word
union all
select '中文' as word
union all
select 'english' as word
union all
select 'にほんご' as word
union all
select 'eng–lish' as word
)
select word,
regexp_contains(word, r'[\x4e00-\x9fff\x3400-\x4dbf]')
from tmp
Results:
中文zz true
中文 false
english true
にほんご false
eng–lish true
CodePudding user response:
You can use the following regex with BigQuery
:
with tmp as (
select '中文zz' as word
union all
select '中文' as word
union all
select 'english' as word
union all
select 'にほんご' as word
union all
select 'eng–lish' as word
)
select word,
regexp_contains(word, '''[\u4E00-\u9FA5]''')
from tmp
The result is :