Home > Blockchain >  Data cleaning in BigQuery: When session_id has both null and valid find_term, only keeping the recor
Data cleaning in BigQuery: When session_id has both null and valid find_term, only keeping the recor

Time:05-10

I need some help cleaning my data. I have a table like below:

session_id find_term
11111 null
11111 candy
11111 chocolate
22811 null
11145 null
333222 dark chocolate

When session_id has both null and valid find_term, I want to keep the records that only have valid find_terms (eg: for session_id 11111, I want to exclude the first record when find_term is null and keep when find_term is candy and chocolate. For session_id 22811, 11145, I would want to keep all existed records because they only have null as find_term). I want the output to be:

session_id find_term
11111 candy
11111 chocolate
22811 null
11145 null
333222 dark chocolate

So far I have tried giving same group of records row_num to distinct them, but that is not too helpful. Please share your thoughts! Thanks!

CodePudding user response:

Use below

select * from your_table
qualify ((find_term is null ) and countif(not find_term is null) over(partition by session_id) = 0)
or not find_term is null             

if applied to sample data in your question - output is

enter image description here

  • Related