Home > Back-end >  How to delete rows where most columns are empty in postgresql?
How to delete rows where most columns are empty in postgresql?

Time:07-18

I am new to SQL, so please forgive if this question is dumb.

I have a table with 800 columns. I want to delete all rows in which > 400 columns are null. How do I do that?

CodePudding user response:

You can try to covert the entire row plus non-null const to a JSON-like string and count the number of '":null,' substrings. I assume id is a key so this will return a number of NULLs for id

create table foo (
    id int,
    c1 int,
    c2 int
);

insert into foo 
  values
  (1,10, 5),
  (2,null,15)
;

select id, (char_length(c) - char_length(REPLACE(c, '":null,', ''))) / char_length('":null,') cnt
from (
  select  id, json_agg(t)::text c
  from (
    select *, 1
    from foo
  ) t
  group by id
) t
order by id

Returns

id  cnt
1   0
2   1

CodePudding user response:

One way to do it, is to usenum_nulls() function:

delete from wide_table
where num_nulls(col1, col2, col3, col4, ...., col800) > 400;

As you only have to write the number of columns once, this should be the easiest way to do it. Typically graphical SQL clients will help you auto-complete the column names, so that you don't have to type them out manually.


If you don't want to type the column names (or your SQL client doesn't support it), you can using some JSON transformations to get the number of non-null values:

select *
from wide_table wt
where jsonb_array_length(jsonb_path_query_array(jsonb_strip_nulls(to_jsonb(wt)), '$.*')) <= 400

jsonb_strip_nulls(to_jsonb(wt)) will generate a JSON value with all keys (=column names) removed where the value (=column values) are NULL. So that only contains the non-null values. jsonb_path_query_array(..., '$.*') then builds an array of those non-null values and jsonb_array_length counts the number.

Note that the condition needs to be inverted because jsonb_array_length() returns the number of non-null values.

  • Related