Suppose you have a database table named db_table1 with hundreds of columns and most of them contain lots of 0's.
I'm trying to count zeros for each column and divide it by the length of column so I can see the ratio of zeros in each column.
The below code gives me the ratio of nulls in each column. But I couldn't revise it to count 0's instead of nulls
Any help will be appreciated
SELECT column_name, (num_nulls/ (SELECT COUNT(*) FROM db_table)*100)
FROM all_tab_columns
WHERE table_name='table’
Sample data:
col_1 | col_2 | col_3 |
---|---|---|
0.000 | 0.000 | 0.000 |
0.000 | 0.000 | 14.857 |
3.548 | 0.000 | 0.000 |
0.000 | 0.000 | 0.000 |
0.000 | 0.000 | 0.000 |
0.000 | 11.586 | 0.000 |
Expected Output:
All_Columns | Ratio_of_Zeros |
---|---|
col_1 | 65.5 |
col_2 | 73.5 |
col_3 | 48.6 |
Briefly I need the total number of zeros in every column divided by total row number which gives me the percent or ratio of zeros in each column so I can treat zeros like null and eliminate the column if it has more than 60 percent zeros lets say...
CodePudding user response:
You can use:
WITH AA AS (
select
col_1,col_2,col_3
from table_name
)
SELECT column_name, COUNT(column_name) zeros FROM
(
SELECT * FROM (
SELECT * FROM AA
UNPIVOT(
zeros
FOR column_name
IN (
col_1,col_2,col_3
)
)
) WHERE zeros = 0
) GROUP BY column_name
CodePudding user response:
You can use:
SELECT 'SELECT ''' || table_name || ''' AS table_name, ' ||
'''' || column_name || ''' AS column_name, ' ||
'100 - SUM(LENGTH(REPLACE(TO_CHAR("'|| column_name || '", ''fm999999999999999990.000''), ''0''))-1)' ||
' / SUM(LENGTH(TO_CHAR("'|| column_name || '", ''fm999999999999999990.000''))-1) * 100 AS ratio_of_zeroes ' ||
'FROM "' || table_name || '" UNION ALL'
column_name
FROM user_tab_columns
WHERE data_type = 'NUMBER';
To generate an SQL statement and remove the UNION ALL
from the last line and execute it.
If you have the table:
CREATE TABLE table_name (col_1, col_2, col_3) AS
SELECT 0.000, 0.000, 0.000 FROM DUAL UNION ALL
SELECT 0.000, 0.000, 14.857 FROM DUAL UNION ALL
SELECT 3.548, 0.000, 0.000 FROM DUAL UNION ALL
SELECT 0.000, 0.000, 0.000 FROM DUAL UNION ALL
SELECT 0.000, 0.000, 0.000 FROM DUAL UNION ALL
SELECT 0.000, 11.586, 0.000 FROM DUAL;
Then the statement generated (with the final UNION ALL
removed) is:
SELECT 'TABLE_NAME' AS table_name, 'COL_1' AS column_name, 100 - SUM(LENGTH(REPLACE(TO_CHAR("COL_1", 'fm999999999999999990.000'), '0'))-1) / SUM(LENGTH(TO_CHAR("COL_1", 'fm999999999999999990.000'))-1) * 100 AS ratio_of_zeroes FROM "TABLE_NAME" UNION ALL
SELECT 'TABLE_NAME' AS table_name, 'COL_2' AS column_name, 100 - SUM(LENGTH(REPLACE(TO_CHAR("COL_2", 'fm999999999999999990.000'), '0'))-1) / SUM(LENGTH(TO_CHAR("COL_2", 'fm999999999999999990.000'))-1) * 100 AS ratio_of_zeroes FROM "TABLE_NAME" UNION ALL
SELECT 'TABLE_NAME' AS table_name, 'COL_3' AS column_name, 100 - SUM(LENGTH(REPLACE(TO_CHAR("COL_3", 'fm999999999999999990.000'), '0'))-1) / SUM(LENGTH(TO_CHAR("COL_3", 'fm999999999999999990.000'))-1) * 100 AS ratio_of_zeroes FROM "TABLE_NAME";
And the output of executing that is:
TABLE_NAME COLUMN_NAME RATIO_OF_ZEROES TABLE_NAME COL_1 83.33333333333333333333333333333333333333 TABLE_NAME COL_2 80 TABLE_NAME COL_3 80
db<>fiddle here