How can I remove duplicates and find the row count in a single query in Hive?
Answer:
I have never used Hive, so I'm not sure whether the SQL is exactly the same there, but you can use this as a reference. Replace <column name> with the column that contains the duplicate values.
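SELECT COUNT(DISTINCT <column name>) FROM <table>
Keep DISTINCT inside COUNT. Written as SELECT DISTINCT COUNT(<column name>), the DISTINCT is applied to the one-row result instead, so every non-NULL value is still counted, duplicates included.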
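Where DISTINCT sits changes the result. Below is a minimal sketch that runs both forms, using Python's built-in sqlite3 module with an in-memory database as a stand-in for Hive (an assumption: both queries rely only on standard SQL aggregate semantics, which Hive follows). The orders table and its customer column are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption:
# these aggregate semantics are standard SQL and behave the same in Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",)],
)

# DISTINCT outside the aggregate is applied to the result rows AFTER
# counting, so duplicates are still included in the count.
distinct_of_count = conn.execute(
    "SELECT DISTINCT COUNT(customer) FROM orders"
).fetchone()[0]

# DISTINCT inside the aggregate removes duplicates BEFORE counting.
count_of_distinct = conn.execute(
    "SELECT COUNT(DISTINCT customer) FROM orders"
).fetchone()[0]

print(distinct_of_count)  # 5
print(count_of_distinct)  # 3
```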
Answer:
I have never used Hive before, but this should work:
SELECT COUNT(DISTINCT <column name>) FROM <table>
Answer:
Try this:
SELECT COUNT(DISTINCT <column name>) FROM <table>
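Another option is GROUP BY. On its own, GROUP BY returns one row per distinct value, together with how many times that value occurs:
SELECT <column name>, COUNT(<column name>) FROM <table> GROUP BY <column name>
To get the number of distinct values as a single number, count the groups in a subquery (Hive requires a subquery in the FROM clause to have an alias, here t):
SELECT COUNT(*) FROM (SELECT <column name> FROM <table> GROUP BY <column name>) t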
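To contrast per-value counts with a single distinct total, here is a small sketch. It uses Python's sqlite3 with an in-memory database as a stand-in for Hive (assumed to share these standard GROUP BY semantics) and a hypothetical orders table.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",)],
)

# GROUP BY alone: one row per distinct value with its number of occurrences.
per_value_counts = conn.execute(
    "SELECT customer, COUNT(customer) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

# Counting the groups in an aliased subquery gives one distinct total.
distinct_total = conn.execute(
    "SELECT COUNT(*) FROM (SELECT customer FROM orders GROUP BY customer) t"
).fetchone()[0]

print(per_value_counts)  # [('alice', 2), ('bob', 2), ('carol', 1)]
print(distinct_total)    # 3
```

One difference from COUNT(DISTINCT ...): GROUP BY puts all NULLs into a group of their own, so the subquery version counts NULL as one extra value, while COUNT(DISTINCT ...) ignores NULLs.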
Answer:
Example that counts the distinct values of column_2 for each value of column_1:
SELECT column_1, COUNT(DISTINCT column_2) FROM table_name GROUP BY column_1
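A quick way to see what the per-group query returns is to run it on a few rows. The sketch below uses Python's sqlite3 as a stand-in for Hive (an assumption about shared semantics) with made-up data in a table_name table. It also shows, as an extra, how to count whole rows after removing exact duplicate rows, in case "duplicates" in the question means entire rows rather than values of one column.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical table using the column names from the example query.
conn.execute("CREATE TABLE table_name (column_1 TEXT, column_2 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [
        ("north", "apple"),
        ("north", "apple"),
        ("north", "pear"),
        ("south", "plum"),
        ("south", "plum"),
    ],
)

# Distinct column_2 values per column_1 group.
rows = conn.execute(
    "SELECT column_1, COUNT(DISTINCT column_2) FROM table_name "
    "GROUP BY column_1 ORDER BY column_1"
).fetchall()

# Row count after removing exact duplicate rows (aliased subquery).
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT column_1, column_2 FROM table_name) t"
).fetchone()[0]

print(rows)           # [('north', 2), ('south', 1)]
print(distinct_rows)  # 3
```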