Home > Mobile >  SQL: Distinct and Distinct count(*) With Hive Tables
SQL: Distinct and Distinct count(*) With Hive Tables

Time:06-11

I am getting two very different numbers for these seemingly similar queries on (hive) tables:

select count(*) from test
# result: 2609173
select distinct count(*) from test
# result: 2609173
insert into testToo
select distinct * from test
# result: inserted 673065 rows

Any recommendations on how I might be able to discern what is going on? Am I using distinct somehow differently in the first few queries?

CodePudding user response:

You want select count(distinct *) from test rather than select distinct count(*) from test

The former means "select the count of distinct rows" and the latter means "select the distinct values of count(*)" (and there's only one value so it is semantically the same as select count(*) from test)

  • Related