I am getting two very different numbers for these seemingly similar queries on (hive) tables:
select count(*) from test
# result: 2609173
select distinct count(*) from test
# result: 2609173
insert into testToo
select distinct * from test
# result: inserted 673065 rows
Any recommendations on how I might be able to discern what is going on? Am I using distinct somehow differently in the first few queries?
CodePudding user response:
You want select count(distinct *) from test
rather than select distinct count(*) from test
The former means "select the count of distinct rows" and the latter means "select the distinct values of count(*)" (and there's only one value so it is semantically the same as select count(*) from test
)