I have the following dataset in Hive, and I would like to transpose rows into columns.
Customer | Status | Quantity |
---|---|---|
25 | Paid | 5 |
25 | N Paid | 2 |
67 | Open | 12 |
67 | Paid | 4 |
45 | N Paid | 3 |
45 | Open | 2 |
I would like to have a new table after transpose that shows only one line by a customer and multiple columns by Status, e.g.
Customer | Paid | N Paid | Open |
---|---|---|---|
25 | 5 | 2 | 0 |
67 | 4 | 0 | 12 |
45 | 0 | 3 | 2 |
I tried some examples I've found on the Internet, but I could not make it works. Here, for the sake of simplicity, I listed only three statuses, but in fact, I could have more than that.
In SAS, I used to did something such as the following:
proc transpose
data = imputtable;
out = outputtable;
by customer;
id status;
var quantity;
run;
SAS gets all the existing statuses and pivots them into columns. I was looking to do the same in Hive.
Regards,
Marcio
CodePudding user response:
Use conditional aggregation:
select Customer,
sum(case when Status = 'Paid' then Quantity else 0 end) as Paid ,
sum(case when Status = 'N Paid' then Quantity else 0 end) as `N Paid` ,
sum(case when Status = 'Open' then Quantity else 0 end) as Open
from table
group by Customer