I have the following challenge: I have a table called hashtags_users_grouped
which has the following structure:
hashtag_id | user_id | count
123 1 1
245 1 3
123 2 5
In each row, we find values that tell me when a certain user mentioned a certain hashtag and how many times he did it. In this example, user 1 mentioned hashtag 123 one time and 245 three times, while user 2 only mentioned hashtag 123 five times.
I want to do a query that would give me the following output:
user | hashtag_123 | hashtag_245
1 1 3
2 5 0
In other words, the same information as the first table, but with a column per hashtag, to know the number of times a user mentioned each hashtag.
It would be easy to do this using a recursive method (like using a Pyspark data frame and iterating over each hashtag), but I am looking forward to achieving it in a single query. Do you know any way to do this?
EDIT: User #Larnu said I should use PIVOT
. How would you write a query using it? I tried but didn't receive the expected results
CodePudding user response:
Given the column naming you specify and the unknown list of hashtag values, I see you resorting to dynamic SQL.
However, here is a simple PIVOT example based on the data you shared that you may run in SSMS:
DECLARE @hashtags_users_grouped table (
hashtag_id int, [user_id] int, [count] int
);
INSERT INTO @hashtags_users_grouped VALUES
( 123, 1, 1 ), ( 245, 1, 3 ), ( 123, 2, 5 );
SELECT
[user_id],
ISNULL( [123], 0 ) AS hashtag_123,
ISNULL( [245], 0 ) AS hashtag_245
FROM @hashtags_users_grouped AS hug
PIVOT (
MAX ( [count] )
FOR hashtag_id IN ( [123], [245] )
) piv;
RETURNS
--------- ------------- -------------
| user_id | hashtag_123 | hashtag_245 |
--------- ------------- -------------
| 1 | 1 | 3 |
| 2 | 5 | 0 |
--------- ------------- -------------
In a PIVOT
, your column values (e.g. 123, 245 ) get transposed into column headers, hence the FOR hashtag_id IN ( [123], [245] )
part. To do this without dynamic SQL you would have to list a "FOR [column]" for every possible hashtag_id value. Given the unknown size of this list, it would quickly become unmanageable trying to maintain the above code when a new value is introduced. So, dynamic SQL to the rescue.
Depending on your version of SQL Server, this is how I might approach it using Dynamic SQL:
DECLARE @headers varchar(MAX), @in varchar(MAX), @pivot varchar(MAX);
SELECT
@headers = STRING_AGG ( FORMATMESSAGE( 'ISNULL( [%i], 0 ) AS [hashtag_%i]', hashtag_id, hashtag_id ), ', ' ),
@in = STRING_AGG( FORMATMESSAGE( '[%i]', hashtag_id ), ', ' )
FROM ( SELECT DISTINCT hashtag_id FROM hashtags_users_grouped ) AS x;
SET @pivot = FORMATMESSAGE(
'SELECT [user_id], %s FROM hashtags_users_grouped PIVOT ( MAX( [count] ) FOR hashtag_id IN ( %s ) ) AS piv;'
, @headers
, @in
);
PRINT @headers;
PRINT @in;
PRINT @pivot;
PRINT @headers;
ISNULL( [123], 0 ) AS [hashtag_123], ISNULL( [245], 0 ) AS [hashtag_245]
PRINT @in;
[123], [245]
PRINT @pivot;
SELECT [user_id], ISNULL( [123], 0 ) AS [hashtag_123], ISNULL( [245], 0 ) AS [hashtag_245] FROM hashtags_users_grouped PIVOT ( MAX( [count] ) FOR hashtag_id IN ( [123], [245] ) ) AS piv;
And finally, to execute the dynamic SQL:
EXEC ( @pivot );
Note: The dynamic SQL example does not reference the table variable.