How to write SQL query without iterations-CodePudding

I have the following challenge: I have a table called hashtags_users_grouped which has the following structure:

hashtag_id  |  user_id  |  count
   123            1          1
   245            1          3
   123            2          5

In each row, we find values that tell me when a certain user mentioned a certain hashtag and how many times he did it. In this example, user 1 mentioned hashtag 123 one time and 245 three times, while user 2 only mentioned hashtag 123 five times.

I want to do a query that would give me the following output:

user  |  hashtag_123  |  hashtag_245
1            1               3
2            5               0

In other words, the same information as the first table, but with a column per hashtag, to know the number of times a user mentioned each hashtag.

It would be easy to do this using a recursive method (like using a Pyspark data frame and iterating over each hashtag), but I am looking forward to achieving it in a single query. Do you know any way to do this?

EDIT: User #Larnu said I should use PIVOT. How would you write a query using it? I tried but didn't receive the expected results

CodePudding user response：

Given the column naming you specify and the unknown list of hashtag values, I see you resorting to dynamic SQL.

However, here is a simple PIVOT example based on the data you shared that you may run in SSMS:

DECLARE @hashtags_users_grouped table (
    hashtag_id int, [user_id] int, [count] int
);

INSERT INTO @hashtags_users_grouped VALUES
    ( 123, 1, 1 ), ( 245, 1, 3 ), ( 123, 2, 5 );

SELECT
    [user_id],
    ISNULL( [123], 0 ) AS hashtag_123,
    ISNULL( [245], 0 ) AS hashtag_245
FROM @hashtags_users_grouped AS hug
PIVOT (
    MAX ( [count] )
    FOR hashtag_id IN ( [123], [245] )
) piv;

RETURNS

 --------- ------------- ------------- 
| user_id | hashtag_123 | hashtag_245 |
 --------- ------------- ------------- 
|       1 |           1 |           3 |
|       2 |           5 |           0 |
 --------- ------------- -------------

In a PIVOT, your column values (e.g. 123, 245 ) get transposed into column headers, hence the FOR hashtag_id IN ( [123], [245] ) part. To do this without dynamic SQL you would have to list a "FOR [column]" for every possible hashtag_id value. Given the unknown size of this list, it would quickly become unmanageable trying to maintain the above code when a new value is introduced. So, dynamic SQL to the rescue.

Depending on your version of SQL Server, this is how I might approach it using Dynamic SQL:

DECLARE @headers varchar(MAX), @in varchar(MAX), @pivot varchar(MAX);

SELECT
    @headers = STRING_AGG ( FORMATMESSAGE( 'ISNULL( [%i], 0 ) AS [hashtag_%i]', hashtag_id, hashtag_id ), ', ' ),
    @in = STRING_AGG( FORMATMESSAGE( '[%i]', hashtag_id ), ', ' )
FROM ( SELECT DISTINCT hashtag_id FROM hashtags_users_grouped ) AS x;

SET @pivot = FORMATMESSAGE( 
    'SELECT [user_id], %s FROM hashtags_users_grouped PIVOT ( MAX( [count] ) FOR hashtag_id IN ( %s ) ) AS piv;'
    , @headers
    , @in
);

PRINT @headers;
PRINT @in;
PRINT @pivot;

PRINT @headers;

ISNULL( [123], 0 ) AS [hashtag_123], ISNULL( [245], 0 ) AS [hashtag_245]

PRINT @in;

[123], [245]

PRINT @pivot;

SELECT [user_id], ISNULL( [123], 0 ) AS [hashtag_123], ISNULL( [245], 0 ) AS [hashtag_245] FROM hashtags_users_grouped PIVOT ( MAX( [count] ) FOR hashtag_id IN ( [123], [245] ) ) AS piv;

And finally, to execute the dynamic SQL:

EXEC ( @pivot );

Note: The dynamic SQL example does not reference the table variable.