How to write query using Joins and Group by

Time:11-20

I have a SQL Server database where I store tweets and their relations.

I have the following tables:

  • tweets (holds tweet properties, such as the tweet id)
  • users (holds user properties, such as the user id)
  • hashtags (stores the mentioned hashtags; it has an id column and a hashtag column, which holds the hashtag string)
  • searches (stores the searches I am interested in; for example, to find tweets talking about "cars", the searches table has a row with a search_id and a search_word of "cars")

I connect those using two one-to-many relational tables (searches_tweets and hashtags_tweets), which link the ids of each table. The connections are:

  • searches - searches_tweets - tweets (searches_tweets has 2 columns: search_id and tweet_id)
  • hashtags - hashtags_tweets - tweets (hashtags_tweets has 2 columns: hashtag_id and tweet_id)

First, I wanted to retrieve the number of times a certain hashtag was mentioned in the tweets of a certain search. I achieved that with this query:

SELECT TOP(100) hashtags.id, hashtags.hashtag, count(tweets.id) AS total_count FROM hashtags
JOIN hashtags_tweets ON hashtags_tweets.hashtag_id = hashtags.id
JOIN tweets ON tweets.id = hashtags_tweets.tweet_id
JOIN searches_tweets ON searches_tweets.tweet_id = tweets.id
WHERE searches_tweets.search_id = 1234
GROUP BY hashtags.id, hashtags.hashtag
ORDER BY total_count DESC

Now I want to add a column that shows the number of users who wrote this hashtag in their tweets (from the same search_id). I can't find a way to do this. I tried the following query:

SELECT TOP(100) hashtags.id, hashtags.hashtag, count(tweets.id) AS total_count, count(users.id) AS users_count 
FROM hashtags
JOIN hashtags_tweets ON hashtags_tweets.hashtag_id = hashtags.id
JOIN tweets ON tweets.id = hashtags_tweets.tweet_id
JOIN searches_tweets ON searches_tweets.tweet_id = tweets.id
JOIN users ON tweets.user_id = users.id
WHERE searches_tweets.search_id = 1234
GROUP BY hashtags.id, hashtags.hashtag,  users.id
ORDER BY total_count DESC

This query didn't work since users_count had the same value as total_count. How would you achieve this expected result?

CodePudding user response:

For the number of distinct users, use count(distinct ...):

SELECT TOP(100) hashtags.id, hashtags.hashtag, count(tweets.id) AS total_count, 
count(distinct users.id) AS users_count 
FROM hashtags
  ......

GROUP BY hashtags.id, hashtags.hashtag

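You must also remove users.id from the GROUP BY clause. With users.id in the grouping, each result row covers a single hashtag and user pair, so count(users.id) and count(tweets.id) count exactly the same joined rows and always match.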
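
To make the difference between the two counts concrete, here is a minimal sketch that combines the joins from the question with count(distinct ...). The sample rows are hypothetical and are inlined as CTEs that shadow the real table names, so the statement should run on SQL Server without touching any existing table. It also assumes that tweets has a user_id column, as used in the question's join.

```sql
-- Hypothetical sample data: user 100 wrote tweets 10 and 11, user 101 wrote tweet 12.
-- All three tweets belong to search 1234; hashtag 1 is on all three, hashtag 2 only on tweet 12.
WITH hashtags AS (
    SELECT * FROM (VALUES (1, 'cars'), (2, 'ev')) AS v (id, hashtag)
),
users AS (
    SELECT * FROM (VALUES (100), (101)) AS v (id)
),
tweets AS (
    SELECT * FROM (VALUES (10, 100), (11, 100), (12, 101)) AS v (id, user_id)
),
hashtags_tweets AS (
    SELECT * FROM (VALUES (1, 10), (1, 11), (1, 12), (2, 12)) AS v (hashtag_id, tweet_id)
),
searches_tweets AS (
    SELECT * FROM (VALUES (1234, 10), (1234, 11), (1234, 12)) AS v (search_id, tweet_id)
)
SELECT TOP (100)
    hashtags.id,
    hashtags.hashtag,
    COUNT(tweets.id)         AS total_count,  -- one per tweet that carries the hashtag
    COUNT(DISTINCT users.id) AS users_count   -- each author counted once per hashtag
FROM hashtags
JOIN hashtags_tweets ON hashtags_tweets.hashtag_id = hashtags.id
JOIN tweets          ON tweets.id = hashtags_tweets.tweet_id
JOIN searches_tweets ON searches_tweets.tweet_id = tweets.id
JOIN users           ON tweets.user_id = users.id
WHERE searches_tweets.search_id = 1234
GROUP BY hashtags.id, hashtags.hashtag        -- users.id is no longer grouped on
ORDER BY total_count DESC;
```

With this sample data, 'cars' comes back with total_count 3 and users_count 2, because user 100 wrote two of the three tweets, and 'ev' comes back with 1 and 1. If every tweets.user_id is guaranteed to reference an existing user, counting DISTINCT tweets.user_id and dropping the join to users should give the same numbers.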
