I have a SQL Server database where I store tweets and their relations.
I have the following tables:
- tweets (holds tweet properties, such as the tweet id)
- users (holds user properties, such as the user id)
- hashtags (stores the mentioned hashtags; it has an id column and a hashtag column, which holds the hashtag string)
- searches (stores the searches I am interested in; for example, to find tweets talking about "cars", the searches table has a row with a search_id and a search_word of "cars").
I connect these through two junction tables (searches_tweets and hashtags_tweets), which pair the ids of the tables they link. The connections are:
- searches - searches_tweets - tweets (searches_tweets has 2 columns: search_id and tweet_id)
- hashtags - hashtags_tweets - tweets (hashtags_tweets has 2 columns: hashtag_id and tweet_id)
First, I wanted to retrieve the number of times each hashtag was mentioned in the tweets of a given search. I achieved that with this query:
SELECT TOP(100) hashtags.id, hashtags.hashtag, count(tweets.id) AS total_count FROM hashtags
JOIN hashtags_tweets ON hashtags_tweets.hashtag_id = hashtags.id
JOIN tweets ON tweets.id = hashtags_tweets.tweet_id
JOIN searches_tweets ON searches_tweets.tweet_id = tweets.id
WHERE searches_tweets.search_id = 1234
GROUP BY hashtags.id, hashtags.hashtag
ORDER BY total_count DESC
Now I want to add a column showing the number of users who used each hashtag in their tweets (within the same search_id). I can't find a way to do this. I tried the following query:
SELECT TOP(100) hashtags.id, hashtags.hashtag, count(tweets.id) AS total_count, count(users.id) AS users_count
FROM hashtags
JOIN hashtags_tweets ON hashtags_tweets.hashtag_id = hashtags.id
JOIN tweets ON tweets.id = hashtags_tweets.tweet_id
JOIN searches_tweets ON searches_tweets.tweet_id = tweets.id
JOIN users ON tweets.user_id = users.id
WHERE searches_tweets.search_id = 1234
GROUP BY hashtags.id, hashtags.hashtag, users.id
ORDER BY total_count DESC
This query didn't work, since users_count had the same value as total_count. How would you achieve the expected result?
CodePudding user response:
To get the number of distinct users, use count(distinct ...):
SELECT TOP(100) hashtags.id, hashtags.hashtag, count(tweets.id) AS total_count,
count(distinct users.id) AS users_count
FROM hashtags
......
GROUP BY hashtags.id, hashtags.hashtag
And, obviously, you must remove users.id from the GROUP BY clause; otherwise each user gets its own group and the counts are computed per user rather than per hashtag.
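To check the difference on a small data set, here is a minimal sketch that runs the query in an in-memory SQLite database from Python. Everything in it is an assumption made for illustration: the column types, the sample rows, and the use of SQLite itself, which has no TOP(100), so LIMIT 100 stands in for it. The joins are the ones from the question.

```python
import sqlite3

# Toy schema and rows (hypothetical; only the table and column names come from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE hashtags (id INTEGER PRIMARY KEY, hashtag TEXT);
CREATE TABLE hashtags_tweets (hashtag_id INTEGER, tweet_id INTEGER);
CREATE TABLE searches_tweets (search_id INTEGER, tweet_id INTEGER);

INSERT INTO users VALUES (1), (2);
-- user 1 wrote tweets 10 and 11, user 2 wrote tweet 12
INSERT INTO tweets VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO hashtags VALUES (100, 'cars'), (101, 'bikes');
-- 'cars' appears in all three tweets, 'bikes' only in tweet 12
INSERT INTO hashtags_tweets VALUES (100, 10), (100, 11), (100, 12), (101, 12);
INSERT INTO searches_tweets VALUES (1234, 10), (1234, 11), (1234, 12);
""")

query = """
SELECT hashtags.id, hashtags.hashtag,
       COUNT(tweets.id) AS total_count,
       COUNT(DISTINCT users.id) AS users_count
FROM hashtags
JOIN hashtags_tweets ON hashtags_tweets.hashtag_id = hashtags.id
JOIN tweets ON tweets.id = hashtags_tweets.tweet_id
JOIN searches_tweets ON searches_tweets.tweet_id = tweets.id
JOIN users ON tweets.user_id = users.id
WHERE searches_tweets.search_id = 1234
GROUP BY hashtags.id, hashtags.hashtag
ORDER BY total_count DESC
LIMIT 100  -- SQLite stand-in for SQL Server's TOP(100)
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With these rows, 'cars' is counted in three tweets but by only two distinct users, so total_count and users_count differ. A plain count(users.id) counts one user id per joined tweet row, which is why it always matches count(tweets.id).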