Mysql how to find count of distinct columns based on another distinct column?-CodePudding

I have a table like this

userId   story  novel
1        a      b
1        a      b
1        a      c
1        b      c
1        b      c
2        x      x
2        x      y
2        y      y
3        m      n
4        NULL   NULL

How do I find the most story and novel count per user?

What I am looking for is the highest distinct count of story and novel for each user. So if a user has no story then story_count should be 0.

Desired output looks like this

userId   story  story_count  novel novel_count
1        a      3            c     3
2        x      2            y     2
3        m      1            n     1
4        NULL   0            NULL  0

This is my faulty current attempt

SELECT userId, story, COUNT(story) as story_count, novel, COUNT(novel) as novel_count  
FROM logs WHERE user = (SELECT DISTINCT(user)) GROUP BY story, novel;

CodePudding user response：

Based on Tim's answer,I provide an upgrade solution

create table `logs`(
 userId int,
 story varchar(10),
 novel varchar(10)
);

insert into `logs`(userId,story,novel) values
(1,'a',' b'),
(1 ,'a','c'),
(1 ,'b','c'),
(2 ,'x','x'),
(2 ,'x','y'),
(2 ,'y','y');

SELECT t1.userId,t1.story,t1.story_count,t2.novel,t2.novel_count
FROM
(
SELECT userId, story, COUNT(*) AS story_count,
       RANK() OVER (PARTITION BY userId ORDER BY COUNT(*) DESC) rn
FROM logs
GROUP BY userId, story
) as t1
join
(
SELECT userId, novel, COUNT(*) AS novel_count,
       RANK() OVER (PARTITION BY userId ORDER BY COUNT(*) DESC) rn
FROM logs
GROUP BY userId, novel
)as t2
ON t1.userId = t2.userId and t1.rn = t2.rn
WHERE t1.rn =1

DB Fiddle Demo

CodePudding user response：

You should really use a window function if this is possible because thus, you can keep your query much shorter and simpler. If this is not possible and you really need to do it without them, you can also create two subqueries for both the story data and the novel data according to your conditions and then join them. Something like this:

SELECT storydata.userId, storydata.story, storydata.counter AS story_count,
noveldata.novel, noveldata.counter AS novel_count  
FROM
(SELECT DISTINCT l.userId, l.story, sub.counter
FROM logs l
JOIN
(SELECT userId, COALESCE(MAX(story_count),0) AS counter
FROM
(SELECT userId, story,
COUNT(story) as story_count  
FROM logs 
GROUP BY userId, story) sub
GROUP BY userId) sub
ON l.userId = sub.userId
GROUP BY l.userId, l.story
HAVING COUNT(l.userId) = sub.counter OR l.story IS NULL) AS storydata
JOIN
(SELECT noveldata.userId, noveldata.novel, noveldata.counter AS counter
FROM
(SELECT DISTINCT l.userId, l.novel, sub.counter
FROM logs l
JOIN
(SELECT userId, COALESCE(MAX(novel_count),0) AS counter
FROM
(SELECT userId, novel,
COUNT(novel) as novel_count  
FROM logs 
GROUP BY userId, novel) sub
GROUP BY userId) sub
ON l.userId = sub.userId
GROUP BY l.userId, l.novel
HAVING COUNT(l.userId) = sub.counter OR l.novel IS NULL) AS noveldata) AS noveldata
ON storydata.userId = noveldata.userId;

But as you can see, this will become very complicated although it will produce the correct outcome. See here: db<>fiddle

Therefore, I once again highly recommend to use a DB version that provides window functions.

CodePudding user response：

I hope this works:


select story.userid, story, story_count, novel, novel_count 
from
    (
        select userid, story, count(*) story_count, 
        row_number() over (partition by userid order by count(*) desc) rown_num
        from logs
        where row_num=1
    )story
left join 
    (
        select userid, novel, count(*) novel_count, 
        row_number() over (partition by userid order by count(*) desc) rown_num
        from logs
        where row_num=1
    )novel on novel.userid=story.userid