MySQL distinct users table but repeating every year, to make a kind of histogram-CodePudding

I'm blocked on this which for sure is a simple thing to do. I have a table with a list of users with date logins, example:

---------------------
date       | user
---------------------
2013-07-08 | Carl
2013-07-09 | Mark
2013-07-09 | Robert
2013-08-09 | Robert
2013-08-09 | Saul
2013-09-09 | Paula
2014-01-09 | Mark
2014-02-09 | Robert

and so. So I do a

SELECT DISTINCT(user) FROM logins;

to get a full list of users. Now what I want is to get a result with the total count of logins (each year) of each user, and if the user didn't log in during the past year, still add it to the results with a 0 count. The important thing is to respect user list.

If I do this:

SELECT user, year(date) as year, COUNT(*) as count
FROM logins
GROUP BY user, year 
ORDER BY year ASC, count DESC;

I will get the right count for users logged in 2013 and 2014 and so but the ones didn't log-in will be excluded, and I need to include them with a 0 count. No matter if the solution is MySQL, PHP or even JS. just need the array to latter on encode as JSON. So far I tried with MySQL with no luck, and now I'm trying with PHP getting the list of users in one array and the above results in another array but also no luck on a fast way to do it or I'm lacking of ideas because my psychiatric treatment pills (yes, my head is oversaturated). That's why I need help.

THANKS FOR SHARING ALL YOUR KNOWLEDGE AS ALWAYS!

EDIT As asked by @Rick, here is the table in question and the real fields I'm using are "isp" and "fecha_login"

CREATE TABLE `login_history` (                                                           
                 `id` int(11) NOT NULL AUTO_INCREMENT,                                                  
                 `user_id` int(11) DEFAULT NULL,                                                        
                 `personal_id` int(11) DEFAULT NULL,                                                    
                 `responsable_id` int(11) DEFAULT NULL,                                                 
                 `nombre` varchar(50) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,          
                 `fecha_login` datetime DEFAULT NULL,                                                   
                 `ip` int(11) unsigned DEFAULT NULL,                                                    
                 `isp` varchar(100) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,            
                 `ciudad` varchar(50) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,          
                 `provincia` varchar(50) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,       
                 `pais` varchar(50) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,            
                 `dispositivo` varchar(100) CHARACTER SET utf8 COLLATE utf8_spanish_ci DEFAULT NULL,    
                 `datosmoviles` tinyint(1) NOT NULL,                                                    
                 PRIMARY KEY (`id`)                                                                     
               ) ENGINE=InnoDB AUTO_INCREMENT=20991 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

and the explain select:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY		ALL	NULL	NULL	NULL	NULL	19690	Using temporary; Using filesort
1	PRIMARY		ALL	NULL	NULL	NULL	NULL	19690	Using join buffer (flat, BNL join)
1	PRIMARY	l	ALL	NULL	NULL	NULL	NULL	19690	Using where; Using join buffer (incremental, BNL join)
3	DERIVED	login_history	ALL	NULL	NULL	NULL	NULL	19690	Using temporary
2	DERIVED	login_history	ALL	NULL	NULL	NULL	NULL	19690	Using temporary

CodePudding user response：

You need a CROSS join of all the distinct years and all the distinct users and a LEFT join to the table so that you can have all the possible combinations before you aggregate:

SELECT y.year, u.user, COUNT(l.user) count
FROM (SELECT DISTINCT YEAR(date) year FROM logins) y
CROSS JOIN (SELECT DISTINCT user FROM logins) u
LEFT JOIN logins l ON YEAR(l.date) = y.year AND l.user = u.user
GROUP BY y.year, u.user
ORDER BY y.year, count DESC;

See the demo.

CodePudding user response：

You mentioned performance as a problem. This answer addresses that.

A quick fix is to build and maintain a separate table with just the users and another with just the years. Since @forpas's query requires at least 3 full passes over the table, this quick fix may more than double the speed.

To get more speed (maybe 10x more), build and maintain a Summary Table with columns for day, user, and count for that day for that user. Then run the big query against the summary table.

If "day" does not provide enough shrinkage, pick month instead.

If you need a partial count for the current year-to-date, use the summary table for time before today (or this month), then UNION with another query to get the partial results from there forward.

If you get that far, but are not happy with the speed, please provide SHOW CREATE TABLE and EXPLAIN SELECT ... for help with indexing.