I have a set of clickstream data (ie user id, start and end date/time, active use duration, total duration for each session).
I need to retrieve unique daily data for a given month, sorted by user_id, date, then webOrMobile.
What I have so far, using January 2019 as an example:
SELECT DISTINCT id, CONVERT(DATE, START_DTTM) AS date, WEB_VS_MOBILE_DISPLAY AS webOrMobile, ACTIVE_DURATION_SECONDS
FROM ClickStream
WHERE START_DTTM LIKE '2019-01-%'
ORDER BY id, START_DTTM, WEB_VS_MOBILE_DISPLAY
which results in:
id date webOrMobile ACTIVE_DURATION_SECONDS
Z000013917 2019-01-07 Web 245
Z000013917 2019-01-10 Web 128
Z000013917 2019-01-29 Web 261
Z000013917 2019-01-31 Web 1246
Z0000140154 2019-01-18 Mobile 212
Z0000140154 2019-01-22 Mobile 18
Z0000140154 2019-01-23 Mobile 201
Z0000158605 2019-01-11 Web 67
Z0000158605 2019-01-11 Web 587
Z0000158605 2019-01-11 Web 681
Z0000158605 2019-01-11 Web 1123
Z0000158605 2019-01-13 Web 2896
Z0000158605 2019-01-14 Web 20
My issue is the rows with date 2019-01-11, which are identical aside from the duration. How can I combine these rows into one row that has the sum of the individual durations? So something like:
id date webOrMobile ACTIVE_DURATION_SECONDS
Z000013917 2019-01-07 Web 245
Z000013917 2019-01-10 Web 128
Z000013917 2019-01-29 Web 261
Z000013917 2019-01-31 Web 1246
Z0000140154 2019-01-18 Mobile 212
Z0000140154 2019-01-22 Mobile 18
Z0000140154 2019-01-23 Mobile 201
Z0000158605 2019-01-11 Web 2458
Z0000158605 2019-01-13 Web 2896
Z0000158605 2019-01-14 Web 20
where every row now has a unique id/date/webOrMobile.
Edit: I also tried
SELECT pat_did, CONVERT(DATE, START_DTTM), WEB_VS_MOBILE_DISPLAY, SUM(MYC_ACTIVE_DURATION_SECONDS) FROM ClickStream
WHERE START_DTTM LIKE '2019-01-%'
GROUP BY pat_did, START_DTTM, WEB_VS_MOBILE_DISPLAY
which results in:
id date webOrMobile ACTIVE_DURATION_SECONDS
Z000013917 2019-01-07 Web 4900
Z000013917 2019-01-10 Web 1536
Z000013917 2019-01-29 Web 3915
Z000013917 2019-01-31 Web 11214
Z0000140154 2019-01-18 Mobile 3392
Z0000140154 2019-01-22 Mobile 54
Z0000140154 2019-01-23 Mobile 1407
Z0000158605 2019-01-11 Web 24706
Z0000158605 2019-01-11 Web 5283
Z0000158605 2019-01-11 Web 12258
Z0000158605 2019-01-11 Web 469
Z0000158605 2019-01-13 Web 112944
Z0000158605 2019-01-14 Web 5746
The rows did not combine but the duration values are now too high, for example 112944 which equals 31 hours which is impossible.
CodePudding user response:
Any column not appearing in your GROUP BY must only appear in an aggregate function in the SELECT. And you don't use both GROUP BY and DISTINCT. In fact you rarely if ever should use DISTINCT.
So something like
SELECT id,
CONVERT(DATE, START_DTTM) AS date,
WEB_VS_MOBILE_DISPLAY AS webOrMobile,
SUM(ACTIVE_DURATION_SECONDS) ACTIVE_DURATION_SECONDS
FROM ClickStream
WHERE START_DTTM LIKE '2019-01-%'
GROUP BY ID, START_DTTM, WEB_VS_MOBILE_DISPLAY
ORDER BY id, START_DTTM, WEB_VS_MOBILE_DISPLAY
CodePudding user response:
I get the impression that you're having issues with duplicate rows and date conversions:
with data as (
SELECT DISTINCT
ID, START_DTTM, cast(START_DTTM as date) as "date",
WEB_VS_MOBILE_DISPLAY as webOrMobile,
ACTIVE_DURATION_SECONDS as duration
FROM ClickStream
WHERE START_DTTM LIKE '2019-01-%'
)
select ID as id, "date", webOrMobile, sum(duration) as duration
from data
group by id, "date", webOrMobile
order by id, "date", webOrMobile;
If there were a need to do it without CTE's/derived tables etc. then you could use a bit of a hack along these lines:
select ID, cast(START_DTTM as date), WEB_VS_MOBILE_DISPLAY as webOrMobile,
sum(distinct
datediff(second, cast(START_DTTM as date), START_DTTM) *
cast(100000000 as bigint) ACTIVE_DURATION_SECONDS
) % 100000000 as duration
from ClickStream
group by ID, cast(START_DTTM as date), WEB_VS_MOBILE_DISPLAY;
You essentially have to pack the date info as a numeric value (that will sum) into high order digits with the duration so that each combo can remain unique, then unpacking by stripping off the lower order digits afterward.
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=23b55e85d42c78e16bab406135fa3291