SQL How to find the AVG number of trips taken by 2 different groups for each day in a year-CodePudding

My dataset, bike_trip_2019, contains bike trip data from 2 different user_type: member and casual.

trip_id	started_at	ended_at	user_type
1	2019-01-01 07.30 UTC	2019-01-01 07.40 UTC	member
2	2019-01-01 09:20 UTC	2019-01-01 09:46 UTC	member
3	2019-01-02 14:20 UTC	2019-01-02 14:45 UTC	casual
4	2019-01-02 16:20 UTC	2019-01-02 17:05 UTC	casual
......	......	......	......
9999	2019-12-31 06:20 UTC	2019-12-31 06:50 UTC	casual

I want to find out the average number of trips taken by each user type for each day_of_week. The expected result table looks kinda like this :

day_of_week	avg_num_member_trips	avg_num_casual_trips
Sun
Mon
TUe
Wed
Thu
Fri
Sat

What query should I use to achieve the same / similar result?

CodePudding user response：

I have tried to mock some data and write the query. You can use DATETIME functions to fetch the day based on ended_at column.

WITH dt as (SELECT ARRAY[
    STRUCT("Sun" as day , 'member' as memtype), 
    STRUCT("Sun" as day , 'member' as memtype), 
    STRUCT("Sun" as day , 'casual' as memtype), 
    STRUCT("Mon" as day, "casual" as memtype), 
    STRUCT("Mon" as day, "member" as memtype)
  ] as cols)
SELECT
col.day,
COUNTIF(col.memtype = 'member') as member_trip,
COUNTIF(col.memtype='casual') as casual_trip,
AVG(CASE WHEN col.memtype = 'member' THEN 1 else 0 END) as avg_mem_trip,
AVG(CASE WHEN col.memtype = 'casual' THEN 1 else 0 END) as avg_casual_trip
FROM dt, UNNEST(dt.cols) as col
group by day

CodePudding user response：

If each trip starts and ends on the same day, you can use this

SELECT DATENAME(WEEKDAY,DATEADD(dd, 0, DATEDIFF(dd, 0, started_at))) AS Trip_Date,  AVG(CASE WHEN user_type="member") AS Member_Trips,  AVG(CASE WHEN user_type="casual") AS Casual_Trips FROM bike_trip_2019 Group by DATENAME(WEEKDAY,DATEADD(dd, 0, DATEDIFF(dd, 0, started_at))) order by DATENAME(WEEKDAY,DATEADD(dd, 0, DATEDIFF(dd, 0, started_at)))

CodePudding user response：

Consider below approach

select day_of_week, 
  cast(avg_num_trips_member as int64) avg_num_trips_member,
  cast(avg_num_trips_casual as int64) avg_num_trips_casual
from (
  select user_type, count(*) trips,
    format_date('%a', date(started_at)) day_of_week,
    extract(dayofweek from date(started_at)) pos
  from your_table
  group by user_type, day_of_week, date(started_at), pos
)
pivot (avg(trips) avg_num_trips for user_type in ('member', 'casual'))
order by pos

with output like