My dataset, bike_trip_2019
, contains bike trip data from 2 different user_type
: member and casual.
trip_id | started_at | ended_at | user_type |
---|---|---|---|
1 | 2019-01-01 07.30 UTC | 2019-01-01 07.40 UTC | member |
2 | 2019-01-01 09:20 UTC | 2019-01-01 09:46 UTC | member |
3 | 2019-01-02 14:20 UTC | 2019-01-02 14:45 UTC | casual |
4 | 2019-01-02 16:20 UTC | 2019-01-02 17:05 UTC | casual |
...... | ...... | ...... | ...... |
9999 | 2019-12-31 06:20 UTC | 2019-12-31 06:50 UTC | casual |
I want to find out the average number of trips taken by each user type for each day_of_week
.
The expected result table looks kinda like this :
day_of_week | avg_num_member_trips | avg_num_casual_trips |
---|---|---|
Sun | ||
Mon | ||
TUe | ||
Wed | ||
Thu | ||
Fri | ||
Sat |
What query should I use to achieve the same / similar result?
CodePudding user response:
I have tried to mock some data and write the query. You can use DATETIME functions to fetch the day based on ended_at
column.
WITH dt as (SELECT ARRAY[
STRUCT("Sun" as day , 'member' as memtype),
STRUCT("Sun" as day , 'member' as memtype),
STRUCT("Sun" as day , 'casual' as memtype),
STRUCT("Mon" as day, "casual" as memtype),
STRUCT("Mon" as day, "member" as memtype)
] as cols)
SELECT
col.day,
COUNTIF(col.memtype = 'member') as member_trip,
COUNTIF(col.memtype='casual') as casual_trip,
AVG(CASE WHEN col.memtype = 'member' THEN 1 else 0 END) as avg_mem_trip,
AVG(CASE WHEN col.memtype = 'casual' THEN 1 else 0 END) as avg_casual_trip
FROM dt, UNNEST(dt.cols) as col
group by day
CodePudding user response:
If each trip starts and ends on the same day, you can use this
SELECT DATENAME(WEEKDAY,DATEADD(dd, 0, DATEDIFF(dd, 0, started_at))) AS Trip_Date, AVG(CASE WHEN user_type="member") AS Member_Trips, AVG(CASE WHEN user_type="casual") AS Casual_Trips FROM bike_trip_2019 Group by DATENAME(WEEKDAY,DATEADD(dd, 0, DATEDIFF(dd, 0, started_at))) order by DATENAME(WEEKDAY,DATEADD(dd, 0, DATEDIFF(dd, 0, started_at)))
CodePudding user response:
Consider below approach
select day_of_week,
cast(avg_num_trips_member as int64) avg_num_trips_member,
cast(avg_num_trips_casual as int64) avg_num_trips_casual
from (
select user_type, count(*) trips,
format_date('%a', date(started_at)) day_of_week,
extract(dayofweek from date(started_at)) pos
from your_table
group by user_type, day_of_week, date(started_at), pos
)
pivot (avg(trips) avg_num_trips for user_type in ('member', 'casual'))
order by pos
with output like