So, let's say I have this data:
store_name | latitude | longitude | store_id | same_location_count | same_location_store_id
SR Restaurant and Cafe | -41.575449 | 147.16824 | 1112 | 0 | null
Big Bite Burgers | -41.575449 | 147.16824 | 1113 | 0 | null
Amigos | -41.575449 | 147.16824 | 1114 | 0 | null
Domino's | -38.33983 | 143.58384 | 1115 | 0 | null
I want to identity the stores that are in same coordinates.
The expected result looks like this:
store_name | latitude | longitude | store_id | same_location_count | same_location_store_id
SR Restaurant and Cafe | -41.575449 | 147.16824 | 1112 | 2 | 1113:1114
Big Bite Burgers | -41.575449 | 147.16824 | 1113 | 2 | 1112:1114
Amigos | -41.575449 | 147.16824 | 1114 | 2 | 1112:1113
Domino's | -38.33983 | 143.58384 | 1115 | 0 | null
If the stores are in same coordinates, we increment the same_location_count
and get the ID of the same store location and concatenate it to same_location_store_id
.
CodePudding user response:
You can use count and string_agg function for achieving this.
Test code:
with test as (
select 'id1' id, '-41.575449' lat, '1112' long
union all
select 'id2' id, '-41.575449' lat, '1112' long
union all
select 'id3' id, '-41.575449' lat, '1112' long
union all
select 'id4' id, '-38.33983' lat, '1115' long
)
SELECT
*,
(COUNT(id) OVER (PARTITION BY lat, long))-1 AS same_location_count,
REPLACE( STRING_AGG(CONCAT(id,':'),'') OVER (PARTITION BY lat, long), CONCAT(id,':'), '') AS same_location_store_id
FROM
test
Edit: match the exact scenario without counting the id of each row and excluding it from the concatenation.
CodePudding user response:
Maybe something similar to the following. This works for Postgres DB, I think it should work in BigQuery also?
SELECT
stores.*,
COUNT(store_id) OVER (PARTITION BY latitude,longitude) - 1 AS same_location_count,
NULLIF(
TRIM(REPLACE(
CONCAT(':', STRING_AGG(CAST(store_id AS TEXT), ':') OVER (PARTITION BY latitude,longitude), ':'),
CONCAT(':', CAST(store_id AS TEXT), ':'),
':'
), ':'),
''
) AS same_location_store_id
FROM stores
Fiddle link: https://www.db-fiddle.com/f/gUfJm7X75Fdx1nuy9hSrsD/1
Edit: Updated from comment.
Count is the easy part; just subtract 1 from value. Same location stores you can see has become a horrible mess...
Basically the logic is:
- Collect all store_ids for same location (now using string_agg instead of array_agg
- Make a string of
:idA:idB:...:idN:
with leading,trailing : - Remove the store_id of same store by replace ":id:" with just ":"
- Tidy up afterwards to make sure null for empty string, and remove leading,trailing : again
Step 3 works only because store_id is integer, and so guaranteed not to contain separator char :