Home > OS >  Identify stores with same location in Bigquery
Identify stores with same location in Bigquery

Time:11-18

So, let's say I have this data:

store_name               | latitude    | longitude | store_id | same_location_count | same_location_store_id
SR Restaurant and Cafe   | -41.575449  | 147.16824 | 1112     | 0           | null
Big Bite Burgers         | -41.575449  | 147.16824 | 1113     | 0           | null
Amigos                   | -41.575449  | 147.16824 | 1114     | 0           | null
Domino's                 | -38.33983   | 143.58384 | 1115     | 0           | null

I want to identity the stores that are in same coordinates.

The expected result looks like this:

store_name               | latitude    | longitude | store_id | same_location_count | same_location_store_id
SR Restaurant and Cafe   | -41.575449  | 147.16824 | 1112     | 2           | 1113:1114
Big Bite Burgers         | -41.575449  | 147.16824 | 1113     | 2           | 1112:1114
Amigos                   | -41.575449  | 147.16824 | 1114     | 2           | 1112:1113
Domino's                 | -38.33983   | 143.58384 | 1115     | 0           | null

If the stores are in same coordinates, we increment the same_location_count and get the ID of the same store location and concatenate it to same_location_store_id.

CodePudding user response:

You can use count and string_agg function for achieving this.

Test code:

with test as (
  select 'id1' id, '-41.575449' lat, '1112' long
  union all
  select 'id2' id, '-41.575449' lat, '1112' long
  union all
  select 'id3' id, '-41.575449' lat, '1112' long
  union all
  select 'id4' id, '-38.33983' lat, '1115' long
)
SELECT
  *,
  (COUNT(id) OVER (PARTITION BY lat, long))-1 AS same_location_count,
  REPLACE( STRING_AGG(CONCAT(id,':'),'') OVER (PARTITION BY lat, long), CONCAT(id,':'), '') AS same_location_store_id
FROM
  test

Edit: match the exact scenario without counting the id of each row and excluding it from the concatenation.

CodePudding user response:

Maybe something similar to the following. This works for Postgres DB, I think it should work in BigQuery also?

SELECT
  stores.*,
  COUNT(store_id) OVER (PARTITION BY latitude,longitude) - 1 AS same_location_count,
  NULLIF(
    TRIM(REPLACE(
      CONCAT(':', STRING_AGG(CAST(store_id AS TEXT), ':') OVER (PARTITION BY latitude,longitude), ':'),
      CONCAT(':', CAST(store_id AS TEXT), ':'),
      ':'
    ), ':'),
    ''
  ) AS same_location_store_id
FROM stores

Fiddle link: https://www.db-fiddle.com/f/gUfJm7X75Fdx1nuy9hSrsD/1

Edit: Updated from comment.

Count is the easy part; just subtract 1 from value. Same location stores you can see has become a horrible mess...

Basically the logic is:

  1. Collect all store_ids for same location (now using string_agg instead of array_agg
  2. Make a string of :idA:idB:...:idN: with leading,trailing :
  3. Remove the store_id of same store by replace ":id:" with just ":"
  4. Tidy up afterwards to make sure null for empty string, and remove leading,trailing : again

Step 3 works only because store_id is integer, and so guaranteed not to contain separator char :

  • Related