Home > Software design >  How to combine two BigQuery queries where I've used different fields to group by?
How to combine two BigQuery queries where I've used different fields to group by?

Time:07-28

I have a table of transactions from a marketplace. It has three fields: buyer_email, seller_email, date.

I would like to know who are the most active buyers and sellers, assuming that buyers can be sellers and that sellers can be buyers. By "most active" I mean the users that have made the most transactions in the last N days - whether they are buyers or sellers.

I wrote this query to get the most active buyers:

SELECT buyer_email, COUNT(buyer_email) AS number_of_purchases
FROM table 
GROUP BY buyer_email
ORDER BY COUNT(buyer_email) DESC

The results look like this:

|      buyer_email                       |     number_of_purchases    |
| -------------------------------------- | -------------------------- |
|          [email protected]            |         74                 |
|          [email protected]         |         42                 |
|          [email protected]      |         31                 |
|          [email protected]       |         19                 |

And I wrote another query to get the list of most active sellers:

SELECT seller_email, COUNT(seller_email) AS number_of_sales
FROM table 
GROUP BY seller_email
ORDER BY COUNT(seller_email) DESC

The results of which look like this:

|      seller_email                  |     number_of_sales    |
| ---------------------------------- | ---------------------- |
|          [email protected]  |         156            |
|          [email protected]   |         89             |
|          [email protected]     |         23             |
|          [email protected]        |         12             |

I would like to combine both query results to get something like this:

|      user_email          |  number_of_sales    | number_of_purchases | total    |
| ------------------------ | ------------------- | ------------------- | -------- |
| [email protected] |         156         |           31        |    187   |
| [email protected]  |         89          |           19        |    108   | 
| [email protected]       |         12          |           74        |    86    |
| [email protected]    |         23          |           42        |    65    |

However, there are some things to take into account:

  • The cardinality of both sets, buyers and sellers, is not the same.

  • There are buyers that aren't sellers, and sellers that aren't buyers. The number_of_sales for the former would be 0, and the number_of_purchases for the latter would be 0 too. This is tricky, as the GROUP BY clause doesn't group by 0-sized groups.

What I've tried:

  • Using a JOIN statement ON seller_email = buyer_email, but this gives me as a results the rows where the seller and the buyer are the same in a given transaction - people who sell something to themselves.

  • Experimenting with UNION, but failing to get anything relevant.

I'm not sure if that's clear, but if anyone could help me achieve the aforementioned result, that would be great.

CodePudding user response:

Use FULL OUTER JOIN.

WITH
    buyer_data AS (
        SELECT
            buyer_email        AS email,
            COUNT(buyer_email) AS number_of_purchases
        FROM table
        GROUP BY
            buyer_email
        ORDER BY
            COUNT(buyer_email) DESC
    ),
    seller_data AS (
        SELECT
            seller_email        AS email,
            COUNT(seller_email) AS number_of_sales
        FROM table
        GROUP BY
            seller_email
        ORDER BY
            COUNT(seller_email) DESC
    )
SELECT
    COALESCE(buyer_data.email, seller_data.email)                   AS email,
    COALESCE(number_of_purchases, 0)                                AS number_of_purchases,
    COALESCE(number_of_sales, 0)                                    AS number_of_sales,
    COALESCE(number_of_purchases, 0)   COALESCE(number_of_sales, 0) AS total
FROM buyer_data
FULL OUTER JOIN seller_data ON seller_data.email = buyer_data.email

CodePudding user response:

Consider below approach

select *, purchase   sale as total from (
  select buyer_email as user_email, 'purchase' type from your_table
  union all select seller_email, 'sale' from your_table 
)
pivot (count(*) for type in ('purchase', 'sale'))
  • Related