Home > database >  How to join two tables where the only common field seems to be the date field?
How to join two tables where the only common field seems to be the date field?

Time:08-19

I'm having trouble with these two tables:

ad_spend

| date    | campaign   | spend |
| ------- | ------    | ----- |
| 1/1/2021|   C       |  500  | 
| 1/1/2021|   D       |  500  |
| 1/1/2021|   C       |  500  | 
| 1/1/2021|   D       |  500  |
| 1/2/2021|   C       |  500  | 
| 1/2/2021|   D       |  500  |
| 1/2/2021|   C       |  500  | 
| 1/3/2021|   D       |  500  |
...

signups_from_ad

| date    | campaign   | customer_id |
| ------- | ------    | ----------- |
| 1/1/2021|   C       |      1      | 
| 1/1/2021|   C       |      2      |
| 1/1/2021|   D       |      3      | 
| 1/1/2021|   C       |      4      |
| 1/2/2021|   C       |      5      | 
| 1/2/2021|   D       |      6      |
| 1/2/2021|   D       |      7      | 
| 1/3/2021|   D       |      8      |
...

I need to create one table from these two that would return the spend, the number of sign-ups, and the cost per sign-up by date and campaign and I'm at a loss of how to proceed. I tried to do joins, but it doesn't seem like they are the right solution. I know that it's possible to combine two tables that don't have a common column using union/union all & cross joins, but I feel like that won't be helpful here either. Thank you in advance for any suggestions.

EDIT: As suggested in the comments I re-tried this inner join:

Select distinct s.date, s.campaign, count(s.customer_id), sum(a.spend), sum(a.spend)/count(s.customer_id) as 'cost' 
from signups_from_dig as s 
inner join ad_spend as a 
     on a.date=s.date 
group by s.date, s.campaign;

The code appears to double count entries, but I used "distinct" in front of count seemed to fix it partially. However, using distinct for sum doesn't seem to work (it makes it only count once) and I'm not sure how to fix that.

CodePudding user response:

Aggregate the tables by campaign and date and then join the results. In standard SQL that would be:

select date, campaign, sp.total_spent, si.total_signups
from
(
  select date, campaign, total_spent
  from ad_spend
  group by date, campaign
) sp
full outer join
(
  select date, campaign, count(*) as total_signups
  from signups_from_ad
  group by date, campaign
) si using (date, campaign)
order by date, campaign;

MySQL, however does not support full outer joins. If it is guaranteed for every date-campaign pair that has a signup to also have a ad_spend row, then just turn the full join into a left outer join. If this is not the case then get the distinct date-campaign pairs first and then left outer join the aggregated tables:

select date, campaign, sp.total_spent, si.total_signups
from
(
  select date, campaign from ad_spend
  union 
  select date, campaign from signups_from_ad
) pairs
left outer join
(
  select date, campaign, total_spent
  from ad_spend
  group by date, campaign
) sp using (date, campaign)
full outer join
(
  select date, campaign, count(*) as total_signups
  from signups_from_ad
  group by date, campaign
) si using (date, campaign)
order by date, campaign;

CodePudding user response:

If I understand your needs correctly, you need a query like this

SELECT
    *,
    spend/signups AS cost
FROM (
    SELECT
        s.date,
        s.campaign,
        COUNT(s.customer_id) AS signups,
        (SELECT SUM(ad.spend) FROM ad_spend ad WHERE ad.date = s.date AND ad.campaign = s.campaign) AS spend
    FROM signups_from_dig s
    GROUP BY s.date, s.campaign
) t;

It produces

date campaign signups spend cost
1/1/2021 C 3 1000 333.3333
1/1/2021 D 1 1000 1000.0000
1/2/2021 C 1 1000 1000.0000
1/2/2021 D 2 500 250.0000
1/3/2021 D 1 500 500.0000

You can check a working demo here

  • Related