Home > Enterprise >  Why rows that are in table 2 but have no match in table 1 are dropped after RIGHT JOIN?
Why rows that are in table 2 but have no match in table 1 are dropped after RIGHT JOIN?

Time:12-29

I need to join two tables: 'our_sample' and 'tls207_pers_appln' from PATSTAT.

'our_sample' has 4 columns: appln_id, appln_auth, appln_nr, appln_kind 'tls207_pers_appln' has 4 columns: appln_id, person_id, applt_seq_nr, invt_seq_nr.

'our_sample' has 2191 rows and some (60) of them are missing in 'tls207_pers_appln'.

Because I want to join the tables keeping all the appln_id in 'our_sample' (even if they do not have the matching information from 'tls207_pers_appln') I join the two tables doing a RIGHT JOIN.

However, the resulting view 't2_tot_in_patent' has only 2096 appln_id.

This is in part due to the restriction I put (35 patents are dropped because I select only those HAVING MAX(invt_seq_nr) > 0, which is fine). But this would yield 2191-35 = 2156 patents.

Instead, I get 2096 of them that is: 2191 (in our_sample) - 60 (appln_ids in our_sample that miss from tls207) - 35 (appln_ids for which invt_seq_nr = 0)

BUT the whole point of using RIGHT JOIN is that I should not loose those 60 patents. Why then?

-- compiling total count of inventors per patent: t2_tot_in_patent

DROP VIEW IF EXISTS t2_tot_in_patent;
CREATE VIEW t2_tot_in_patent AS
SELECT
m.appln_id, MAX(invt_seq_nr) AS tot_in_patent
FROM
patstat2022a.tls207_pers_appln AS t7
RIGHT OUTER JOIN cecilia.our_sample AS m ON t7.appln_id = m.appln_id 
GROUP BY appln_id HAVING MAX(invt_seq_nr) > 0

CodePudding user response:

Try this: GROUP BY m.appln_id. Maybe the sql is grouping by appln_id of other table. Let me know if this helps. And also please specify the SQL you are using i.e. MySQL or PostgreSQL or any other, if this doesn't work.

CodePudding user response:

A right-join to our_sample in the second position is the same as a left-join starting with our_sample in the first position. No alias for the max( invt_seq_nr ) table which I would imply comes from the t7 table.

Now, it appears you want to have EVERYONE from the our_sample table regardless of the tls207_pers_appln. I would reverse to a left-join putting our_sample in primary position.

SELECT
        m.appln_id, 
        count(t7.invt_seq_nr ) t7Entries,
        MAX( t7.invt_seq_nr) AS tot_in_patent
    FROM
        cecilia.our_sample AS m
            LEFT JOIN patstat2022a.tls207_pers_appln AS t7
                ON m.appln_id = t7.appln_id
    GROUP BY 
        m.appln_id 
    order by
        case when count(t7.invt_seq_nr ) = 0
            then 1 else 2 end

Remove the HAVING clause, that is going to chop off those people who never had an event. By adding the order by as I have here, I am basically floating to the top of the list all people who did NOT have any records in the t7 table.

At this point you should be able to get your entire EXPECTED count to be correct, but also see those that dont have and see if that pairs-up with the missing count encountered.

Once confirmed, you can re-apply the HAVING clause to only restrict those who HAD activity in the t7 table vs not.

  • Related