Home > front end >  Is there a way to select only the most complete rows that contain text, based on a unique ID column
Is there a way to select only the most complete rows that contain text, based on a unique ID column

Time:12-04

I have table called location that looks like this:


|  zip_code  | city          | state |
|  --------  | ------------- | ----- | 
| 01108-2005 |               |       |
| 02130-1911 | Jamaica Plain | MA    |
| 02130-1911 | Jamaica Plain |       |
| 02138-1557 | Cambridge     | MA    |
| 02138-1557 |               | MA    |
| 02138-1557 |               |       |
| 02370-2509 |               | MA    |
| 02370-2509 |               | MA    |

I only want unique zip codes and as many corresponding city/state values as possible. For example:

  • the zip code 01108-2005 is already unique, so I will keep it even though there is no city or state.
  • the zip code 02138-1557 at least has state in one record, but not in the other, so I want to only keep the record that has a state.
  • the zip code 02130-1911 has a record with only city as well as a record with both city and state. I want the record with both city and state, for it is the most complete.

I've tried the following code:

SELECT DISTINCT zip_code, city, state
           FROM location
           ORDER BY zip_code, city DESC, state DESC

which treats each record as distinct. Is there a way to only select the most complete records, or to merge all records that share a zip code?

I am new to StackOverflow (and to MySQL) so any tips on how I can make my question easier to understand are appreciated.

CodePudding user response:

Using this concept: MySQL Orderby a number, Nulls last

Without test data: Simple query for location

SELECT zip_Code, City, State 
FROM (SELECT zip_Code, city, state, row_number() over (partition by zip_Code order by isNull(city), city ASC, isnull(state), state ASC) RN
     FROM LOCATION) subq
WHERE RN=1

With Test Data: TESTED:

 SELECT zip_Code, City, State 
 FROM (SELECT zip_Code, city, state, row_number() over (partition by zip_Code order by isNull(city), city ASC, isnull(state), state ASC) RN
       FROM (SELECT '02130-1911' zip_Code, 'Jamaica Plain' city, 'MA' state  UNION ALL
             SELECT '02130-1911','Jamaica Plains', 'MA' UNION ALL
             SELECT '02130-1911', 'aAKLAND'      , 'MA' UNION ALL 
             SELECT '01108-2005', NULL           , null UNION ALL 
             SELECT '02130-1911', 'Jamaica Plain', NULL UNION ALL
             SELECT '02138-1557', 'Cambridge'    , 'MA' UNION ALL
             SELECT '02138-1557', NULL           , 'MA' UNION ALL
             SELECT '02138-1557', NULL           , NULL UNION ALL  
             SELECT '02370-2509', NULL           , 'MA' UNION ALL 
             SELECT '02370-2509', NULL           , 'MA' ) LOCATION) subQ
 WHERE RN=1

Example: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a03800d0e61713ee61b4c550d395c096

Giving us below and handling edge case I believe This is giving us the city/state alphabetically first when multiple exist. We could check to see if a city/state is populated assign a 1 add both together. Use result to assign a rank/row and return all such occurrences if multiple are "as complete". Like I said Edge cases and business rules... :P

 ------------ ----------- ------- 
|  zip_Code  |   City    | State |
 ------------ ----------- ------- 
| 01108-2005 |           |       |
| 02130-1911 | aAKLAND   | MA    | <-- My edge case
| 02138-1557 | Cambridge | MA    |
| 02370-2509 |           | MA    |
 ------------ ----------- ------- 

CodePudding user response:

Check for not null in state

SELECT DISTINCT zip_code, city, state
FROM location
WHERE state is not null
 ORDER BY zip_code, city DESC, state DESC
  • Related