Here is the full table: (code at end)
query = """
SELECT * FROM businesses
"""
df = pd.read_sql_query(query, conn)
df
I would like a single SQL query which finds the top 2 popular cities where owners have at least 2 shops. With expected output being London and Leicester:
I currently know how to group by city to find top 2 cities:
# Top 2 popular cities
query1 = """
SELECT city, COUNT(*) as frequency
FROM businesses
GROUP BY city
ORDER BY frequency DESC
LIMIT 2
"""
df = pd.read_sql_query(query1, conn)
df
And how to filter to find owners who have at least 2 shops:
# Owners who have at least 2 shops
query2 = """
SELECT owner, COUNT(*) AS count
FROM businesses
GROUP BY owner
HAVING count >= 2
ORDER BY count DESC
"""
df = pd.read_sql_query(query2, conn)
df
But I don't know how to combine query1 and query 2 into one query.
I would really appreciate any help with this, trying to self-learn SQL :)
Here is the code which creates the database if you'd like to follow along:
# import libraries
import pandas as pd
import sqlite3
# create database
conn = sqlite3.connect("my_db.db")
# create table
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS businesses;")
query = """
CREATE TABLE IF NOT EXISTS businesses (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
city TEXT NOT NULL,
owner TEXT NOT NULL
)
"""
cur.execute(query)
conn.commit()
# add rows to table
query = """
INSERT INTO businesses
(id, name, city, owner)
VALUES
(1, "Shop A", "London", "Tom"),
(2, "Shop B", "London", "Tom"),
(3, "Shop C", "London", "Tom"),
(4, "Shop D", "Luton", "Alice"),
(5, "Shop E", "Leeds", "Jenny"),
(6, "Shop F", "Leicester", "James"),
(7, "Shop G", "Leicester", "James"),
(8, "Shop H", "Leicester", "Emma"),
(9, "Shop I", "Leicester", "Emma"),
(10, "Shop J", "Liverpool", "James"),
(11, "Shop K", "Liverpool", "James"),
(12, "Shop L", "Liverpool", "George"),
(13, "Shop M", "Shefield", "Mary"),
(14, "Shop N", "Shefield", "Mary"),
(15, "Shop O", "Cambridge", "Oliver"),
(16, "Shop P", "Manchester", "Harry")
"""
cur.execute(query)
conn.commit()
CodePudding user response:
You should group by city and owner, filter the results in the HAVING
clause and then use SUM()
window function to get the count for each city:
SELECT DISTINCT city,
SUM(COUNT(*)) OVER (PARTITION By city) AS count
FROM businesses
GROUP BY city, owner
HAVING COUNT(*) >= 2
ORDER BY count DESC LIMIT 2;
There is a good video about SQL subqueries which I recommend watching:
https://www.youtube.com/watch?v=nJIEIzF7tDw
Please let me know if it worked or if the python code might be needed as well.