I am new to sql query and having a small problem , here i want to find "top 3 venues which hosted most number of eliminator matches" but I think my sql is wrong how to get this done? (I'm using jupiter notebook and pyspark)
import pandas as pd
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
from matplotlib import pyplot as plt
data2 = pd.read_csv('ipl_matches.csv')
data2
#Most number of eliminator matches
eliminator = data2[data2.eliminator == 'Y']
eliminator
##I'm having problem in this query##
q = """SELECT venue_id, eliminator
FROM eliminator """
names = pysqldf(q)
names
CodePudding user response:
See if this query works for you:
q=""" SELECT venue_id
FROM eliminator group by venue_id order by count(venue_id) desc LIMIT 3"""
CodePudding user response:
use this query :
q = """SELECT venue_id, eliminator FROM eliminator order by venue_id desc limit 3 """
this will give you last 3 biggest venue_id
in descending mode