I have to make multiple sql queries of entire tables, and concatenate them into one big data table.
I have a dictionary where the key is a team name, and the value serves as an acronym where the acronym is the prefix to mySQL data tables
engine = create_engine('mysql mysqlconnector://%s:%s@%s/%s' % (mysql_user, mysql_password, mysql_host, mysql_dbname), echo=False, pool_recycle=1800)
mysql_conn = engine.connect()
team_dfs = []
nba_dict = {'New York Knicks': 'nyk',
'Boston Celtics': 'bos',
'Golden State Warriors': 'gsw',
'New York Knicks': 'nyk'}
for name, abbr in nba_dict.items()
query = f'''
SELECT *
from {abbr}_record
'''
df = pd.read_sql_query(query, mysql_conn)
df['team_name'] = name
team_dfs.append(df)
team_dfs = pd.concat(team_dfs)
Is there a better way to refactor this code and make it more efficient?
CodePudding user response:
If your nba_dict
is fixed, you can use UNION
to manually combine the result table of SQL.
abbr = list(nba_dict.values())
query = f'''
SELECT *
from {abbr[0]}_record UNION
SELECT *
from {abbr[1]}_record UNION
SELECT *
from {abbr[2]}_record UNION
SELECT *
from {abbr[3]}_record UNION
'''
df = pd.read_sql_query(query, mysql_conn)
df['team_name'] = list(nba_dict.keys())
CodePudding user response:
Your database layout, with a separate table for each team, is doomed to inefficiency whenever you need to retrieve data for more than one team at a time. You would be much much better off putting all that data in one table, giving the table a column mentioning the team associated with each row.
Why inefficient? More tables: more work. And, more queries: more work.
I suggest you push back, hard, on the designer of this database table structure. Its design is, bluntly, wrong.
If you must live with this structure, I suggest you create the following view. It will fake the single-table approach and give you your "gold layer". You get away with this because pro sports franchises don't come and go that often. You do this just once in your database.
CREATE OR REPLACE VIEW teams_record AS
SELECT 'nyk' team, * FROM nyk_record
UNION ALL
SELECT 'bos' team, * FROM bos_record
UNION ALL
SELECT 'gsw' team, * FROM gsw_record
UNION ALL .... the other teams
Then you can do SELECT * FROM teams_record ORDER BY team
to get all your data.