Currently my database looks like this:
import pandas as pd
output = [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]
df = pd.DataFrame(output[1:], columns=output[0])
It looks like this:
Owner Database Schema ... Column Comment Status
0 VICE_DEV AIRFLOW ... _LOAD_DATETIME Load datetime None
1 VICE_DEV AIRFLOW ... _LOAD_FILENAME load file name ADDED
I'm now trying to loop through the Status col, and if Status is None or empty I want to create a list of SQL string queries, and go to the next row and check, etc.
I've tried this:
query_list = []
for column in df['Status']:
if column is None:
sql = f"ALTER TABLE {df['Table']} ALTER {df['Column']} COMMENT {df['Comment']}; "
# print(sql)
query_list.append(sql)
But its not correct, its getting both rows, instead I want to only get the rows with the column df['Status'] None (or empty). Also, the formatting is off. I've tried by creating an empty list and an empty df.
CodePudding user response:
If I understand it correctly you want to create one query if Status
is None
. This query uses the values from the particular row that is None
. The problem then is that you insert whole columns into your query and not the values from the particular row with Status==None
. I think what you want is to iterate over all columns in your dataframe and then check if the status is None
. If so, create the query based on that row. This can be done with the following code for example:
import pandas as pd
output = [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]
df = pd.DataFrame(output[1:], columns=output[0])
query_list = []
for index, row in df.iterrows():
if row['Status'] is None:
sql = f"ALTER TABLE {row['Table']} ALTER {row['Column']} COMMENT {row['Comment']}; "
# print(sql)
query_list.append(sql)
query_list
Output:
['ALTER TABLE TASK_INSTANCE ALTER _LOAD_DATETIME COMMENT Load datetime; ']