I'm stuck on figuring this out. I have the following df I created from output. I'm looping through to get the table_name
, schema
, column_name
and comment
. I want to make sure there is a comment present, and if there is create a string query that appends to the list query_list
.
output = [['table_name', 'schema_name', 'column_name', 'data_type', 'null?', 'default', 'kind', 'expression', 'comment', 'database_name', 'autoincrement'], ['ACCOUNT', 'SFO', '_LOAD_DATETIME', '{"type":"TIMESTAMP_LTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', 'DATE of Account', 'VE'], ['ACCOUNT', 'SFO', '_LOAD_FILENAME', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', 'file name', 'VE'], ['ACCOUNT', 'SFO', '_LOAD_FILE_TIMESTAMP', '{"type":"TIMESTAMP_NTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', '', 'VE'], ['CUSTOMER', 'SFO', 'SUBSCRIPTIONSLIST', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'VE'], ['CUSTOMER', 'SFO', 'CONTACTROLESLIST', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', 'list of contract', 'VE'], ['DATA', 'SFO', 'OPPORTUNITY_NAME', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'VE']]
output = filter(bool, output)
df = pd.DataFrame(output)
df.columns = df.iloc[0]
df = df[1:]
query_list = []
grouped_comments = ''
for index, row in df.iterrows():
if row['comment'] is not None and row['comment'] != '':
if len(row['table_name']) > 1:
# the below doesn't work, it groups all table comments together
sql = f"(COLUMN {row['column_name']} COMMENT '{row['comment']}')"
grouped_comments = grouped_comments sql
elif len(row['table_name']) == 1:
sql = f"ALTER TABLE {row['schema_name']}.{row['table_name']} ALTER COLUMN {row['column_name']} COMMENT '{row['comment']}';"
query_list.append(sql)
Now the part I'm stuck on getting, is if there is a comment present and if the table_name
shows up more then once then it should create a string that looks like the below, so it should be fetching all the column_name
and comment
and grouping into one string for that table_name
:
"ALTER TABLE VE.ACCOUNT ALTER (COLUMN _LOAD_DATETIME COMMENT 'DATE of Account', COLUMN _LOAD_FILENAME COMMENT 'file name');"
and the elif works because if table_name only shows up once then it populates the correct string:
"ALTER TABLE VE.CUSTOMER ALTER COLUMN CONTACTROLESLIST COMMENT 'list of contract';"
so finally if I have the 2 above strings then my query_list should look like:
query_list = ["ALTER TABLE VE.ACCOUNT ALTER (COLUMN _LOAD_DATETIME COMMENT 'DATE of Account', COLUMN _LOAD_FILENAME COMMENT 'file name');",
"ALTER TABLE VE.CUSTOMER ALTER COLUMN CONTACTROLESLIST COMMENT 'list of contract';"]
CodePudding user response:
First, you can filter out unnecessary rows.
df = df[df.comment.notnull() & (df.comment.str.len() > 0)]
Then, concatenate the column statement.
df['column_prop'] = 'COLUMN ' df.column_name ' COMMENT \'' df.comment '\''
Now, aggregate the column_prop
to join the string, group by the table name.
df = df.groupby(['database_name', 'table_name']).agg({'column_prop': lambda x: ', '.join(x)}).reset_index()
This will give you the following.
0 database_name table_name column_prop
0 VE ACCOUNT COLUMN _LOAD_DATETIME COMMENT 'DATE of Account', COLUMN _LOAD_FILENAME COMMENT 'file name'
1 VE CUSTOMER COLUMN CONTACTROLESLIST COMMENT 'list of contract'
From here, you can concatenate the each columns with additional string to obtain your desired string.
If your dataframe is small, you can simply concatenate like this.
df['sql'] = 'ALTER TABLE ' df.database_name '.' ...
Or check here https://stackoverflow.com/a/54298586/2956135 for better string concatenation approach.