Home > OS >  Pandas - loop through dataframe, check column values if empty, if empty then add entire row to new l
Pandas - loop through dataframe, check column values if empty, if empty then add entire row to new l

Time:02-11

Currently my database looks like this:

import pandas as pd

output =  [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]

df = pd.DataFrame(output[1:], columns=output[0])

It looks like this:

  Owner  Database   Schema  ...          Column         Comment Status
0        VICE_DEV  AIRFLOW  ...  _LOAD_DATETIME   Load datetime   None
1        VICE_DEV  AIRFLOW  ...  _LOAD_FILENAME  load file name  ADDED

I'm now trying to loop through the Status col, and if Status is None or empty I want to create a list of SQL string queries, and go to the next row and check, etc.

I've tried this:

query_list = []

for column in df['Status']:
    if column is None:
        sql = f"ALTER TABLE {df['Table']} ALTER {df['Column']} COMMENT {df['Comment']}; "
        # print(sql)
        query_list.append(sql)

But its not correct, its getting both rows, instead I want to only get the rows with the column df['Status'] None (or empty). Also, the formatting is off. I've tried by creating an empty list and an empty df.

CodePudding user response:

If I understand it correctly you want to create one query if Status is None. This query uses the values from the particular row that is None. The problem then is that you insert whole columns into your query and not the values from the particular row with Status==None. I think what you want is to iterate over all columns in your dataframe and then check if the status is None. If so, create the query based on that row. This can be done with the following code for example:

import pandas as pd

output =  [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]

df = pd.DataFrame(output[1:], columns=output[0])

query_list = []

for index, row in df.iterrows():
    if row['Status'] is None:
        sql = f"ALTER TABLE {row['Table']} ALTER {row['Column']} COMMENT {row['Comment']}; "
        
        # print(sql)
        query_list.append(sql)
query_list

Output:

['ALTER TABLE TASK_INSTANCE ALTER _LOAD_DATETIME COMMENT Load datetime; ']
  • Related