I have the following response from reading a gsheet from google API.
response = [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]
and created a df from the response:
df = pd.DataFrame(response)
Index 0 1 2 3 4 5 6
0 Owner Database Schema ... Column Comment Status
1 VICE_DEV AIRFLOW ... _LOAD_DATETIME Load datetime None
2 VICE_DEV AIRFLOW ... _LOAD_FILENAME load file name ADDED
How do I get the 0 row to become the column names instead?
I tried this:
df.columns = df.iloc[0]
and it works in a way, but then I still see the 0 row as the column names (this is not correct)
Index Owner Database Schema ... Column Comment Status
0 Owner Database Schema ... Column Comment Status
1 VICE_DEV AIRFLOW ... _LOAD_DATETIME Load datetime None
2 VICE_DEV AIRFLOW ... _LOAD_FILENAME load file name ADDED
CodePudding user response:
You can for example directly set the first row as columns and use the rest as rows:
response = [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]
df = pd.DataFrame(response[1:], columns=response[0])
df
Output:
Owner Database Schema Table Column Comment Status
0 VICE_DEV AIRFLOW TASK_INSTANCE _LOAD_DATETIME Load datetime None
1 VICE_DEV AIRFLOW TEST _LOAD_FILENAME load file name ADDED
You can also add one line to the code that you provided and achieve the same result:
df = pd.DataFrame(response)
df.columns = df.iloc[0]
# Remove first row
df = df[1:]
CodePudding user response:
Use loc accessor to slice the rows you need. This shouls result into a dataframe. Create a dictionary where old df's column names are the keys and the first row's attributes are values. Use the dict in the column rename method. Code below
s= df.loc[1:,:].rename(columns=dict(zip(df.columns, df.loc[0,:].to_list())))
Owner Database Schema Table Column Comment \
1 VICE_DEV AIRFLOW TASK_INSTANCE _LOAD_DATETIME Load datetime
2 VICE_DEV AIRFLOW TEST _LOAD_FILENAME load file name
Status
1 None
2 ADDED