Home > OS >  Change pandas dataframe first row to become column names
Change pandas dataframe first row to become column names

Time:02-11

I have the following response from reading a gsheet from google API.

response =  [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]

and created a df from the response:

df = pd.DataFrame(response)

Index 0       1       2      3             4              5         6
0  Owner  Database   Schema  ...          Column         Comment  Status
1         VICE_DEV  AIRFLOW  ...  _LOAD_DATETIME   Load datetime    None
2         VICE_DEV  AIRFLOW  ...  _LOAD_FILENAME  load file name   ADDED

How do I get the 0 row to become the column names instead?

I tried this:

df.columns = df.iloc[0]

and it works in a way, but then I still see the 0 row as the column names (this is not correct)

Index  Owner  Database   Schema  ...          Column         Comment  Status
0  Owner  Database   Schema  ...          Column         Comment  Status
1         VICE_DEV  AIRFLOW  ...  _LOAD_DATETIME   Load datetime    None
2         VICE_DEV  AIRFLOW  ...  _LOAD_FILENAME  load file name   ADDED

CodePudding user response:

You can for example directly set the first row as columns and use the rest as rows:

response =  [['Owner', 'Database', 'Schema', 'Table', 'Column', 'Comment', 'Status'], ['', 'VICE_DEV', 'AIRFLOW', 'TASK_INSTANCE', '_LOAD_DATETIME', 'Load datetime'], ['', 'VICE_DEV', 'AIRFLOW', 'TEST', '_LOAD_FILENAME', 'load file name', 'ADDED']]

df = pd.DataFrame(response[1:], columns=response[0])
df

Output:

Owner   Database    Schema  Table   Column  Comment Status
0       VICE_DEV    AIRFLOW TASK_INSTANCE   _LOAD_DATETIME  Load datetime   None
1       VICE_DEV    AIRFLOW TEST    _LOAD_FILENAME  load file name  ADDED

You can also add one line to the code that you provided and achieve the same result:

df = pd.DataFrame(response)
df.columns = df.iloc[0]

# Remove first row
df = df[1:]

CodePudding user response:

Use loc accessor to slice the rows you need. This shouls result into a dataframe. Create a dictionary where old df's column names are the keys and the first row's attributes are values. Use the dict in the column rename method. Code below

s= df.loc[1:,:].rename(columns=dict(zip(df.columns, df.loc[0,:].to_list())))


Owner  Database   Schema          Table          Column         Comment  \
1        VICE_DEV  AIRFLOW  TASK_INSTANCE  _LOAD_DATETIME   Load datetime   
2        VICE_DEV  AIRFLOW           TEST  _LOAD_FILENAME  load file name   

  Status  
1   None  
2  ADDED  
  • Related